Chapter 4: Property-Based Testing#

In Chapters 1–3, we moved from a quick, monolithic solver to a modular design with unit tests. Unit tests gave us confidence, but only for cases that we handpicked.

Now, we’ll take the next step by introducing property-based testing via the Hypothesis library:

  • Instead of manually writing individual test cases, we’ll state general properties that must be satisfied by the solution such as conservation, symmetry, maximum principle intuition, or linearity.

  • We will also specify a function that generates valid test inputs. For each input, we assert that these properties hold.

  • Hypothesis will generate many random inputs automatically, exploring cases we might never think to test.

  • When a property fails, Hypothesis will shrink the failing case to the simplest possible counterexample, the testing equivalent of a controlled scientific experiment.

What is a “property”?#

A property is a statement about your program that you expect to hold for all valid inputs. It’s a general rule, not just a single example.

For our 1-D heat equation solver, some key properties might include:

  • Conservation: total heat stays constant.

  • Telescoping property: The sum of the divergence over all cells equals the net flux through the boundaries.

  • Symmetry: symmetric initial states stay symmetric under symmetric BCs.

  • Monotonicity: non-decreasing initial states remain non-decreasing.

Unit tests vs. property-based tests:#

  • Unit tests: “for this input, expect this output”

  • Property tests: “for all inputs satisfying these preconditions, this relation should hold”

Property tests complement unit tests:

  • Unit tests are concrete and focused.

  • Property tests are broader and can reveal edge cases you never anticipated.

How this relates to Chapter 2#

In Chapter 2, we introduced preconditions, postconditions, and invariants as lightweight specifications for our solver.

Property-based testing is a natural extension of these ideas:

  • The precondition {P} now defines the input space that Hypothesis will explore.

  • The postcondition {Q} or invariant becomes the property being checked.

  • When a test fails, Hypothesis provides a counterexample, helping you determine whether:

    • The precondition was too weak (i.e., it allowed invalid inputs).

    • The postcondition was too strong (i.e., it ruled out valid outputs).

    • There’s a bug in the implementation.

Think of this as turning your Chapter 2 contracts into scientific hypotheses:

  • “If the precondition holds, the property should always be true.”

Hypothesis plays the role of the experimenter, probing your code with many randomized “experiments” to try and refute your claim.

The Hypothesis library#

Hypothesis is a Python library for property-based testing. Instead of handpicking test cases, you describe the space of valid inputs, and it generates random examples within that space.

If a failure is found, Hypothesis automatically shrinks the input to the simplest version that still causes the failure. This helps you debug by giving a clear, minimal failing example.

Let’s revisit our div function from the previous chapters.

# %xmode Minimal  

def div(x, y):
    assert y != 0           # P    (precondition)
    res = x / y             # code (implementation)
    assert res * y == x     # Q    (postcondition)
    return res

In Chapter 3, we tested this manually:

def test_division():
    div(7, 25)

test_division()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[2], line 4
      1 def test_division():
      2     div(7, 25)
----> 4 test_division()

Cell In[2], line 2, in test_division()
      1 def test_division():
----> 2     div(7, 25)

Cell In[1], line 6, in div(x, y)
      4 assert y != 0           # P    (precondition)
      5 res = x / y             # code (implementation)
----> 6 assert res * y == x     # Q    (postcondition)
      7 return res

AssertionError: 

Running this fails because 7 / 25 cannot be represented exactly in floating point. But notice: we only found this case because we happened to test exactly this input. If we had tested other pairs like the ones listed below, the test might have passed, which would leave us unaware of the issue.

def test_more_divisions():
    div(7, 26)
    div(7, 24)
    div(7, 23)
    div(6, 1)
    div(2, 4)
    div(0, 1)
    div(46, 7)
    div(7657, 26)
    div(1, 3)
    div (23, 23424)
    div(1000, 3)
    div(123,456)
    print("No assertion violations encountered!")

test_more_divisions()
No assertion violations encountered!

Hypothesis to the rescue#

Let’s use Hypothesis to generate a wide range of test cases for our division function.

from hypothesis import assume, given
from hypothesis import strategies as st


# Read this as:
# given
@given(
    # (a) one randomly drawn integer
    st.integers(),
    # (b) another randomly drawn integer
    st.integers(),
)
def test_div_property(x, y):
    # test this property
    # (c) reject y == 0
    assume(y != 0)
    # test that division succeeds
    div(x, y)

The above test definition highlights the key components of property-based testing with Hypothesis:

  1. The @given decorator

    • This is the main entry point for defining a property-based test.

    • It takes zero or more strategies as arguments, which describe how Hypothesis should generate input data for the test.

  2. Strategies

    • Use strategies to define the space of possible inputs.

    • In our example, we used st.integers() to generate random integers for both x and y.

    • Hypothesis comes with a number of built-in strategies. These strategies effectively define how the input space is explored, and how a failing example is shrunk to a “simpler” case. Usually you’ll compose a number of built-in strategies to generate input data that is appropriate for your problem.

    • Here is an example from Zarr of generating input data that matches a written specification.

We also used the assume function to enforce our precondition {P}. Here, the precondition is that y != 0 to avoid division by zero. Adding assume(y != 0) ensures that we only test valid inputs.

With these pieces in place, we can now run the property-based test and let Hypothesis explore a wide range of inputs automatically:

test_div_property()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[5], line 1
----> 1 test_div_property()

Cell In[4], line 8, in test_div_property()
      2 from hypothesis import strategies as st
      5 # Read this as:
      6 # given
      7 @given(
----> 8     # (a) one randomly drawn integer
      9     st.integers(),
     10     # (b) another randomly drawn integer
     11     st.integers(),
     12 )
     13 def test_div_property(x, y):
     14     # test this property
     15     # (c) reject y == 0
     16     assume(y != 0)
     17     # test that division succeeds

    [... skipping hidden 1 frame]

Cell In[4], line 18, in test_div_property(x, y)
     16 assume(y != 0)
     17 # test that division succeeds
---> 18 div(x, y)

Cell In[1], line 6, in div(x, y)
      4 assert y != 0           # P    (precondition)
      5 res = x / y             # code (implementation)
----> 6 assert res * y == x     # Q    (postcondition)
      7 return res

AssertionError: 
Falsifying example: test_div_property(
    x=7,
    y=55,
)

Running the above property-based test should reveal a pair of integers (x, y) that violate the division property. Recall, the inputs are generated randomly, so you might see different failing cases each time you run the test.

Shrinking#

Hypothesis automatically shrinks the input, step by step, until it finds the simplest failing example. This makes debugging much easier, since you don’t need to wade through massive random inputs.

Shrinking & test case generation is why you should use a property testing library. While you might be able to generate random inputs, shrinking and input space exploration is a hard problem!

Making your own strategies#

Combining strategies#

In the above example, we only tested integer division. However, we can also test floating-point division by using the st.floats() strategy from Hypothesis. Better yet, we can combine both strategies to test the div function with a wider range of inputs:

from hypothesis import assume, given
from hypothesis import strategies as st

# Combine integers and floats into a single strategy:
# `st.one_of` shrinks such that the earlier strategy is considered "simpler"
numbers = st.one_of(
    st.integers(),
    st.floats(allow_nan=False, allow_infinity=False),
)

Tip An important feature of strategies is that you can call .example to get an example generated value! This function really helps one create strategies.

numbers.example()
5961997760912302.0
@given(numbers, numbers)
def test_div_property(x, y):
    assume(y != 0)
    div(x, y)
test_div_property()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 test_div_property()

Cell In[8], line 2, in test_div_property()
      1 @given(numbers, numbers)
----> 2 def test_div_property(x, y):
      3     assume(y != 0)
      4     div(x, y)

    [... skipping hidden 1 frame]

Cell In[8], line 4, in test_div_property(x, y)
      1 @given(numbers, numbers)
      2 def test_div_property(x, y):
      3     assume(y != 0)
----> 4     div(x, y)

Cell In[1], line 6, in div(x, y)
      4 assert y != 0           # P    (precondition)
      5 res = x / y             # code (implementation)
----> 6 assert res * y == x     # Q    (postcondition)
      7 return res

AssertionError: 
Falsifying example: test_div_property(
    x=1,
    y=8.988465674311578e+307,
)

Other approaches#

Hypotheisis provides a lot more!

  1. st.builds to construct custom objects with arguments generated from strategies;

  2. st.sampled_from to sample from a list of choices

  3. st.just for always yielding one value (very useful when limiting search space)

  4. modify a strategy’s outputs using .filter and .map

  5. st.composite for more advanced generation (see example later)

See this page for more.

Controlling Hypothesis#

How many examples are generated#

By default, hypothesis generates 100 random input values. You can control this with the @settings decorator:

from hypothesis import given, settings
from hypothesis import strategies as st

ctr = 0


@settings(max_examples=10)
@given(st.integers())
def test_random(x):
    global ctr
    ctr += 1
    print(f"Test no. {ctr} with random input {x}")


test_random()
Test no. 1 with random input 0
Test no. 2 with random input -818
Test no. 3 with random input -3022168159676768025
Test no. 4 with random input -16816
Test no. 5 with random input -77
Test no. 6 with random input 11985
Test no. 7 with random input -24
Test no. 8 with random input -96
Test no. 9 with random input 2298008641686346705
Test no. 10 with random input 31947

In the above cell, we reduced max_examples from the default 100 to 10. In practice, one should consider increasing the max_examples parameter based on the complexity of the property being tested, particularly in the absence of failing cases when no counterexamples are found.

Observability#

Hypothesis caches failing examples for future checks. Let’s clear that

!rm -r .hypothesis

Now we’ll enable verbose output and run the test. Hypothesis will print out all the examples it is trying. This is usually good to do when developing tests to get a sense of the space being explored.

from hypothesis import Verbosity

@settings(verbosity=Verbosity.verbose)
@given(numbers, numbers)
def test_div_property(x, y):
    # comment & uncomment this line to invalidate the hypothesis cache.
    assume(y != 0)
    div(x, y)


test_div_property()
Trying example: test_div_property(
    x=0,
    y=0,
)
Trying example: test_div_property(
    x=0.0,
    y=0,
)
Trying example: test_div_property(
    x=2.220446049250313e-16,
    y=0,
)
Trying example: test_div_property(
    x=-7658,
    y=0,
)
Trying example: test_div_property(
    x=-8.068738610735183e+63,
    y=0,
)
Trying example: test_div_property(
    x=-6.753279163496331e+16,
    y=0,
)
Trying example: test_div_property(
    x=38,
    y=2.086046650819329e-132,
)
Trying example: test_div_property(
    x=29754,
    y=4,
)
Trying example: test_div_property(
    x=24783,
    y=-420264230,
)
Trying example: test_div_property(
    x=-1.2124046867423162e-101,
    y=14076,
)
Trying example: test_div_property(
    x=-26,
    y=16195,
)
Trying example: test_div_property(
    x=-7.382336043201933e-272,
    y=2_692_603_946_087_437_967,
)
Trying example: test_div_property(
    x=-1.842098636085868e+16,
    y=-32314,
)
Trying example: test_div_property(
    x=29045,
    y=-2_001_233_375,
)
Trying example: test_div_property(
    x=18,
    y=-5885066336975300.0,
)
Trying example: test_div_property(
    x=7277920628992536.0,
    y=-8571323423208478.0,
)
Trying example: test_div_property(
    x=7277920628992536.0,
    y=7277920628992536.0,
)
Trying example: test_div_property(
    x=-75,
    y=-2.5329626842103436e+16,
)
Trying example: test_div_property(
    x=0.0,
    y=-2.5329626842103436e+16,
)
Trying example: test_div_property(
    x=-2.5329626842103436e+16,
    y=-2.5329626842103436e+16,
)
Trying example: test_div_property(
    x=79,
    y=-85,
)
Trying example: test_div_property(
    x=-85,
    y=-85,
)
Trying example: test_div_property(
    x=520501743,
    y=4.634958721768219e+16,
)
Trying example: test_div_property(
    x=4.634958721768219e+16,
    y=4.634958721768219e+16,
)
Trying example: test_div_property(
    x=6.506895718169253e+16,
    y=2.1606565086673316e+16,
)
Trying example: test_div_property(
    x=6.506895718169253e+16,
    y=6.506895718169253e+16,
)
Trying example: test_div_property(
    x=1.6122976437473211e-12,
    y=-5e-324,
)
Traceback (most recent call last):
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/4209601294.py", line 8, in test_div_property
    div(x, y)
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/249070609.py", line 6, in div
    assert res * y == x     # Q    (postcondition)
           ^^^^^^^^^^^^
AssertionError

Trying example: test_div_property(
    x=-5e-324,
    y=-5e-324,
)
Trying example: test_div_property(
    x=1.6122976437473211e-12,
    y=1.6122976437473211e-12,
)
Trying example: test_div_property(
    x=-1.7976931348623157e+308,
    y=7436674208527465.0,
)
Trying example: test_div_property(
    x=-1.7976931348623157e+308,
    y=-1.7976931348623157e+308,
)
Trying example: test_div_property(
    x=-3.9641516231600954e-80,
    y=126,
)
Trying example: test_div_property(
    x=-3.9641516231600954e-80,
    y=-3.9641516231600954e-80,
)
Trying example: test_div_property(
    x=779,
    y=16567,
)
Trying example: test_div_property(
    x=16567,
    y=16567,
)
Trying example: test_div_property(
    x=1.612062017331883e-127,
    y=-2515,
)
Trying example: test_div_property(
    x=1.612062017331883e-127,
    y=0.0,
)
Trying example: test_div_property(
    x=1.612062017331883e-127,
    y=1.612062017331883e-127,
)
Trying example: test_div_property(
    x=1589112762884609.0,
    y=1e-05,
)
Trying example: test_div_property(
    x=1e-05,
    y=1e-05,
)
Trying example: test_div_property(
    x=21881,
    y=0.99999,
)
Trying example: test_div_property(
    x=0.0,
    y=0.99999,
)
Trying example: test_div_property(
    x=0.0,
    y=0.0,
)
Trying example: test_div_property(
    x=0.99999,
    y=0.99999,
)
Trying example: test_div_property(
    x=121,
    y=54,
)
Traceback (most recent call last):
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/4209601294.py", line 8, in test_div_property
    div(x, y)
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/249070609.py", line 6, in div
    assert res * y == x     # Q    (postcondition)
           ^^^^^^^^^^^^
AssertionError

Trying example: test_div_property(
    x=121,
    y=121,
)
Trying example: test_div_property(
    x=54,
    y=54,
)
Trying example: test_div_property(
    x=23224,
    y=1.4270812701223544e+16,
)
Trying example: test_div_property(
    x=0.0,
    y=1.4270812701223544e+16,
)
Trying example: test_div_property(
    x=1.4270812701223544e+16,
    y=1.4270812701223544e+16,
)
Trying example: test_div_property(
    x=-22464,
    y=-1.192092896e-07,
)
Trying example: test_div_property(
    x=-22464,
    y=0,
)
Trying example: test_div_property(
    x=0.0,
    y=-1.192092896e-07,
)
Trying example: test_div_property(
    x=-1.192092896e-07,
    y=-1.192092896e-07,
)
Trying example: test_div_property(
    x=121,
    y=54,
)
Traceback (most recent call last):
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/4209601294.py", line 8, in test_div_property
    div(x, y)
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/249070609.py", line 6, in div
    assert res * y == x     # Q    (postcondition)
           ^^^^^^^^^^^^
AssertionError

Trying example: test_div_property(
    x=121,
    y=0,
)
Trying example: test_div_property(
    x=0,
    y=54,
)
Trying example: test_div_property(
    x=54,
    y=121,
)
Trying example: test_div_property(
    x=121,
    y=1,
)
Trying example: test_div_property(
    x=121,
    y=22,
)
Trying example: test_div_property(
    x=121,
    y=27,
)
Traceback (most recent call last):
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/4209601294.py", line 8, in test_div_property
    div(x, y)
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/249070609.py", line 6, in div
    assert res * y == x     # Q    (postcondition)
           ^^^^^^^^^^^^
AssertionError

Trying example: test_div_property(
    x=121,
    y=13,
)
Traceback (most recent call last):
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/4209601294.py", line 8, in test_div_property
    div(x, y)
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/249070609.py", line 6, in div
    assert res * y == x     # Q    (postcondition)
           ^^^^^^^^^^^^
AssertionError

Trying example: test_div_property(
    x=121,
    y=6,
)
Trying example: test_div_property(
    x=121,
    y=11,
)
Trying example: test_div_property(
    x=121,
    y=12,
)
Trying example: test_div_property(
    x=121,
    y=-1,
)
Trying example: test_div_property(
    x=121,
    y=5,
)
Trying example: test_div_property(
    x=121,
    y=-5,
)
Trying example: test_div_property(
    x=121,
    y=-6,
)
Trying example: test_div_property(
    x=121,
    y=-11,
)
Trying example: test_div_property(
    x=121,
    y=-12,
)
Trying example: test_div_property(
    x=0,
    y=13,
)
Trying example: test_div_property(
    x=1,
    y=13,
)
Trying example: test_div_property(
    x=57,
    y=13,
)
Traceback (most recent call last):
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/4209601294.py", line 8, in test_div_property
    div(x, y)
  File "/var/folders/f1/1h2dc57x5yg621dmd5y0gx1w0000gp/T/ipykernel_60913/249070609.py", line 6, in div
    assert res * y == x     # Q    (postcondition)
           ^^^^^^^^^^^^
AssertionError

Trying example: test_div_property(
    x=44,
    y=0,
)
Trying example: test_div_property(
    x=45,
    y=1,
)
Trying example: test_div_property(
    x=49,
    y=5,
)
Trying example: test_div_property(
    x=50,
    y=6,
)
Trying example: test_div_property(
    x=55,
    y=11,
)
Trying example: test_div_property(
    x=56,
    y=12,
)
Trying example: test_div_property(
    x=-44,
    y=0,
)
Trying example: test_div_property(
    x=-45,
    y=-1,
)
Trying example: test_div_property(
    x=-49,
    y=-5,
)
Trying example: test_div_property(
    x=-50,
    y=-6,
)
Trying example: test_div_property(
    x=-55,
    y=-11,
)
Trying example: test_div_property(
    x=-56,
    y=-12,
)
Trying example: test_div_property(
    x=25,
    y=13,
)
Trying example: test_div_property(
    x=28,
    y=13,
)
Trying example: test_div_property(
    x=55,
    y=13,
)
Trying example: test_div_property(
    x=56,
    y=13,
)
Trying example: test_div_property(
    x=-1,
    y=13,
)
Trying example: test_div_property(
    x=-25,
    y=13,
)
Trying example: test_div_property(
    x=-28,
    y=13,
)
Trying example: test_div_property(
    x=-55,
    y=13,
)
Trying example: test_div_property(
    x=-56,
    y=13,
)
Trying example: test_div_property(
    x=57,
    y=0,
)
Trying example: test_div_property(
    x=57,
    y=1,
)
Trying example: test_div_property(
    x=57,
    y=5,
)
Trying example: test_div_property(
    x=57,
    y=6,
)
Trying example: test_div_property(
    x=57,
    y=11,
)
Trying example: test_div_property(
    x=57,
    y=12,
)
Trying example: test_div_property(
    x=57,
    y=-1,
)
Trying example: test_div_property(
    x=57,
    y=-5,
)
Trying example: test_div_property(
    x=57,
    y=-6,
)
Trying example: test_div_property(
    x=57,
    y=-11,
)
Trying example: test_div_property(
    x=57,
    y=-12,
)
Trying example: test_div_property(
    x=56,
    y=14,
)
Trying example: test_div_property(
    x=56,
    y=0.0,
)
Trying example: test_div_property(
    x=13,
    y=57,
)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[12], line 11
      7     assume(y != 0)
      8     div(x, y)
---> 11 test_div_property()

Cell In[12], line 4, in test_div_property()
      1 from hypothesis import Verbosity
      3 @settings(verbosity=Verbosity.verbose)
----> 4 @given(numbers, numbers)
      5 def test_div_property(x, y):
      6     # comment & uncomment this line to invalidate the hypothesis cache.
      7     assume(y != 0)
      8     div(x, y)

    [... skipping hidden 1 frame]

Cell In[12], line 8, in test_div_property(x, y)
      3 @settings(verbosity=Verbosity.verbose)
      4 @given(numbers, numbers)
      5 def test_div_property(x, y):
      6     # comment & uncomment this line to invalidate the hypothesis cache.
      7     assume(y != 0)
----> 8     div(x, y)

Cell In[1], line 6, in div(x, y)
      4 assert y != 0           # P    (precondition)
      5 res = x / y             # code (implementation)
----> 6 assert res * y == x     # Q    (postcondition)
      7 return res

AssertionError: 
Falsifying example: test_div_property(
    x=57,
    y=13,
)

Back to the Heat Equation Solver#

Now let’s use Hypothesis to test our heat equation solver.

We’ll focus on invariants. These invariants are mathematical properties satisfied by the solution to the equation.

  • Conservation: total heat stays constant.

  • Telescoping property: The sum of the divergence over all cells equals the net flux through the boundaries.

  • Symmetry preservation: symmetric initial states stay symmetric under symmetric BCs.

  • Monotonicity: non-decreasing initial states remain non-decreasing.

Scaffolding#

Before encoding our properties, we need to set up some scaffolding to facilitate testing.

First, let’s define some strategies for generating sane floating-point numbers for our tests:

floats_st = st.floats(
    min_value=-1e3, max_value=1e3, allow_nan=False, allow_infinity=False
)
kappa_st = st.floats(
    min_value=1e-4, max_value=10, allow_nan=False, allow_infinity=False
)
dx_st = st.floats(min_value=1e-3, max_value=10, allow_nan=False, allow_infinity=False)
dt_st = st.floats(min_value=1e-6, max_value=10, allow_nan=False, allow_infinity=False)

Next, let’s define a mesh size strategy to generate reasonable and manageable mesh sizes, as well as a strategy for generating a tuple of boundary conditions.

# Mesh sizes
n_st = st.integers(min_value=3, max_value=40)

fluxes_st = st.floats(min_value=-100.0, max_value=100.0, allow_nan=False, allow_infinity=False)

# Boundary faces (qL, qR)
# generate 2 element tuples where each element is drawn from the strategy of floats
bc_st = st.tuples(fluxes_st, fluxes_st)

Remember you can use .example to check what’s generated.

bc_st.example()
(1.401298464324817e-45, 94.4568636837738)

Telescoping Property : “what goes out one cell must go in the next cell”#

Before testing overall conservation, let’s start one level deeper with the local flux balance that makes conservation possible.

In finite-volume discretizations, fluxes between neighboring cells telescope: the flux leaving one cell enters the next (except in the presence of sources, variable cell volumes or densities, or numerical errors.)

Consequently, when we sum the discrete divergence over all cells, the interior fluxes cancel pairwise, leaving only the boundary contributions. in other words, the total divergence equals the net flux through the boundaries:

\(\qquad \sum_{i=0}^{N-1} (\nabla \cdot F)_i = F_0 - F_N \qquad\)

Recall the encoding of this property:

from heat1d import vec, Mesh, step_heat_eqn

def telescoping(divF: vec, F: vec, dx: float) -> bool:
    """Check the finite volume telescoping property."""
    total_divergence = sum(divF) * dx
    boundary_flux = F[0] - F[-1]
    return total_divergence == approx(boundary_flux)

We are now ready to implement the symmetry property test using Hypothesis:

Exercise:#

Below is the Hypothesis test for the telescoping property. Insert the missing assumption(s) in the below test to ensure valid inputs for telescoping property using the assume function:

import heat1d

@settings(deadline=None)
@given(
    F=st.lists(floats_st, min_size=4, max_size=21),
    dx=dx_st,
)
def test_telescoping_property(F: vec, dx: float):
    """
    Property: for randomly generated vector `F` & spacing `dx`,
    the divergence divF using the function `divergence`,
    must satisfy the telescoping property.
    """
    # todo: compute divF using `heat1d.divergence`
    assert telescoping(divF, F, dx), "Telescoping property violated"
# test_telescoping_property()

Conservation#

Recall the conservation property specification:

from pytest import approx

def heat_is_conserved(u_old: vec, u_new: vec, dt: float, dx: float, bc: vec) -> bool:
    """Check if heat is conserved."""
    lhs = sum(u_new) * dx
    rhs = sum(u_old) * dx + dt * (bc[0] - bc[1])
    return lhs == approx(rhs)

Exercise:#

Similar to how we wrote a test function for the telescoping property, we can now implement a property-based test for conservation using Hypothesis. Write a property-based test function named test_conservation_property that checks if heat is conserved under arbitrary boundary conditions at one time step. Follow up: If this test passes, can we be certain that our solver conserves heat over multiple time steps? Why or why not?

Start with

def test_conservation_property(
    u_old: vec, kappa: float, dt: float, dx: float, bc: tuple[float, float]
):
    u = u_old.copy()
    mesh = Mesh(dx, N=len(u))

    step_heat_eqn(u_inout=u, kappa=kappa, dt=dt, mesh=mesh, bc=bc)

    assert heat_is_conserved(u_old, u, dt, dx, bc), "Conservation property violated"

and write out a given decorator that tests the conservation property. You’ll need to write a strategy that generates the initial state u_old: vec as a list of floats.

# TODO: Implement test_conservation_property using Hypothesis

Passing the one-step conservation property confirms that the solver preserves total heat under insulated boundaries for a single update. In exact arithmetic and insulated boundaries at every step, this one-step law would imply conservation over all steps by induction because conservation is a loop invariant:

loop invariant: a property that remains true before and after every iteration of the timestepping loop.

In other words, total heat being conserved is an invariant of the update rule. In real code, however, floating-point round-off and truncation errors in the spatial and temporal discretizations can accumulate over time, breaking perfect invariance. So, while the property shows the solver is formally conservative per step, true long-term conservation depends on floating-point precision and numerical stability.

Symmetry#

We can test the symmetry property of the heat equation solver by checking if the solution remains symmetric when the initial conditions are mirrored and boundary conditions are applied symmetrically. To check this property, let’s first define a helper function to generate symmetric initial conditions.

Feature Spotlight: The st.builds() method can be used to build complex data structures by combining simpler strategies. Below, we use it to create symmetric arrays.

Notice how the strategy always generates symmetric lists.

def symmetric_field(min_n=3, max_n=31):
    # Produce palindromic arrays
    # (odd length preferred for strict central symmetry)
    # Build half then mirror
    half = st.lists(floats_st, min_size=(min_n // 2), max_size=(max_n // 2))
    center = floats_st
    return st.builds(lambda h, c: h + [c] + list(reversed(h)), half, center)


symmetric_field().example()
[0.0, 275.01402811786534, 0.0]
@given(symmetric_field(), dx_st, kappa_st, dt_st, floats_st)
def test_symmetry_preservation(u0, dx, kappa, dt, q):
    n = len(u0)
    # This invariant is satisfied by the generator
    assert n >= 3 and n % 2 == 1  # keep a clear center

    mesh = Mesh(dx=dx, N=n)

    u = u0[:]  # copy to update
    step_heat_eqn(u, kappa, dt, mesh, bc=[q, -q])

    # palindromic within FP tolerance
    assert all(l == approx(r) for l, r in zip(u, reversed(u)))
test_symmetry_preservation()
print("Test completed")
Test completed

Monotonicity#

Diffusion smooths.

One way to express this intuition discretely is through monotonicity preservation: if a temperature profile is nondecreasing at the start of a step, it should remain so after the update. In other words, diffusion should not create new inversions.

This property is a discrete analogue of the maximum principle: if a scheme never inverts local order, it can’t create new extrema either.

Below we encode the monotonicity property using Hypothesis. We generate random nondecreasing initial profiles, advance one step, and check that ordering is preserved.

def is_monotone_nondecreasing(u: vec, atol=1e-10) -> bool:
    """Check if the list u is non-decreasing within a tolerance."""
    return all(u[i] <= u[i + 1] + atol for i in range(len(u) - 1))


def nondecreasing_lists(min_size=3, max_size=40):
    """Uses `.map` to sort the list"""
    return st.lists(
        st.floats(min_value=0.0, max_value=10.0), min_size=min_size, max_size=max_size
    ).map(lambda l: sorted(l))


def nondecreasing_lists_alternate(min_size=3, max_size=40):
    """Generate non-decreasing lists of floats strategy.

    Uses st.builds and a cumulative sum to build a non-decreasing list
    """
    deltas = st.lists(
        st.floats(min_value=0.0, max_value=10.0), min_size=min_size, max_size=max_size
    )
    base = floats_st
    return st.builds(
        lambda b, ds: [b + sum(ds[: i + 1]) for i in range(len(ds))], base, deltas
    )


@given(u0=nondecreasing_lists(), dt=dt_st, dx=dx_st, kappa=kappa_st)
def test_monotonicity_preserved_one_step(u0, dt, dx, kappa):
    n = len(u0)
    assume(n >= 3)
    assume(kappa * dt / (dx**2) < 0.5)  # stability condition
    mesh = Mesh(dx=dx, N=n)
    bc = (0.0, 0.0)  # insulated

    u = u0.copy()
    step_heat_eqn(u, kappa, dt, mesh, bc)

    assert is_monotone_nondecreasing(u), "Monotonicity violated"
test_monotonicity_preserved_one_step()
print("Test completed")
Test completed

Beware of assume#

Avoid assume if you can. It can be an inefficient way to explore the state space.

assume(kappa * dt / (dx**2) < 0.5)

since kappa, dt and dx are independently sampled, a large fraction of the search space may be invalid.

Instead adapt your generators to satisfy any necessary preconditions. Next we will discuss some more advanced strategy creation methods.

Advanced test-case generation with st.composite#

Very frequently, we will want to combine multiple strategies to generate complex inputs that satisfy certain conditions.

for example: assume(kappa * dt / (dx**2) < 0.5)  # stability condition

There are two solutions here, both of which involve you drawing explicit values and making decisions or controlling further draws based on those values.

  1. Using @given(data=st.data, ...) and then data.draw in the test

  2. Using the st.composite decorator to define a composite strategy.

st.data()#

def dts(*, min_value=1e-6, max_value=10) -> float:
    return st.floats(
        min_value=min_value, max_value=max_value, allow_nan=False, allow_infinity=False
    )


@given(data=st.data(), N=n_st, dx=dx_st, kappa=kappa_st)
def test_monotonicity_preserved_one_step_v2(data, N, dx, kappa):
    assert isinstance(N, int)
    assert isinstance(dx, float)
    
    mesh = Mesh(dx=dx, N=N)

    # Now we ensure that we create initial conditions of the right size!
    u0: list[float] = data.draw(nondecreasing_lists(min_size=N, max_size=N))
    assert isinstance(u0, list)
    assert len(u0) == N

    # Now we ensure that we pick a `dt` that always satisfies the stability condition
    dt: float = data.draw(dts(min_value=1e-10, max_value=0.49 * dx**2 / kappa))
    assert isinstance(dt, float)

    # stability condition is always satisfied by the `max_value` passed to `dts` strategy
    assert kappa * dt / (mesh.dx**2) <= 0.5  
    bc = (0.0, 0.0)  # insulated

    u = u0.copy()
    step_heat_eqn(u, kappa, dt, mesh, bc)

    assert is_monotone_nondecreasing(u), "Monotonicity violated"


test_monotonicity_preserved_one_step_v2()
print("Test completed")
Test completed

st.composite()#

This is a little more convoluted and not a great fit for our current problem but composite is extremely useful in real-world use-cases. The approach is very similar; the most important difference is that composite let’s you build strategies that take strategies as inputs (“higher-order strategies”)

import math


def dxs(*, min_value=1e-3, max_value=10) -> float:
    return st.floats(
        min_value=min_value, max_value=max_value, allow_nan=False, allow_infinity=False
    )


@st.composite
def meshes(draw: st.DrawFn, *, kappas=kappa_st, dts=dts, dxs=dxs, ns=n_st):
    # `draw` is a function that lets you draw a concrete value
    # from a strategy like so:
    kappa = draw(kappas)

    MIN_DT = 1e-6

    # now we pick a dx, dt such that the stability condition is *always* satisfied
    # 10% fudge factor to account for floating point resolution near MIN_DT
    dx = draw(dxs(min_value=1 * math.sqrt(2.1 * kappa * MIN_DT)))
    dt = draw(dts(min_value=MIN_DT, max_value=0.49 * dx**2 / kappa))

    # assert the postcondition
    # (always test your strategies!)
    assert (kappa * dt / (dx**2)) < 0.5, kappa * dt / (dx**2)

    # the need to return multiple values here suggests 
    # this approach isn't a good for for this particular problem.
    return kappa, dt, Mesh(dx=dx, N=draw(ns))


meshes().example()
(2.0896231194763653, 1.7334953900394796, Mesh(dx=3.0714195825772723, N=12))

Here is an example of a complex strategy to generate Zarr arrays. This strategy is public APIcomposite is nice for building backward-compatible public strategies.

What we just did#

Property-based testing lets you codify computational, scientific, and mathematical properties as executable claims:

  • We did prep-work: defined strategies for generating valid inputs.

  • We encoded properties: telescoping, conservation, symmetry, monotonicity

  • Hypothesis searched broad input spaces and shrank counterexamples when things went wrong.

  • This directly operationalizes the scientific method for software: state a hypothesis (property), attempt refutation (generation + shrinking), refine code or specs.

Some lessons to take away#

  1. Property tests don’t replace other testing strategies.

  2. Property tests pseudo-randomly sample the input search space. This means it is upon you to narrow the search space to “interesting” regions.

  3. Simple properties are really effective.

  4. Property testing suites are effectively an empirical specification of your program’s behaviour.

  5. Your unit test is most likely a property test in disguise. Generalize them!

  6. When you find a bug, construct a property test to catch it!

Further reading#

  1. John Hughes. How to specify it! [Paper], [Talk] ← highly recommended!

  2. Hillel Wayne. Finding property tests

  3. https://www.hillelwayne.com/post/cleverness/

  4. https://fsharpforfunandprofit.com/posts/property-based-testing-2/#categories-for-properties

Looking Ahead#

In Chapter 5, we’ll step across the next rung of the rigor ladder: theorem proving—automating reasoning over all inputs within a finite domain and turning some of these properties into machine-checked contracts.


R3Sw tutorial by Alper Altuntas (NSF NCAR). Guest lecture by Deepak Cherian (Earthmover). Sponsored by the BSSw Fellowship Program. © 2025.

Cite as: Alper Altuntas, Deepak Cherian, Adrianna Foster, Manish Venumuddula, and Helen Kershaw. (2025). “Rigor and Reasoning in Research Software (R3Sw) Tutorial.” Retrieved from https://www.alperaltuntas.com/R3Sw