Integrating SMT Solving into Climate Model Configuration

Our JOSS paper on visualCaseGen was recently published. The paper, as is the JOSS tradition, is concise: focused on the software’s purpose and impact rather than the engineering behind it. In this blog post, I want to pull back the curtain and talk about the ideas that I’m most excited about: how we applied concepts from formal verification and satisfiability modulo theories (SMT) solving to an unlikely domain, climate model configuration, and the data structures and algorithms that make it all work.

The Problem: Configuring a Climate Model Shouldn’t Take Weeks

The Community Earth System Model (CESM) is one of the most comprehensive climate models in the world. It couples together models of the atmosphere, ocean, land, ice, rivers, biogeochemistry, and waves to simulate the full Earth system. This flexibility is its greatest strength, and its greatest usability challenge.

Setting up a non-standard CESM experiment may quickly become cumbersome. You need to select which component models to couple, choose physics configurations for each, pick compatible grids, set parameterization options, and ensure that all of these choices are mutually consistent. The dependencies between these choices are deep and often non-obvious. A researcher might spend weeks manually editing code, creating input files, consulting domain experts, and running trial-and-error iterations before arriving at a working configuration.

A study by Wu et al. (2021) is a perfect example. They wanted to set up two idealized aquaplanet simulations: one with no continents and another with a simple pole-to-pole land strip to study atmosphere-ocean interactions. Despite the conceptual simplicity, the technical setup required extensive manual modifications to CESM’s codebase, custom input file creation, and countless debugging iterations.

We built visualCaseGen to eliminate this pain. It’s a Jupyter-based GUI that guides researchers through each stage of CESM experiment setup, automates key configuration tasks, and, uniquely, uses an SMT solver to enforce compatibility constraints in real time. To my knowledge, it’s the first tool to integrate SMT-based constraint solving into Earth system model configuration.

My Background: From Formal Verification to Climate Software

Before I dive into the technical details, some context on why I reached for an SMT solver in the first place. My background and interests span formal verification and software engineering for scientific computing. Formal verification, i.e., the art of using mathematical logic to prove that systems behave correctly, has always fascinated me. The field has produced remarkably powerful tools, such as SMT solvers, that can reason about complex logical relationships involving integers, reals, booleans, strings, and more.

When I started working on the CESM configuration problem, I realized that what modelers were doing manually was essentially solving a constraint satisfaction problem (CSP). They had a set of variables (component choices, grid selections, physics options), a set of domains for each variable (the valid choices), and a set of constraints (compatibility rules). They were solving this CSP by hand, through trial and error. The connection to formal methods was immediate and obvious to me: this is exactly the kind of problem that SMT solvers are designed to handle.

The question became: could I bring the rigor of formal verification to a domain where it had never been applied? The answer turned out to be a resounding yes. And the result was more elegant than I had initially hoped.

The SMT Solver: Z3 at the Heart of visualCaseGen

At the core of visualCaseGen’s backend is an SMT-based constraint solver built on Z3, the renowned theorem prover developed at Microsoft Research. Z3 was a natural choice: it has a robust Python API, supports a rich variety of theories (integers, reals, booleans, strings), and is battle-tested in both academia and industry for tasks ranging from software verification to program synthesis.

Constraints as First-Class Citizens

In visualCaseGen, constraints are specified declaratively as a dictionary of Z3 logical expressions paired with human-readable error messages. The key is a Z3 BoolRef expression defining the logical condition; the value is the error message to display when the constraint is violated. Here are three examples of increasing complexity:

# Simple bound constraint
LND_DOM_PFT >= 0.0:
    "PFT/CFT must be set to a nonnegative number",

# Conditional constraint (implication)
Implies(OCN_GRID_EXTENT == "Regional", OCN_CYCLIC_X == "False"):
    "Regional ocean domain cannot be reentrant (due to an ESMF limitation.)",

# Multi-component logical rule
Implies(And(COMP_OCN == "mom", COMP_LND == "slnd", COMP_ICE == "sice"),
        OCN_LENY < 180.0):
    "If LND and ICE are stub, custom MOM6 grid must exclude poles "
    "to avoid singularities in open water",

Both domain scientists and software engineers can read and even author these constraints. The constraints capture real physical and computational restrictions: that a regional ocean domain can’t be reentrant due to an ESMF framework limitation, or that an ocean grid without land or ice coverage must avoid the poles to prevent numerical singularities. The Z3 expressions use familiar logical operators (And, Or, Not, Implies), and the accompanying error messages provide immediate, actionable feedback.

In the actual codebase, the relational constraints module defines around 40+ constraints spanning component compatibility, grid validity, forcing requirements, and more. They range from simple assertions like “at least one component must be active” to intricate multi-variable implications involving ocean grid dimensions, atmospheric forcing types, and land surface properties.

Why Not Just Use If-Else Statements?

A fair question. Why reach for an SMT solver when you could just write a bunch of conditional checks? The answer lies in the combinatorial explosion of the configuration space and the qualitative difference in what a solver can do:

Detecting Hidden Conflicts: Individual constraints may each be satisfiable on their own, yet their combination can lead to conflicts that are extremely difficult to detect manually. An SMT solver reasons about all constraints simultaneously and can identify these emergent conflicts.
Preventing Dead-Ends: Without a solver, users might unknowingly make a sequence of individually valid choices that collectively lead to an unsatisfiable state, i.e., a dead end with no valid completion. The solver proactively eliminates such dead-end options before the user encounters them, by dynamically computing which options remain valid at each step.
Constraint Analysis: The solver can answer meta-questions about the constraint system itself: Are all constraints satisfiable? Are there options that are unreachable under any combination of other choices? Are any constraints redundant? This is invaluable for both debugging the constraint set and for providing users with the most informative feedback.
Scalability: CESM’s configuration space is large and growing. Manually checking compatibility across 40+ interrelated constraints and dozens of variables would be infeasible to maintain. The solver scales naturally as new constraints and variables are added.
Error Explanation via Unsatisfiable Cores: When a user attempts an invalid assignment, the solver doesn’t just say “no.” It extracts the unsatisfiable core (the minimal subset of constraints that are mutually contradictory given the current assignments) and presents the associated error messages. This is a direct application of a key formal verification technique to end-user error reporting.

Here’s the relevant code from the retrieve_error_msg method of the CspSolver class:

def retrieve_error_msg(self, var, new_value):
    with Solver() as s:
        s.set(":core.minimize", True)
        # Apply all past and current assertions...
        s.add(var == new_value)

        # Track each relational constraint with its error message
        for constr in self._relational_constraints:
            s.assert_and_track(constr, self._relational_constraints[constr])

        # Extract the minimal set of violated constraints
        error_messages = [str(err_msg) for err_msg in s.unsat_core()]
        # ...format and return the error message

The assert_and_track method is key here: It associates each constraint with its human-readable label so that when Z3 computes the unsatisfiable core, we get back exactly the error messages that explain why the assignment is invalid. The :core.minimize option asks Z3 to find a minimal unsatisfiable subset, so the user gets the most concise explanation possible.

The Stage Mechanism: Structuring Configuration as a DAG

Climate model configuration isn’t a free-form process. There’s a natural ordering. You need to choose your components before you can select physics options; you need to know your physics before you can pick grids; you need grids before you can launch. The Stage Mechanism formalizes this sequencing.

Stages as a Tree

Each stage groups a set of related configuration variables (ConfigVar instances) that the user sets together. Stages are organized in a tree hierarchy: a stage can have child stages, and children can be guarded, i.e., activated only when a Z3 boolean condition evaluates to true given the current configuration state.

For example, the top-level flow is:

1. Component Set  ->  2. Grid  ->  3. Launch

But within “Component Set,” the path branches:

If the user chooses Standard mode, they follow a path through support level filtering and compset alias selection.
If the user chooses Custom mode, they walk through time period selection, component picking, physics configuration, and component options.

These branches are implemented as Guard nodes: nodes in the stage tree that carry a Z3 condition. When the stage mechanism needs to determine which child to activate, it asks the CSP solver to evaluate each guard’s condition against the current assignments:

def check_condition(self):
    """Check if the condition of the node is satisfied."""
    return csp.check_expression(self._condition)

This is a beautiful interplay between the stage hierarchy and the solver. The stage structure defines what to check; the solver determines whether the condition holds.

The Stage Pipeline as a DAG

All possible stage paths collectively form the stage pipeline. A critical invariant is that this pipeline must be a directed acyclic graph (DAG). Why? Because the pipeline defines variable precedence: variables in earlier stages have higher priority than those in later stages. If the pipeline had cycles, you’d get circular precedences, which would make constraint propagation undefined.

The DAG requirement also means that while the same variable can appear in multiple stages (e.g., the land grid variable appears in both the “Standard Land Grid” and “Modified Land Grid” stages), it must never be reachable along the same path. This is enforced structurally by the guard mechanism. If two stages share a variable, they must be behind mutually exclusive guards.

Variable Rank Determination: SMT Solving the Stage Structure Itself

Here’s one of my favorite technical details. One that didn’t make it into the JOSS paper. To establish variable precedence, the CSP solver doesn’t just assume a fixed ordering. Instead, it uses Z3 again, this time to compute a consistent ranking of all variables from the stage tree structure.

The _determine_variable_ranks method performs a full DFS traversal of the stage tree and encodes ordering requirements as Z3 integer constraints:

def _determine_variable_ranks(self, stage, cvars):
    # Solver to check if a consistent ranking is possible
    s = Solver()

    # Create rank variables for each config variable
    [Int(f"{var}_rank") for var in cvars]
    max_rank = Int("max_rank")

    while stage is not None:
        varlist = stage._varlist
        curr_rank = Int(f"{varlist[0]}_rank")

        # All ranks must be nonneg and ≤ max_rank
        s.add([0 <= curr_rank, curr_rank <= max_rank])

        # All variables in the same stage have the same rank
        for var in varlist[1:]:
            s.add(curr_rank == Int(f"{var}_rank"))

        # Variables in earlier stages must have lower ranks
        true_next_stage = ...  # determined by tree traversal
        s.add(curr_rank < Int(f"{true_next_stage._varlist[0]}_rank"))

        # Guard variables must rank below current stage but above the guarded stage
        for guard_var in guard_vars:
            s.add(Int(f"{guard_var}_rank") < Int(f"{next_stage._varlist[0]}_rank"))

        # Verify consistency at each step
        if s.check() == unsat:
            raise RuntimeError("Inconsistent variable ranks encountered.")

        stage = dfs_next_stage

    # Minimize the maximum rank for optimal performance
    opt = Optimize()
    opt.add(s.assertions())
    opt.minimize(max_rank)
    opt.check()
    model = opt.model()

Notice the use of Optimize() at the end. This is Z3’s optimization extension. After verifying that a consistent ranking exists, it minimizes the maximum rank, yielding the most compact ranking possible. This is an optimization that reduces the depth of the constraint graph and potentially improves traversal performance.

This is a meta-application of SMT solving: using the solver not just to validate user configurations, but to validate and optimize the tool’s own internal data structures. If a developer defines stages with conflicting precedence requirements, the system catches it at initialization time with a clear error, rather than manifesting as subtle bugs at runtime.

The Constraint Graph: Real-Time Propagation

With variables ranked and constraints loaded, visualCaseGen constructs a constraint graph—the data structure that enables real-time, incremental constraint propagation as the user makes selections.

Graph Construction

The constraint graph is a directed graph where:

Nodes are configuration variables (ConfigVar instances).
Directed edges represent constraint dependencies, pointing from higher-precedence variables to lower-precedence variables.

The construction logic is straightforward: for each relational constraint, extract all the variables it involves (using Z3’s z3util.get_vars()), then add directed edges from each variable to every other variable in the constraint that has an equal or higher rank (i.e., equal or lower precedence):

def _process_relational_constraints(self, cvars):
    self._cgraph = {var: set() for var in cvars.values()}

    for constr in self._relational_constraints:
        self._solver.add(constr)

        constr_vars = {cvars[var.sexpr()] for var in z3util.get_vars(constr)}

        for var in constr_vars:
            self._cgraph[var].update(
                set(
                    var_other
                    for var_other in constr_vars
                    if var_other is not var and var_other.rank >= var.rank
                )
            )

The key insight is the directionality: edges only point from higher-precedence to lower-precedence variables. This means that when a user changes a high-precedence variable (like a component selection), the effects propagate downward through the graph to dependent variables (like grid options), but not upward. This respects the stage ordering and prevents cascading updates from propagating in unintended directions.

Breadth-First Traversal for Option Validity Refresh

When a user makes a selection, the constraint graph is traversed breadth-first starting from the assigned variable. The traversal visits each neighboring variable, re-evaluates its option validities by querying the solver, and, crucially, only continues the traversal to further neighbors if the validities actually changed:

def _refresh_options_validities(self, var):
    queue = [neig for neig in self._cgraph[var] if neig.has_options()]
    queued = {var} | set(queue)

    while queue:
        var = queue.popleft()

        if not var.update_options_validities():
            continue
        
        for neig in self._cgraph[var]:
            if neig not in queued and neig.has_options():
                queue.append(neig)
                queued.add(neig)                    

This is where the performance magic happens. The traversal is lazy: It stops propagating along any branch where the validities didn’t change. In practice, most user selections only affect a small subset of the total configuration variables. A component selection might invalidate a few grid options but leave dozens of other variables untouched. The lazy BFS ensures that only the truly affected variables are re-evaluated, keeping the GUI responsive even as the constraint set grows.

How Option Validity is Computed

For each variable with a finite set of options, the solver checks satisfiability for each candidate value:

def get_options_validities(self, var):
    with self._solver as s:
        self.apply_assignment_assertions(s, exclude_var=var)
        self.apply_options_assertions(s, exclude_vars=[var])
        new_validities = {
            opt: s.check(var == opt) == sat
            for opt in var._options
        }
    return new_validities

For each option, Z3 is asked: “Given all current assignments and all other option constraints, is it satisfiable to set this variable to this value?” If the answer is unsat, the option is marked invalid and displayed with a cross-out in the GUI. If the user clicks on a crossed-out option, the tool retrieves the unsatisfiable core to explain why it’s incompatible.

This is the key loop that connects the solver to the user experience. Every time the user makes a choice, the constraint graph is traversed, validities are recomputed, and the GUI updates in real time by crossing out options that have become incompatible and uncrossing options that have become available.

ProConPy: A Reusable Library

It’s worth noting that the constraint solving, stage mechanism, and constraint graph are not hardcoded into visualCaseGen. They live in a separate library I created called ProConPy (an SMT-based Product Configuration library in Python). ProConPy is domain-agnostic. It provides the ConfigVar, Stage, Guard, and CspSolver classes, along with utilities like OptionsSpec for declaring dynamic option dependencies.

The visualCaseGen application uses ProConPy by:

Defining ConfigVar instances for CESM-specific variables (components, grids, physics options, etc.).
Defining Stage and Guard instances that organize these variables into the CESM configuration workflow.
Specifying relational constraints as Z3 expressions.
Calling csp.initialize() to wire everything together: computing variable ranks, building the constraint graph, and loading constraints into the solver.

This separation means that ProConPy could be used for any product configuration problem that can be modeled as a staged CSP, and not just climate model setup. Product configurators are common in manufacturing, e-commerce, and systems engineering, and they all share the same fundamental challenge of ensuring compatibility across interdependent choices.

The Frontend: Making It All Tangible

All of this backend machinery would be useless without a good frontend. visualCaseGen’s GUI is built with Jupyter ipywidgets, which means it can run on local machines, HPC clusters, and cloud environments. Basically, anywhere Jupyter runs.

The GUI implements a wizard-like interface where each stage is presented as a panel. As the user makes selections, incompatible options are visually crossed out with a ❌ symbol, while valid options are presented normally. Clicking on a crossed-out option triggers the error explanation mechanism described above, displaying a popup with the specific constraint(s) being violated.

The stage mechanism controls the flow: completing one stage automatically activates the next, and the guard conditions determine which branch of the pipeline to follow. Users can also go back to previous stages. The solver supports a revert() operation that pops the most recent set of assertions and restores the solver to its prior state.

Beyond configuration, visualCaseGen includes specialized widgets like the TopoEditor for interactively modifying ocean bathymetry, and the fsurdat modifier for customizing land surface properties. These tools further reduce the need for manual file editing and make the system truly end-to-end.

Reflections: Formal Methods in Unexpected Places

Building visualCaseGen reinforced my belief that formal methods and SMT solving have enormous untapped potential outside their traditional domains of hardware/software verification. Climate model configuration is, at its core, a constraint satisfaction problem. By recognizing this and reaching for the right tool, that is, a mature, efficient SMT solver, I was able to build a system that is not only more user-friendly but also more correct than any amount of hand-written validation logic could achieve.

Some takeaways:

Z3’s Python API is clean and expressive. If you can write boolean expressions, you can use Z3. The barrier to adoption is awareness, not complexity.
By expressing constraints as Z3 expressions with human-readable error messages, the constraint set becomes both machine-checkable and human-auditable. Domain scientists can review and contribute to the constraints without understanding the solver internals.
Using Z3 to compute variable ranks from the stage tree (essentially verifying the tool’s own internal consistency) is a pattern I find elegant and powerful.
The breadth-first, change-driven traversal of the constraint graph ensures that the system remains responsive even as the number of variables and constraints grows.

To my knowledge, visualCaseGen is the first tool to integrate SMT-based constraint solving into Earth system model configuration, and it demonstrates how formal methods can aid and simplify complex scientific software workflows. If you work in a domain with complex configuration requirements, and I suspect many scientific software projects qualify, I encourage you to consider whether an SMT solver might be the right tool for the job. You might be surprised at how naturally it fits.

visualCaseGen is open source and available on GitHub. The JOSS paper is published here. This work was supported by the NSF CSSI program under award 2004575.