Best Practices
This guide collects patterns that work well in practice and mistakes to avoid. Read the Spec Syntaxexplains and Alloy Modelsexplains chapters first.
Document Structure
A well-written spec serves two audiences at once: it is a user manual that humans read to understand the system, and an executable specification that a machine runs to verify the system works. Neither role is secondary — if it cannot be read as documentation, it is a poor spec; if it cannot be executed, it is just prose.
Lead with prose and design rationale. Introduce Alloy models where structural properties matter. Follow with executable blocks and check tables that confirm implementation.
Right Level of Detail
Executable blocks should verify acceptance criteria, not exhaustive edge cases. A spec that reads like a unit-test suite has gone too far — specdown is not a replacement for unit tests. Write enough cases to demonstrate the contract and catch regressions at the boundary level; leave fine-grained coverage to the project's test framework.
Keep Documents Focused
One spec file should cover one feature or one bounded concern. Split by feature boundary, not by test type. Do not put "all Alloy models" in one file and "all executable blocks" in another. The value of specdown is that model and implementation verification live together.
Alloy and Implementation: Complementary Roles
| Aspect | Alloy model | Executable block / check table |
|---|---|---|
| Abstraction | Design level | Implementation level |
| Coverage | Exhaustive within scope | Selected examples |
| Executes | Mathematical model | Actual code via adapter |
| Finds | Design flaws, missing constraints | Implementation bugs, integration issues |
Neither replaces the other. The power is in combining them in the same document.
What Alloy Provides
Alloy's value comes from exhaustive exploration within a bounded scope. The solver finds valid instances, counterexamples, and logical relationships that are invisible to example-based testing alone.
Verification
- Valid instances —
rungenerates concrete instances satisfying the model's constraints, confirming the model is consistent (not vacuously true). Arun sanityCheck {}should accompany every model. - Counterexample search —
checkautomatically finds assertion violations. Counterexamples are saved as artifacts for debugging. - Exhaustive coverage within scope — executable blocks test selected examples; Alloy proves properties for every combination of atoms up to the given scope.
Logical Relationship Discovery
- Equivalence — proving two models equivalent lets you reuse existing tests after a refactor (see Equivalence Shield).
- Implication — proving a strong invariant implies weaker properties eliminates the need to test those weaker properties separately (see Invariant Leverage).
- Vacuous satisfaction detection — if facts are contradictory, every assertion passes trivially. Alloy's instance finder exposes this trap.
Document Quality
- Design intent made precise — an Alloy model forces prose claims into formal statements. Ambiguous prose becomes a concrete predicate.
- Prose errors surfaced — when a property claimed in prose fails in the model, the counterexample reveals the inconsistency immediately.
- Missing constraints discovered — when an executable block finds a bug, adding the missing constraint to the model lets Alloy search for further violations (see Failure-Driven Modeling).
Test Optimization
- Exhaustive classification — proving a case partition is complete and mutually exclusive means one representative per case suffices (see Exhaustive Classification).
- Impossible transitions proved — modeling state machines lets Alloy prove which transitions cannot occur, reducing the invalid-path tests needed (see Transition Safety Net).
- Composition coverage — individual modules are tested with executable blocks; Alloy covers the combinatorial space of their interaction (see Composition Safety).
Patterns
1. Property and Implementation Side by Side
Place an alloy:model block with a check statement and a check table in the same section. Readers see both the design guarantee and the implementation confirmation together. When a check exists in the model block, the HTML report links the result automatically — no alloy:ref needed.
Here is a minimal example — an Alloy model proves that every item has an owner, and a check table confirms the implementation enforces the same rule:
module ownership
sig User {}
sig Item { owner: one User }
assert everyItemHasOwner {
all i: Item | one i.owner
}
check everyItemHasOwner for 5
pred sanityCheck {}
run sanityCheck {} for 52. Counterexample Harvesting
When Alloy finds a counterexample, fix the model, then add the counterexample as a check row to prevent regression. Document the counterexample in prose so future readers know why the row exists.
3. Exhaustive Classification
Use Alloy to prove the set of cases is complete and mutually exclusive, then test one representative per case. Two assertions — one for completeness, one for mutual exclusivity — together guarantee that the check table covers every case with minimal rows.
Here a permission model partitions access into three levels. Alloy proves every subject falls into exactly one level.
module classify
abstract sig Level {}
one sig Admin, Member, Guest extends Level {}
sig Subject { level: one Level }
pred isAdmin[s: Subject] { s.level = Admin }
pred isMember[s: Subject] { s.level = Member }
pred isGuest[s: Subject] { s.level = Guest }
-- completeness: every subject is one of the three
assert complete {
all s: Subject | isAdmin[s] or isMember[s] or isGuest[s]
}
-- mutual exclusivity: no overlap
assert exclusive {
no s: Subject | (isAdmin[s] and isMember[s])
or (isMember[s] and isGuest[s])
or (isAdmin[s] and isGuest[s])
}
check complete for 5
check exclusive for 5
pred sanityCheck {}
run sanityCheck {} for 5With this proof, a check table needs only one representative per level (Admin, Member, Guest) — Alloy guarantees there are no gaps.
4. Invariant Leverage
Prove that one strong invariant implies several weaker properties. Then only test the strong invariant in implementation checks and skip the rest. Document why the weaker tests are absent — the Alloy proofs serve as the justification.
5. Transition Safety Net
Model state transitions in Alloy to prove which transitions are impossible. Executable blocks test the valid paths and a minimal set of invalid ones. No need to test every impossible pair; Alloy covers the rest.
See the State Machine Models section in the Alloy spec for a worked example.
6. Composition Safety
Test each module individually with executable blocks. Use Alloy to verify that their combination does not open unintended gaps. Testing every combination of states is impractical — Alloy covers the combinatorial space.
7. Failure-Driven Modeling
When an executable block discovers a bug, add the missing constraint to the Alloy model. Then let Alloy search for further violations. The cycle: implementation failure reveals a missing assumption, model is strengthened, Alloy finds more cases, new check rows are added.
8. Equivalence Shield
When refactoring, prove the old and new models are equivalent in Alloy. Then reuse existing executable blocks and check tables unchanged to confirm the implementation matches.
Alloy Pitfalls
Vacuous satisfaction
If the model's facts are contradictory, every assertion passes trivially. Always include pred sanityCheck {} run sanityCheck {} for 5 to confirm the model has at least one instance. If it finds no instance, the model is inconsistent and all check results are meaningless.
Missing always in temporal models
In temporal Alloy (models with var fields), fact and assert bodies are evaluated in the initial state only. To express a constraint that must hold in every state, wrap the body with always.
Missing var when using prime
The prime operator (e') refers to the value of e in the next state. Without var, the field is static and e' = e always holds, making the assertion vacuously true.
Scope too small
check ... for 3 searches only instances with up to 3 atoms per signature. Properties that only fail with 4+ interacting elements will not be caught. Use a scope of 5-7 as a practical default. For temporal models, also set an adequate step bound (e.g. check ... for 5 but 8 steps).
Anti-Patterns
- Model without implementation checks — Alloy proves design properties but code may still be wrong. Always pair models with executable blocks or check tables.
- Implementation checks without rationale — future readers cannot tell which rows are essential. Add prose explaining why each case matters.
- Alloy in a separate file — defeats the purpose. Model and implementation checks should share the same section and prose context.
- Over-modeling — simple CRUD does not need Alloy. Use it when the state space is large enough that example-based testing cannot cover it.
- Over-testing — a spec crammed with exhaustive cases becomes unreadable as documentation. Keep executable blocks at the acceptance-criteria level and leave fine-grained coverage to unit tests.
- Hardcoded paths in configexplains — use relative paths so the project works from any checkout location.
- Monolithic adapterexplains — keep adapters focused on one execution environment. Split when complexity grows.
Hooks vs Inline Cleanup
Use > setup / > teardown hooks when the same preparation or cleanup
applies to all cases in a section. Teardown hooks always run, even if a
case fails — inline cleanup at the start of the next block does not have
this guarantee.
Use > setup:each / > teardown:each when every child section needs
identical preparation (e.g., resetting a database or clearing temp files).
Inline cleanup at the start of a block is simpler when the cleanup is specific to one case. Avoid mixing both in the same section — pick one and use it consistently.
When to Build an Adapter
Stay with run:shell when commands are one-off and vary between cases.
Build an adapter when:
- Three or more shell blocks repeat the same execution pattern with different inputs
- You need domain-specific checks (database state, UI content, API responses)
- Shell blocks grow complex with parsing (
awk/grepchains to extract values)
A check table with an adapter reads as data; a shell block reads as a script. The tradeoff is the upfront cost of writing the adapter.
An adapter also improves readability. Summary lines help collapse long shell blocks, but the best spec is one that reads clearly even without them. When shell blocks need heavy parsing or boilerplate to express an assertion, that complexity is a signal: build an adapter so the blocks (or check tables) speak the language of the domain, not the language of the shell.
Choosing the Right Block Style
| Goal | Style |
|---|---|
| Verify command output matches exactly | Doctest ($ lines with expected output) |
| Run a command and check exit status only | Plain run:shell block |
| Test many input/output pairs | Check table (> check:name) |
| Assert a value inline in prose | `expect: ${n} == 3` |
| Quick spot-check with an adapter | Inline `check:jq(...)` |
Doctest blocks are a natural fit for CLI specs — the reader sees the
command and its output together. Use ... wildcards for lines that
change between runs (timestamps, PIDs, paths).
Inline expect: is useful when the assertion belongs in the narrative —
e.g. `expect: ${total} == 42` renders as a green/red badge
in the HTML report without breaking the flow into a separate block.
Choosing the Right Tool
| Situation | Tool |
|---|---|
| Property must hold for all combinations | Alloy |
| API returns the right response | Executable block |
| Multiple input/output pairs to check | Check table |
| Refactoring safety | Alloy equivalence + existing checks |
| State reachability | Alloy |
| End-to-end workflow | Executable blocks in sequence |