Evaluation Surfaces And Runners
Evaluation uses explicit surfaces, presets, fixtures, and runners.
Map keys: promise.evaluation, rule.packet-freshness, rule.cost-and-proof-freshness, rule.host-owned-execution.
Evidence path: deterministic plus fixture-backed eval.
Evidence status: open gap.
Next action: keep normalizer, fixture, and public spec proof aligned with dev/repo, dev/skill, app/chat, and app/prompt.
Terms covered here: dev surface, app surface, repo preset, skill preset, chat preset, prompt preset, fixture composition, runner readiness, evaluate fixture, evaluate observation, scenario normalization.
Maintainer Promise
The top-level evaluation surfaces are dev and app, the shipped presets are repo, skill, chat, and prompt, and fixtures declare their surface and preset so the reader can tell what kind of behavior is under test.
Subclaims
evaluate fixtureandevaluate observationkeep a uniform CLI shape across all four shipped presets.- Each shipped preset is fixture-backed so a reviewer can reopen the input, observed output, and summary packets.
- Fixtures declare their surface and preset; mismatched declarations fail rather than silently routing to a different preset.
- Scenario normalizers and proposal-input pipelines stay aligned with the shipped surface vocabulary.
Evidence
- User-facing evaluation surface smoke is enforced by docs/specs/user/evaluation.spec.md (specdown directives over
evaluate fixture --help). - Per-preset fixture-backed summary packet evidence is preserved in charness-artifacts/spec/evaluation-surfaces-runners-proof.md, which records selected fields and source hashes from the ignored self-dogfood output paths without making those generated paths direct spec links.
- internal/runtime/evaluation_input_test.go
TestNormalizeEvaluationInputRejectsCrossAxisCombo,RejectsUnsupportedSurface, andRejectsUnsupportedPresetenforce that mismatched surface/preset declarations fail rather than silently routing. npm run lint:scenario-normalizersproves runtime completeness of the survivingdiscover scenarios normalizehelpers via scripts/check-scenario-normalization-completeness.mjs.