Claim Evidence State

This file is generated from the claim packet and status snapshot. Do not edit it by hand. Raw claim evidence state stays in the claim packet; this page is the Evidence State projection for human reading.

Source Of Truth

Claims packet: .cautilus/claims/evidenced-typed-runners.json
Claims hash: sha256:9b7373cd0c4ffb963b01100e93f0b7d1142c40b43252991f6953022ed6c51449
Status snapshot: .cautilus/claims/status-summary.json
Status hash: sha256:b42ebda56b6a700bd0a2814e7407b15c5053351eaabef90aa9b7c81159e724d4
Git state: fresh; stale=no
Snapshot inspected commit: 8fd624a9cec5618f3fb0f05ef54dfe7bf87bbbc5
Packet commit: 8fd624a9cec5618f3fb0f05ef54dfe7bf87bbbc5
Changed claim sources: 0
Claims packet role: audit source for candidates, labels, evidence status, and count totals
Status snapshot role: derived command snapshot for git state, action buckets, and cross-cutting signals; its claimSummary must match the claim packet
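
The recorded hashes above can be checked mechanically. The following is a minimal sketch, assuming each `sha256:` value is the SHA-256 digest of the file's raw bytes; the source does not state what exactly is hashed (raw bytes versus canonicalized JSON), so treat that as an assumption. `digest_matches` is a hypothetical helper, not part of Cautilus.

```python
import hashlib


def digest_matches(data: bytes, recorded: str) -> bool:
    """Return True if `recorded` ("sha256:<hex>") equals the SHA-256 of `data`."""
    algo, _, expected_hex = recorded.partition(":")
    if algo != "sha256" or not expected_hex:
        raise ValueError(f"unsupported digest format: {recorded!r}")
    actual_hex = hashlib.sha256(data).hexdigest()
    return actual_hex == expected_hex.lower()


# Illustrative payload only; a real check would read the packet file's bytes
# (assumption: the digest covers the raw file content).
sample = b'{"claims": []}'
recorded = "sha256:" + hashlib.sha256(sample).hexdigest()
print(digest_matches(sample, recorded))         # True
print(digest_matches(sample + b" ", recorded))  # False
```

A mismatch under this assumption would mean either the file changed after the page was rendered or the digest is computed over a different representation.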

Scoreboard

Dimension	Counts
Evidence	satisfied: 149, unknown: 235
Recommended proof	cautilus-eval: 115, deterministic: 166, human-auditable: 103
Proof readiness	blocked: 36, needs alignment: 37, ready for proof: 311
Review	agent-reviewed: 204, heuristic: 178, human-reviewed: 2
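
One quick sanity check on the Scoreboard is that every dimension accounts for the same number of claims. The sketch below assumes each dimension partitions the full claim set (the source does not say so explicitly, but the rows above are consistent with it); `shared_total` is a hypothetical helper.

```python
# Counts copied from the Scoreboard table above.
scoreboard = {
    "Evidence": {"satisfied": 149, "unknown": 235},
    "Recommended proof": {"cautilus-eval": 115, "deterministic": 166, "human-auditable": 103},
    "Proof readiness": {"blocked": 36, "needs alignment": 37, "ready for proof": 311},
    "Review": {"agent-reviewed": 204, "heuristic": 178, "human-reviewed": 2},
}


def shared_total(board: dict[str, dict[str, int]]) -> int:
    """Return the claim total if every dimension sums to it, else raise."""
    totals = {dimension: sum(counts.values()) for dimension, counts in board.items()}
    distinct = set(totals.values())
    if len(distinct) != 1:
        raise ValueError(f"dimension totals disagree: {totals}")
    return distinct.pop()


print(shared_total(scoreboard))  # 384
```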

Cautilus Eval Backlog

Queue	Count
open Cautilus eval claims	110
ready for proof	110
needs scenario	0

Ready for proof means the claim is concrete enough to attach or create the selected proof now; it does not mean a scenario fixture already exists. Needs scenario means the claim is still too broad, abstract, or surface-ambiguous for honest eval planning and must first be decomposed into one or more observable scenarios.

By Surface

Surface	Count
(none)	6
app/chat	4
app/prompt	9
dev/repo	68
dev/skill	23
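
The per-surface counts can be rolled up to the top-level surfaces to see where the open eval backlog concentrates. This is a small sketch, assuming surface labels have the shape `top/sub` and that `(none)` marks claims with no surface assigned; `rollup_by_top_level` is a hypothetical helper.

```python
from collections import defaultdict

# Counts copied from the By Surface table above.
surface_counts = {
    "(none)": 6,
    "app/chat": 4,
    "app/prompt": 9,
    "dev/repo": 68,
    "dev/skill": 23,
}


def rollup_by_top_level(counts: dict[str, int]) -> dict[str, int]:
    """Sum counts by the part of the surface label before the first '/'."""
    totals: dict[str, int] = defaultdict(int)
    for surface, count in counts.items():
        top_level = surface.split("/", 1)[0]
        totals[top_level] += count
    return dict(totals)


print(rollup_by_top_level(surface_counts))  # {'(none)': 6, 'app': 13, 'dev': 91}
print(sum(surface_counts.values()))         # 110
```

The grand total of 110 matches the open Cautilus eval claims queue above.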

Proof-Ready Samples

Claim	Source	Surface	Readiness	Review	Summary
claim-readme-md-92	README.md:92	dev/skill	ready for proof	heuristic	`Cautilus` turns the fixture run into durable eval packets that another agent or maintainer can reopen.
claim-readme-md-103	README.md:103	dev/skill	ready for proof	heuristic	That turned "did the agent read and follow the repo instructions?" from transcript judgment into a reproducible packet with artifacts another maintainer can reopen.
claim-readme-md-117	README.md:117	dev/skill	ready for proof	heuristic	Evaluation uses two top-level surfaces: `dev` for AI-assisted development work such as repo contracts, tools, and skills, and `app` for AI-powered product behavior such as chat, prompt, and service responses.
claim-readme-md-130	README.md:130	app/chat	ready for proof	agent-reviewed	`Cautilus` treats the context-recovery case as a protected scenario kept out of tuning so the signal stays honest.
claim-docs-contracts-adapter-contract-md-213	docs/contracts/adapter-contract.md:213	dev/repo	ready for proof	heuristic	When an eval run uses `runtime=product`, the adapter-owned command is expected to exercise a headless product path; the runtime label does not make product proof ready without a current runner assessment.
claim-docs-contracts-adapter-contract-md-541	docs/contracts/adapter-contract.md:541	dev/skill	ready for proof	heuristic	Use `--codex-home-mode isolated` when the eval should not load the operator's `CODEX_HOME` config, plugins, or sessions.
claim-docs-guides-cli-md-55	docs/guides/cli.md:55	dev/repo	ready for proof	heuristic	For `codex_exec`, `--codex-home-mode isolated` keeps user config and session state out of the eval while `--codex-auth-mode inherit` copies only Codex auth into the isolated home.
claim-docs-guides-cli-md-319	docs/guides/cli.md:319	dev/skill	ready for proof	heuristic	`cautilus evaluate skill-experiment` emits `cautilus.skill_clone_experiment_report.v1` with `variant_ran`, baseline-versus-variant delta, rubric match, source coverage delta, isolation notes, and a promotion recommendation.

Scenario Samples

No Cautilus eval claims currently require scenario decomposition, so there are no samples to show.

Action Buckets

Bucket	Actor	Count	Evidence	Review	Meaning
already-satisfied	none	149	satisfied: 149	agent-reviewed: 148, human-reviewed: 1	Proof is already attached and valid under packet semantics.
agent-add-deterministic-proof	agent	20	unknown: 20	agent-reviewed: 1, heuristic: 18, human-reviewed: 1	Add or connect unit, lint, build, schema, spec, or CI proof.
agent-plan-cautilus-eval	agent	110	unknown: 110	agent-reviewed: 8, heuristic: 102	Draft or select Cautilus eval scenarios for proof-ready eval claims.
human-align-surfaces	human	37	unknown: 37	agent-reviewed: 17, heuristic: 20	Reconcile conflicting docs, code, adapters, or ownership boundaries before proof would be honest.
human-confirm-or-decompose	human	32	unknown: 32	heuristic: 32	Confirm, decompose, or accept a human-auditable claim before treating it as proven.
split-or-defer	human	36	unknown: 36	agent-reviewed: 30, heuristic: 6	Split broad, historical, provider-caveated, policy-like, or otherwise blocked claims before verification.
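
The bucket rows can be cross-checked and regrouped by actor to show how the remaining work divides between agents and humans. This is an illustrative sketch over the numbers in the table above, not the generator's own logic.

```python
from collections import Counter

# Rows copied from the Action Buckets table: (bucket, actor, count, review breakdown).
buckets = [
    ("already-satisfied", "none", 149, {"agent-reviewed": 148, "human-reviewed": 1}),
    ("agent-add-deterministic-proof", "agent", 20,
     {"agent-reviewed": 1, "heuristic": 18, "human-reviewed": 1}),
    ("agent-plan-cautilus-eval", "agent", 110, {"agent-reviewed": 8, "heuristic": 102}),
    ("human-align-surfaces", "human", 37, {"agent-reviewed": 17, "heuristic": 20}),
    ("human-confirm-or-decompose", "human", 32, {"heuristic": 32}),
    ("split-or-defer", "human", 36, {"agent-reviewed": 30, "heuristic": 6}),
]

by_actor: Counter = Counter()
review_rollup: Counter = Counter()
for name, actor, count, review in buckets:
    # Each bucket's review breakdown should account for every claim in it.
    if sum(review.values()) != count:
        raise ValueError(f"review breakdown does not sum to count for {name}")
    by_actor[actor] += count
    review_rollup.update(review)

print(dict(by_actor))          # {'none': 149, 'agent': 130, 'human': 105}
print(sum(by_actor.values()))  # 384
print(review_rollup["agent-reviewed"], review_rollup["heuristic"], review_rollup["human-reviewed"])
# 204 178 2
```

The review rollup reproduces the Review row of the Scoreboard, and the bucket counts sum to the same claim total as every Scoreboard dimension.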

Cross-Cutting Signals

Signal	Actor	Count	Meaning
heuristic-review-needed	agent	178	Review heuristic labels before spending proof or eval budget.

How This Avoids A Split Source Of Truth

The claim packet is the audit source.
The status snapshot is regenerated from that packet before this projection is rendered.
This page is checked by `npm run claims:evidence-state:check` and `npm run verify`.
Manual proof maps still curate product-level evidence routes; they should link here rather than copying raw claim backlog counts.
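
The source does not describe how the check commands work internally. One plausible approach, sketched here under that assumption, is to re-render the projection from the packet and compare it with the committed page; `projection_drift` and the sample strings are hypothetical.

```python
import difflib


def projection_drift(committed: str, regenerated: str) -> list[str]:
    """Return unified-diff lines; an empty list means the page is up to date."""
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            regenerated.splitlines(),
            fromfile="committed",
            tofile="regenerated",
            lineterm="",
        )
    )


# Illustrative inputs only; a real check would render from the claim packet.
committed = "open Cautilus eval claims\t110\nready for proof\t110\n"
regenerated = "open Cautilus eval claims\t110\nready for proof\t109\n"

print(projection_drift(committed, committed))  # []
for line in projection_drift(committed, regenerated):
    print(line)
```

Under this design a hand edit to the page shows up as drift on the next check, which is what keeps the claim packet the only place counts are authored.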