Claim Evidence State

This file is generated from the claim packet and status snapshot. Do not edit it by hand. Raw claim evidence state stays in the claim packet; this page is the Evidence State projection for human reading.

Source Of Truth

  • Claims packet: .cautilus/claims/evidenced-typed-runners.json
  • Claims hash: sha256:9b7373cd0c4ffb963b01100e93f0b7d1142c40b43252991f6953022ed6c51449
  • Status snapshot: .cautilus/claims/status-summary.json
  • Status hash: sha256:b42ebda56b6a700bd0a2814e7407b15c5053351eaabef90aa9b7c81159e724d4
  • Git state: fresh; stale=no
  • Snapshot inspected commit: 8fd624a9cec5618f3fb0f05ef54dfe7bf87bbbc5
  • Packet commit: 8fd624a9cec5618f3fb0f05ef54dfe7bf87bbbc5
  • Changed claim sources: 0
  • Claims packet role: audit source for candidates, labels, evidence status, and count totals
  • Status snapshot role: derived command snapshot for git state, action buckets, and cross-cutting signals; its claimSummary must match the claim packet
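
The recorded hashes above can be spot-checked by hand. The following is a minimal sketch, assuming the `sha256:` values are computed over the raw bytes of each file; that assumption is not confirmed by this page, and the function name `matches_recorded_hash` is hypothetical.

```python
import hashlib


def matches_recorded_hash(content: bytes, recorded: str) -> bool:
    """Return True when content hashes to a recorded 'sha256:<hex>' label.

    Assumption: the label covers the raw file bytes with no normalization.
    """
    algorithm, _, expected_hex = recorded.partition(":")
    if algorithm != "sha256" or not expected_hex:
        raise ValueError(f"unsupported hash label: {recorded!r}")
    return hashlib.sha256(content).hexdigest() == expected_hex.lower()
```

To use it, read the packet with `pathlib.Path(".cautilus/claims/evidenced-typed-runners.json").read_bytes()` and pass the Claims hash value from the list above as `recorded`.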

Scoreboard

Dimension | Counts
Evidence | satisfied: 149, unknown: 235
Recommended proof | cautilus-eval: 115, deterministic: 166, human-auditable: 103
Proof readiness | blocked: 36, needs alignment: 37, ready for proof: 311
Review | agent-reviewed: 204, heuristic: 178, human-reviewed: 2
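
Each dimension partitions the same set of claims, so every row should sum to the same total. The sketch below copies the counts from the scoreboard into a plain dictionary and checks that property; the key names and the helper `dimension_totals` are illustrative and are not taken from the packet schema.

```python
# Counts copied from the scoreboard; key names are illustrative only.
dimensions = {
    "evidence": {"satisfied": 149, "unknown": 235},
    "recommended_proof": {
        "cautilus-eval": 115,
        "deterministic": 166,
        "human-auditable": 103,
    },
    "proof_readiness": {
        "blocked": 36,
        "needs alignment": 37,
        "ready for proof": 311,
    },
    "review": {"agent-reviewed": 204, "heuristic": 178, "human-reviewed": 2},
}


def dimension_totals(dims: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum the label counts within each dimension."""
    return {name: sum(counts.values()) for name, counts in dims.items()}


totals = dimension_totals(dimensions)
# Every dimension should cover the same number of claims (384 here).
assert len(set(totals.values())) == 1, totals
```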

Cautilus Eval Backlog

Queue | Count
open Cautilus eval claims | 110
ready for proof | 110
needs scenario | 0

Ready for proof means the claim is concrete enough to attach or create the selected proof now; it does not mean a scenario fixture already exists. Needs scenario means the claim is still too broad, abstract, or surface-ambiguous for honest eval planning and must first be decomposed into one or more observable scenarios.

By Surface

Surface | Count
(none) | 6
app/chat | 4
app/prompt | 9
dev/repo | 68
dev/skill | 23

Proof-Ready Samples

Claim | Source | Surface | Readiness | Review | Summary
claim-readme-md-92 | README.md:92 | dev/skill | ready for proof | heuristic | Cautilus turns the fixture run into durable eval packets that another agent or maintainer can reopen.
claim-readme-md-103 | README.md:103 | dev/skill | ready for proof | heuristic | That turned "did the agent read and follow the repo instructions?" from transcript judgment into a reproducible packet with artifacts another maintainer can reopen.
claim-readme-md-117 | README.md:117 | dev/skill | ready for proof | heuristic | Evaluation uses two top-level surfaces: dev for AI-assisted development work such as repo contracts, tools, and skills, and app for AI-powered product behavior such as chat, prompt, and service responses.
claim-readme-md-130 | README.md:130 | app/chat | ready for proof | agent-reviewed | Cautilus treats the context-recovery case as a protected scenario kept out of tuning so the signal stays honest.
claim-docs-contracts-adapter-contract-md-213 | docs/contracts/adapter-contract.md:213 | dev/repo | ready for proof | heuristic | When an eval run uses runtime=product, the adapter-owned command is expected to exercise a headless product path; the runtime label does not make product proof ready without a current runner assessment.
claim-docs-contracts-adapter-contract-md-541 | docs/contracts/adapter-contract.md:541 | dev/skill | ready for proof | heuristic | Use --codex-home-mode isolated when the eval should not load the operator's CODEX_HOME config, plugins, or sessions.
claim-docs-guides-cli-md-55 | docs/guides/cli.md:55 | dev/repo | ready for proof | heuristic | For codex_exec, --codex-home-mode isolated keeps user config and session state out of the eval while --codex-auth-mode inherit copies only Codex auth into the isolated home.
claim-docs-guides-cli-md-319 | docs/guides/cli.md:319 | dev/skill | ready for proof | heuristic | cautilus evaluate skill-experiment emits cautilus.skill_clone_experiment_report.v1 with variant_ran, baseline-versus-variant delta, rubric match, source coverage delta, isolation notes, and a promotion recommendation.

Scenario Samples

No Cautilus eval claims currently require scenario decomposition, so there are no scenario samples to list.

Action Buckets

Bucket | Actor | Count | Evidence | Review | Meaning
already-satisfied | none | 149 | satisfied: 149 | agent-reviewed: 148, human-reviewed: 1 | Proof is already attached and valid under packet semantics.
agent-add-deterministic-proof | agent | 20 | unknown: 20 | agent-reviewed: 1, heuristic: 18, human-reviewed: 1 | Add or connect unit, lint, build, schema, spec, or CI proof.
agent-plan-cautilus-eval | agent | 110 | unknown: 110 | agent-reviewed: 8, heuristic: 102 | Draft or select Cautilus eval scenarios for proof-ready eval claims.
human-align-surfaces | human | 37 | unknown: 37 | agent-reviewed: 17, heuristic: 20 | Reconcile conflicting docs, code, adapters, or ownership boundaries before proof would be honest.
human-confirm-or-decompose | human | 32 | unknown: 32 | heuristic: 32 | Confirm, decompose, or accept a human-auditable claim before treating it as proven.
split-or-defer | human | 36 | unknown: 36 | agent-reviewed: 30, heuristic: 6 | Split broad, historical, provider-caveated, policy-like, or otherwise blocked claims before verification.
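
The action buckets should reconcile with the scoreboard: each bucket's review breakdown should sum to its count, and the per-label totals across buckets should equal the Review row. A minimal sketch of that cross-check follows, with the bucket rows transcribed into an illustrative structure; the field names `count` and `review` and both helper functions are hypothetical.

```python
# Bucket rows transcribed from the table; field names are illustrative.
buckets = {
    "already-satisfied": {
        "count": 149,
        "review": {"agent-reviewed": 148, "human-reviewed": 1},
    },
    "agent-add-deterministic-proof": {
        "count": 20,
        "review": {"agent-reviewed": 1, "heuristic": 18, "human-reviewed": 1},
    },
    "agent-plan-cautilus-eval": {
        "count": 110,
        "review": {"agent-reviewed": 8, "heuristic": 102},
    },
    "human-align-surfaces": {
        "count": 37,
        "review": {"agent-reviewed": 17, "heuristic": 20},
    },
    "human-confirm-or-decompose": {"count": 32, "review": {"heuristic": 32}},
    "split-or-defer": {
        "count": 36,
        "review": {"agent-reviewed": 30, "heuristic": 6},
    },
}


def inconsistent_buckets(rows: dict) -> list[str]:
    """Names of buckets whose review breakdown does not sum to the count."""
    return [
        name
        for name, row in rows.items()
        if sum(row["review"].values()) != row["count"]
    ]


def review_totals(rows: dict) -> dict[str, int]:
    """Aggregate review-label counts across all buckets."""
    totals: dict[str, int] = {}
    for row in rows.values():
        for label, n in row["review"].items():
            totals[label] = totals.get(label, 0) + n
    return totals
```

With the counts above, `inconsistent_buckets(buckets)` is empty and `review_totals(buckets)` reproduces the scoreboard's Review row (204 agent-reviewed, 178 heuristic, 2 human-reviewed).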

Cross-Cutting Signals

Signal | Actor | Count | Meaning
heuristic-review-needed | agent | 178 | Review heuristic labels before spending proof or eval budget.

How This Avoids A Split SOT

  • The claim packet is the audit source.
  • The status snapshot is regenerated from that packet before this projection is rendered.
  • This page is checked by npm run claims:evidence-state:check and npm run verify.
  • Manual proof maps still curate product-level evidence routes; they should link here rather than copying raw claim backlog counts.
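
This page does not describe how `npm run claims:evidence-state:check` is implemented. One plausible shape for such a check, shown here only as a sketch under that assumption, is to regenerate the projection in memory and diff it against the committed page. The function name `projection_drift` is hypothetical.

```python
import difflib


def projection_drift(committed: str, regenerated: str) -> list[str]:
    """Return unified-diff lines between the committed page and a fresh render.

    An empty list means the committed projection is current.
    """
    return list(
        difflib.unified_diff(
            committed.splitlines(),
            regenerated.splitlines(),
            fromfile="committed",
            tofile="regenerated",
            lineterm="",
        )
    )
```

A check built this way would fail whenever the returned list is non-empty, which is what makes a hand edit to this page detectable.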