HTML Report Surface

Cautilus ships static HTML views so a human can answer one practical question in a browser: what happened in this evaluation, and what should I trust or revisit next?

The public contract described on this page covers only the renderer surface that ships today, not a design backlog.

Today that surface includes:

report packets
review packets
review summary packets
scenario proposal packets
scenario conversation review packets
evidence bundles
self-dogfood latest bundles
run index pages

These pages are generated from JSON packets or checked-in artifact bundles. They are read-only representations of the source packet, not a second editable source of truth. The visual language does not have to match specdown exactly. The contract is simpler: a reviewer should be able to open the generated page, understand what artifact it represents, and find the key verdict or summary.
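One way to keep the "single source of truth" property honest is to confirm that rendering never touches the packet it reads. The sketch below is a minimal illustration, not part of Cautilus: `assert_unchanged` is a hypothetical helper name, and the demonstration uses stand-in commands rather than a real renderer.

```shell
# Hypothetical helper: run a command, then fail if the named file changed.
assert_unchanged() {
  watched=$1
  shift
  before=$(cksum < "$watched")
  "$@" || return 1
  after=$(cksum < "$watched")
  if [ "$before" != "$after" ]; then
    printf 'source modified: %s\n' "$watched" >&2
    return 1
  fi
}

# Demonstration with stand-in commands instead of a real renderer.
source_packet=$(mktemp)
printf '{"example":"stand-in packet"}\n' > "$source_packet"

# A read-only consumer leaves the packet alone.
assert_unchanged "$source_packet" cat "$source_packet" >/dev/null; readonly_status=$?

# A consumer that appends to its input is caught.
assert_unchanged "$source_packet" sh -c 'echo extra >> "$1"' sh "$source_packet" 2>/dev/null; mutating_status=$?
rm -f "$source_packet"
```

In a real check, the wrapped command would be one of the render commands shown in the proofs on this page, for example `assert_unchanged "$tmpdir/report.json" ./bin/cautilus report render-html --input "$tmpdir/report.json" --output "$tmpdir/report.html"`.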

What A Reviewer Can Learn

The current HTML surface should let a reviewer answer questions like:

What was the intent behind this run?
Did the candidate improve, regress, or still need human judgment?
Which layer is currently carrying the decision, and which layers are supporting context?
Which proposal, finding, or evidence signal should I inspect first?
Is this a current run artifact, a published self-dogfood snapshot, or a summary across review variants?

Packet Renderer Proof

Render a report packet, a review packet, a scenario conversation review packet, a scenario proposal packet, and an evidence bundle into standalone HTML pages.

tmpdir=$(mktemp -d)
./bin/cautilus report build --input ./fixtures/reports/report-input.json --output "$tmpdir/report.json" >/dev/null
./bin/cautilus review prepare-input --repo-root . --report-file "$tmpdir/report.json" --output "$tmpdir/review.json" >/dev/null
./bin/cautilus scenario review-conversations --input ./fixtures/scenario-conversation-review/input.json --output "$tmpdir/conversation-review.json" >/dev/null
./bin/cautilus scenario propose --input ./fixtures/scenario-proposals/standalone-input.json --output "$tmpdir/proposals.json" >/dev/null
./bin/cautilus report render-html --input "$tmpdir/report.json" --output "$tmpdir/report.html" >/dev/null
./bin/cautilus review render-html --input "$tmpdir/review.json" --output "$tmpdir/review.html" >/dev/null
./bin/cautilus scenario render-conversation-review-html --input "$tmpdir/conversation-review.json" --output "$tmpdir/conversation-review.html" >/dev/null
./bin/cautilus scenario render-proposals-html --input "$tmpdir/proposals.json" --output "$tmpdir/proposals.html" >/dev/null
./bin/cautilus evidence render-html --input ./fixtures/evidence/example-bundle.json --output "$tmpdir/evidence.html" >/dev/null
grep -q '<title>Cautilus Report — defer</title>' "$tmpdir/report.html"
grep -q '<title>Cautilus Review Packet — defer</title>' "$tmpdir/review.html"
grep -q '<title>Cautilus Scenario Conversation Review — 2</title>' "$tmpdir/conversation-review.html"
grep -q '<title>Cautilus Scenario Proposals — 1</title>' "$tmpdir/proposals.html"
grep -q '<title>Cautilus Evidence Bundle — 5 signals</title>' "$tmpdir/evidence.html"
grep -q 'The operator should understand why a workflow step failed and how to recover.' "$tmpdir/report.html"
grep -q 'Decision Signals' "$tmpdir/report.html"
grep -q 'Does the current deterministic self-consumer gate stay honest about what it actually proves for the product repo?' "$tmpdir/review.html"
grep -q 'Review Path' "$tmpdir/review.html"
grep -q 'review_existing_scenario_refresh' "$tmpdir/conversation-review.html"
grep -q 'Selection Signals' "$tmpdir/conversation-review.html"
grep -q 'Refresh review-after-retro scenario from recent activity' "$tmpdir/proposals.html"
grep -q 'Selection Signals' "$tmpdir/proposals.html"
grep -q 'Regressed evidence: operator-recovery' "$tmpdir/evidence.html"
grep -q 'Signals By Source' "$tmpdir/evidence.html"

run:shell
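The proof above relies on bare `grep -q`, which exits non-zero without saying which expectation failed. When debugging a failing proof locally, a small wrapper can make the failure readable. This is a sketch only: `assert_contains` is a hypothetical helper that does not ship with Cautilus, and the page it checks here is a stand-in file, not real renderer output.

```shell
# Hypothetical helper: like `grep -q`, but names the missing text and file on failure.
# Uses -F so the expected text is matched literally rather than as a regular expression.
assert_contains() {
  needle=$1
  file=$2
  if grep -qF -- "$needle" "$file"; then
    return 0
  fi
  printf 'missing in %s: %s\n' "$file" "$needle" >&2
  return 1
}

# Self-contained demonstration against a stand-in page.
demo_page=$(mktemp)
printf '<title>Stand-in Page</title>\n<h2>Decision Signals</h2>\n' > "$demo_page"

assert_contains 'Decision Signals' "$demo_page"; found_status=$?
assert_contains 'Not On The Page' "$demo_page" 2>/dev/null; missing_status=$?
rm -f "$demo_page"
```

Literal matching is a deliberate choice in this sketch: several expected strings in the proof contain `.` or `?`, which a plain `grep` pattern treats as regular-expression syntax.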

Bundle And Index Proof

Regenerate the published self-dogfood page and a run index page from an existing artifact directory.

tmpdir=$(mktemp -d)
./bin/cautilus self-dogfood render-html --latest-dir ./artifacts/self-dogfood/latest --output "$tmpdir/self-dogfood.html" >/dev/null
./bin/cautilus artifacts render-index-html --run-dir ./artifacts/self-dogfood/latest --output "$tmpdir/index.html" >/dev/null
grep -q '<title>Cautilus Self-Dogfood — pass</title>' "$tmpdir/self-dogfood.html"
grep -q '<title>Cautilus Run Index — latest</title>' "$tmpdir/index.html"
grep -q 'Decision Summary' "$tmpdir/self-dogfood.html"
grep -q 'What happened' "$tmpdir/self-dogfood.html"
grep -q 'Cautilus should record and surface its own self-dogfood result honestly before operators trust broader consumer runs.' "$tmpdir/self-dogfood.html"
grep -q 'Artifacts are ordered by the intended review flow' "$tmpdir/index.html"

run:shell
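Each proof on this page creates a scratch directory with `mktemp -d` and leaves it behind. If that matters in your environment, one common pattern is an `EXIT` trap inside a subshell. The sketch below assumes nothing about Cautilus itself: a placeholder file stands in for the rendered page, and the variable names are illustrative.

```shell
# Record the scratch path outside the subshell so cleanup can be verified afterwards.
workdir_record=$(mktemp)

(
  tmpdir=$(mktemp -d)
  # Remove the scratch directory when the subshell exits, whether the checks pass or fail.
  trap 'rm -rf "$tmpdir"' EXIT
  printf '%s\n' "$tmpdir" > "$workdir_record"

  # Render and grep steps would go here; a placeholder file stands in for them.
  printf '<title>placeholder</title>\n' > "$tmpdir/page.html"
  grep -q '<title>placeholder</title>' "$tmpdir/page.html"
)
status=$?

cleaned_dir=$(cat "$workdir_record")
rm -f "$workdir_record"
```

The subshell keeps the trap scoped to one proof, so the exit status of the last check is still what the caller sees.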

Review Summary Proof

The review summary renderer expects a cautilus.review_summary.v1 packet. This page keeps a minimal inline sample so the public proof stays current without depending on a historical artifact snapshot.

Render a minimal review summary packet into standalone HTML.

tmpdir=$(mktemp -d)
printf '%s\n' '{"schemaVersion":"cautilus.review_summary.v1","generatedAt":"2026-04-16T00:00:00Z","status":"passed","reviewVerdict":"concern","reasonCodes":["RC_NO_BENCHMARK_EVIDENCE"],"findingsCount":1,"telemetry":{"variantCount":2,"passedVariantCount":2,"failedVariantCount":0,"durationMs":12000},"variants":[{"id":"codex-review-a","status":"passed","durationMs":6000,"output":{"verdict":"concern","summary":"Evidence is promising but shallow.","findings":[{"severity":"concern","message":"needs more held_out evidence","path":"docs/specs/review.spec.md"}]}},{"id":"codex-review-b","status":"passed","durationMs":6000,"output":{"verdict":"pass","summary":"Second reviewer is satisfied.","findings":[]}}],"humanReviewFindings":[{"severity":"concern","message":"needs more held_out evidence","path":"docs/specs/review.spec.md"}]}' > "$tmpdir/review-summary.json"
./bin/cautilus review render-variants-summary-html --input "$tmpdir/review-summary.json" --output "$tmpdir/review-summary.html" >/dev/null
grep -q '<title>Cautilus Review Summary — concern</title>' "$tmpdir/review-summary.html"
grep -q 'Execution aligned, but verdicts diverged across variants.' "$tmpdir/review-summary.html"

run:shell
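The inline sample above is hard to read on one line. The sketch below writes the same packet with a quoted heredoc, which changes only insignificant JSON whitespace, and then counts the per-variant verdicts. The counting step is one plausible reading of why the rendered page reports that execution aligned while verdicts diverged; it is an assumption about the renderer's logic, not a description of it. It also relies on `grep -o`, which GNU and BSD grep provide but POSIX does not require.

```shell
packet=$(mktemp)

# Same fields and values as the inline sample, pretty-printed for readability.
cat > "$packet" <<'EOF'
{
  "schemaVersion": "cautilus.review_summary.v1",
  "generatedAt": "2026-04-16T00:00:00Z",
  "status": "passed",
  "reviewVerdict": "concern",
  "reasonCodes": ["RC_NO_BENCHMARK_EVIDENCE"],
  "findingsCount": 1,
  "telemetry": {
    "variantCount": 2,
    "passedVariantCount": 2,
    "failedVariantCount": 0,
    "durationMs": 12000
  },
  "variants": [
    {
      "id": "codex-review-a",
      "status": "passed",
      "durationMs": 6000,
      "output": {
        "verdict": "concern",
        "summary": "Evidence is promising but shallow.",
        "findings": [
          {
            "severity": "concern",
            "message": "needs more held_out evidence",
            "path": "docs/specs/review.spec.md"
          }
        ]
      }
    },
    {
      "id": "codex-review-b",
      "status": "passed",
      "durationMs": 6000,
      "output": {
        "verdict": "pass",
        "summary": "Second reviewer is satisfied.",
        "findings": []
      }
    }
  ],
  "humanReviewFindings": [
    {
      "severity": "concern",
      "message": "needs more held_out evidence",
      "path": "docs/specs/review.spec.md"
    }
  ]
}
EOF

# Distinct per-variant verdicts. "reviewVerdict" is not matched because the
# pattern requires a double quote directly before "verdict".
distinct_verdicts=$(grep -o '"verdict": *"[a-z]*"' "$packet" | sort -u | wc -l | tr -d ' ')

# Variants whose execution passed. The six-space indent limits the match to
# variant entries in this particular layout and skips the top-level status.
passed_variants=$(grep -c '^      "status": "passed"' "$packet")

rm -f "$packet"
```

With this sample, both variants executed successfully but returned two different verdicts (`concern` and `pass`), which is consistent with the summary line the proof greps for.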