HTML Report Surface

Cautilus ships static HTML views so a human can answer one practical question in a browser: what happened in this evaluation, and what should I trust or revisit next?

The public contract on this page is the currently shipped renderer surface, not a design backlog.

Today that surface includes:

  • report packets
  • review packets
  • review summary packets
  • scenario proposal packets
  • scenario conversation review packets
  • evidence bundles
  • self-dogfood latest bundles
  • run index pages

These pages are generated from JSON packets or checked-in artifact bundles. They are read-only representations of the source packet, not a second editable source of truth. The visual language does not have to match specdown exactly. The contract is simpler: a reviewer should be able to open the generated page, understand what artifact it represents, and find the key verdict or summary.

What A Reviewer Can Learn

The current HTML surface should let a reviewer answer questions like:

  • What was the intent behind this run?
  • Did the candidate improve, regress, or still need human judgment?
  • Which layer is currently carrying the decision, and which layers are supporting context?
  • Which proposal, finding, or evidence signal should I inspect first?
  • Is this a current run artifact, a published self-dogfood snapshot, or a summary across review variants?

Packet Renderer Proof

Render report, review, scenario review, proposals, and evidence packets into standalone HTML pages.
tmpdir=$(mktemp -d) ./bin/cautilus report build --input ./fixtures/reports/report-input.json --output "$tmpdir/report.json" >/dev/null ./bin/cautilus review prepare-input --repo-root . --report-file "$tmpdir/report.json" --output "$tmpdir/review.json" >/dev/null ./bin/cautilus scenario review-conversations --input ./fixtures/scenario-conversation-review/input.json --output "$tmpdir/conversation-review.json" >/dev/null ./bin/cautilus scenario propose --input ./fixtures/scenario-proposals/standalone-input.json --output "$tmpdir/proposals.json" >/dev/null ./bin/cautilus report render-html --input "$tmpdir/report.json" --output "$tmpdir/report.html" >/dev/null ./bin/cautilus review render-html --input "$tmpdir/review.json" --output "$tmpdir/review.html" >/dev/null ./bin/cautilus scenario render-conversation-review-html --input "$tmpdir/conversation-review.json" --output "$tmpdir/conversation-review.html" >/dev/null ./bin/cautilus scenario render-proposals-html --input "$tmpdir/proposals.json" --output "$tmpdir/proposals.html" >/dev/null ./bin/cautilus evidence render-html --input ./fixtures/evidence/example-bundle.json --output "$tmpdir/evidence.html" >/dev/null grep -q '<title>Cautilus Report — defer</title>' "$tmpdir/report.html" grep -q '<title>Cautilus Review Packet — defer</title>' "$tmpdir/review.html" grep -q '<title>Cautilus Scenario Conversation Review — 2</title>' "$tmpdir/conversation-review.html" grep -q '<title>Cautilus Scenario Proposals — 1</title>' "$tmpdir/proposals.html" grep -q '<title>Cautilus Evidence Bundle — 5 signals</title>' "$tmpdir/evidence.html" grep -q 'The operator should understand why a workflow step failed and how to recover.' "$tmpdir/report.html" grep -q 'Decision Signals' "$tmpdir/report.html" grep -q 'Does the current deterministic self-consumer gate stay honest about what it actually proves for the product repo?' "$tmpdir/review.html" grep -q 'Review Path' "$tmpdir/review.html" grep -q 'review_existing_scenario_refresh' "$tmpdir/conversation-review.html" grep -q 'Selection Signals' "$tmpdir/conversation-review.html" grep -q 'Refresh review-after-retro scenario from recent activity' "$tmpdir/proposals.html" grep -q 'Selection Signals' "$tmpdir/proposals.html" grep -q 'Regressed evidence: operator-recovery' "$tmpdir/evidence.html" grep -q 'Signals By Source' "$tmpdir/evidence.html"

Bundle And Index Proof

Regenerate the published self-dogfood page and a run index page from an existing artifact directory.
tmpdir=$(mktemp -d) ./bin/cautilus self-dogfood render-html --latest-dir ./artifacts/self-dogfood/latest --output "$tmpdir/self-dogfood.html" >/dev/null ./bin/cautilus artifacts render-index-html --run-dir ./artifacts/self-dogfood/latest --output "$tmpdir/index.html" >/dev/null grep -q '<title>Cautilus Self-Dogfood — pass</title>' "$tmpdir/self-dogfood.html" grep -q '<title>Cautilus Run Index — latest</title>' "$tmpdir/index.html" grep -q 'Decision Summary' "$tmpdir/self-dogfood.html" grep -q 'What happened' "$tmpdir/self-dogfood.html" grep -q 'Cautilus should record and surface its own self-dogfood result honestly before operators trust broader consumer runs.' "$tmpdir/self-dogfood.html" grep -q 'Artifacts are ordered by the intended review flow' "$tmpdir/index.html"

Review Summary Proof

The review summary renderer expects a cautilus.review_summary.v1 packet. This page keeps a minimal inline sample so the public proof stays current without depending on a historical artifact snapshot.

Render a minimal review summary packet into standalone HTML.
tmpdir=$(mktemp -d) printf '%s\n' '{"schemaVersion":"cautilus.review_summary.v1","generatedAt":"2026-04-16T00:00:00Z","status":"passed","reviewVerdict":"concern","reasonCodes":["RC_NO_BENCHMARK_EVIDENCE"],"findingsCount":1,"telemetry":{"variantCount":2,"passedVariantCount":2,"failedVariantCount":0,"durationMs":12000},"variants":[{"id":"codex-review-a","status":"passed","durationMs":6000,"output":{"verdict":"concern","summary":"Evidence is promising but shallow.","findings":[{"severity":"concern","message":"needs more held_out evidence","path":"docs/specs/review.spec.md"}]}},{"id":"codex-review-b","status":"passed","durationMs":6000,"output":{"verdict":"pass","summary":"Second reviewer is satisfied.","findings":[]}}],"humanReviewFindings":[{"severity":"concern","message":"needs more held_out evidence","path":"docs/specs/review.spec.md"}]}' > "$tmpdir/review-summary.json" ./bin/cautilus review render-variants-summary-html --input "$tmpdir/review-summary.json" --output "$tmpdir/review-summary.html" >/dev/null grep -q '<title>Cautilus Review Summary — concern</title>' "$tmpdir/review-summary.html" grep -q 'Execution aligned, but verdicts diverged across variants.' "$tmpdir/review-summary.html"