StoryProof

Your coding agent builds.
Ours verifies.

Derives what should be true from your spec. Tests what actually is. Shows you the gap.

$ curl -fsSL https://storyproof.pages.dev/install.sh | sh

Works with Claude Code, Codex, Cursor, Windsurf — or just you ;)

“The builder and the inspector should never share the same blind spots.”

— Eran Kahana, Stanford Law School CodeX
01

What slips through

Spec drift

Your agent built something close to what was asked. But “close” ships silent failures. The subscription status updates but the end date doesn’t. Not a crash. Not a test failure. Just... not right.

87%

of defects get zero valid tests from AI agents. The edge cases your agent never thought of don’t have tests because they’re not in the code.

ASE 2024

100% / 4%

Test coverage vs faults caught. AI-generated suites execute every line but catch 4% of possible bugs. Green CI, zero confidence.

Wang et al.

“We are replacing validation with transcription.”

— David Adamo Jr.
02

Why your coding agent shouldn’t verify its own work

Coding agents today: General-purpose. Writes code, docs, tests, everything.
StoryProof: Purpose-built verification agent with a dedicated testing methodology.

Coding agents today: Derives tests from the code it just wrote (“replacing validation with transcription”).
StoryProof: Derives acceptance criteria from the spec, not the code. Different starting point, different blind spots.

Coding agents today: The same agent writes the code and the tests (“builder and inspector share the same blind spots”).
StoryProof: An independent agent that never saw the implementation. Catches what the builder assumed was obvious.

StoryProof: Scores three risk dimensions per behavior and picks the leanest proof at the right layer: unit tests for edge cases, integration tests for boundaries, E2E tests for UI.

Coding agents today: Generates tests that pass: 100% coverage, 4% of faults caught.
StoryProof: Reads assertions, not test names. If an assertion doesn’t prove the behavior, the behavior is flagged as a gap.

Coding agents today: 87% of defects get zero valid tests. Edge cases the agent never considered don’t get tested.
StoryProof: Derives 16 behaviors from a one-line spec, including the 13 nobody asked for: edge cases, error paths, boundary conditions.

Coding agents are extraordinary at building. Verification needs a different kind of thinking.

“The single biggest differentiator between agentic engineering and vibe coding is testing.”

— Addy Osmani, Google Chrome
Evidence

Evidence, not opinions

Coding agents say “looks correct.” StoryProof produces proof you can inspect, replay, and ship to CI.

Right layer for the risk

Unit tests for edge cases and input validation. Integration tests against a real database for boundary crossings. Playwright browser tests for JavaScript-dependent UI.

No mocks where the risk is real. A test that mocks the database doesn’t prove the database works.
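Duplicate webhook delivery (idempotency) is a good example of a behavior a mock can't settle: a mocked repository happily accepts the same event twice, while a real database enforces the constraint. A minimal sketch, assuming a hypothetical `processed_events` table keyed by Stripe event ID and using in-memory SQLite as a stand-in for your production database. It illustrates the idea and is not StoryProof's generated output.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
        CREATE TABLE renewals (subscription_id TEXT, event_id TEXT);
        """
    )
    return conn


def handle_renewal(conn: sqlite3.Connection, event_id: str, subscription_id: str) -> bool:
    """Record a renewal once per Stripe event ID (hypothetical handler).

    Returns True if the event was processed, False if it was a duplicate.
    The PRIMARY KEY on processed_events enforces idempotency, which is
    exactly what a mocked database would never check.
    """
    try:
        with conn:  # one transaction: both inserts commit, or neither does
            conn.execute("INSERT INTO processed_events VALUES (?)", (event_id,))
            conn.execute(
                "INSERT INTO renewals VALUES (?, ?)", (subscription_id, event_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # event_id already seen: Stripe retried the delivery


def test_duplicate_webhook_is_processed_once() -> None:
    conn = make_db()
    assert handle_renewal(conn, "evt_123", "sub_abc") is True
    assert handle_renewal(conn, "evt_123", "sub_abc") is False  # retry
    (count,) = conn.execute("SELECT COUNT(*) FROM renewals").fetchone()
    assert count == 1


test_duplicate_webhook_is_processed_once()
```

Swapping the real connection for a mock would make the second call "succeed" too, so the duplicate renewal would slip through with a green test.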

Runtime proof, not static analysis

Every verdict is backed by a test that ran. Failed = confirmed bug. Passed = proven behavior. Not “the code looks right to me.”

Static inspection isn’t evidence. StoryProof runs the test and records the output.
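A minimal sketch of what "runs the test and records the output" can look like: the verdict comes only from the exit code of a process that actually ran. The record's field names are illustrative assumptions, not StoryProof's actual evidence schema, and Python one-liners stand in for real test files.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone


def run_and_record(behavior_id: str, cmd: list[str]) -> dict:
    """Run a test command and keep what happened as an evidence record.

    Exit code 0 means the behavior passed; anything else means it failed.
    Nothing here inspects the source code, only the result of running it.
    """
    ran_at = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "behavior": behavior_id,
        "command": cmd,
        "ran_at": ran_at,
        "exit_code": result.returncode,
        "verdict": "passed" if result.returncode == 0 else "failed",
        "output": result.stdout + result.stderr,
    }


# Stand-in "tests" (hypothetical behavior IDs borrowed from the style of the report).
passing = [sys.executable, "-c", "assert 1 + 1 == 2"]
failing = [sys.executable, "-c", "assert False, 'end_date not extended'"]

for behavior, cmd in [("AC001", passing), ("AC004", failing)]:
    record = run_and_record(behavior, cmd)
    print(json.dumps({k: record[k] for k in ("behavior", "verdict", "exit_code")}))
```

Keeping the captured output alongside the verdict is what makes the result replayable and inspectable later, rather than a one-off "looks correct."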

Evidence that stays in your CI

Tests land in your repo and run on every future PR. Verification carries over from your local machine to your CI pipeline automatically.

Your agent’s confidence dies with the session. StoryProof’s evidence survives in your CI, so a fixed bug can’t come back unnoticed.

03

One spec. 14 derived behaviors. The right test for each.

Step 1

Extract behaviors

One sentence: “Add Stripe webhook for subscription renewals.” StoryProof derives 14 acceptance criteria — happy path, signature verification, idempotency, edge cases. Your agent thought of 3.

Step 2

Score the risk

Each behavior gets three risk scores: edge-case density, boundary crossings, user-visible rendering. These scores determine which kind of test can settle the question.

Step 3

Pick the leanest proof

Unit tests for input validation. Integration tests with the real Stripe SDK for webhook signatures. A browser test for the checkout-to-subscription flow. No over-testing, no under-testing.
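Steps 2 and 3 together amount to a mapping from risk scores to a test layer. A minimal sketch of such a mapping, assuming hypothetical 0 to 3 scores and a single cutoff; StoryProof's actual scale and rules aren't described here, so treat the numbers as placeholders. The rule checks the most expensive risk first and otherwise falls back to the cheapest layer that can still settle the question.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    UNIT = "unit"
    INTEGRATION = "integration"
    E2E = "e2e"


@dataclass(frozen=True)
class RiskScores:
    """Hypothetical 0-3 scores for one derived behavior."""
    edge_case_density: int
    boundary_crossings: int
    user_visible_rendering: int


# Hypothetical cutoff: at or above it, the risk lives at that layer.
THRESHOLD = 2


def pick_layer(scores: RiskScores) -> Layer:
    """Return the leanest layer that can still prove the behavior.

    Rendered UI can only be proven in a browser; a boundary crossing
    (database, third-party API) needs an integration test; everything
    else, including dense edge cases, is cheapest to cover with unit
    tests. In this sketch edge-case density never moves the layer.
    """
    if scores.user_visible_rendering >= THRESHOLD:
        return Layer.E2E
    if scores.boundary_crossings >= THRESHOLD:
        return Layer.INTEGRATION
    return Layer.UNIT


behaviors = {
    "rejects malformed payload": RiskScores(3, 0, 0),
    "verifies Stripe signature": RiskScores(1, 3, 0),
    "shows active subscription after checkout": RiskScores(1, 2, 3),
}
for name, scores in behaviors.items():
    print(f"{name}: {pick_layer(scores).value}")
```

With these placeholder scores, the three example behaviors land on the same three layers Step 3 names: unit, integration, and browser.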

04

Real verification — full cycle

“Add Stripe webhook handler for subscription renewals”
Your agent builds: WebhookController, StripeService, SubscriptionRepository. 2 unit tests, both passing. CI green.
14 acceptance criteria derived:
  • ✓ Webhook receives valid event (covered)
  • ✗ Invalid signature → 401 (your agent never tested this)
  • ✗ Subscription end_date actually extends (code updates status, not date: spec drift)
  • ? Customer downgrades mid-renewal (edge case)
  • ? Duplicate webhook / idempotency (edge case)
  • … 9 more behaviors
10 tests written:
  • 8 unit tests: edge cases, validation, error paths
  • 1 integration test: real webhook POST with Stripe signature verification
  • 1 e2e test: checkout → payment → subscription active in UI
CAUGHT: end_date never updated. Status changes, date doesn’t. Renewals would expire silently.
10 new tests stay in your repo and run in CI on every future PR. If the webhook bug ever comes back, CI catches it.
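The gap between "2 unit tests, both passing" and the caught defect comes down to what the assertions check. A minimal sketch, assuming a hypothetical `Subscription` model and a fixed 30-day billing period (a real handler would take the period from Stripe): a check that only asserts on status passes against the buggy handler, while asserting on end_date exposes the drift.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical fixed billing period; real code would take it from the Stripe price.
BILLING_PERIOD = timedelta(days=30)


@dataclass
class Subscription:
    """Hypothetical stand-in for the row SubscriptionRepository stores."""
    status: str
    end_date: date


def renew_buggy(sub: Subscription) -> None:
    """What shipped: the status changes, the end date doesn't."""
    sub.status = "active"


def renew_fixed(sub: Subscription) -> None:
    """What the spec asked for: a renewal also extends end_date."""
    sub.status = "active"
    sub.end_date += BILLING_PERIOD


def status_only_check(renew) -> bool:
    """Named like a renewal test, but the assertion only looks at status."""
    sub = Subscription(status="past_due", end_date=date(2024, 1, 31))
    renew(sub)
    return sub.status == "active"


def end_date_check(renew) -> bool:
    """Asserts the behavior the spec describes: end_date moves forward."""
    sub = Subscription(status="past_due", end_date=date(2024, 1, 31))
    renew(sub)
    return sub.status == "active" and sub.end_date == date(2024, 1, 31) + BILLING_PERIOD


print(status_only_check(renew_buggy))  # True: green CI, bug still there
print(end_date_check(renew_buggy))     # False: the spec drift is caught
print(end_date_check(renew_fixed))     # True
```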
05

This is what you see.

Real terminal output from storyproof check on a Stripe webhook PR.

StoryProof Check — DO NOT SHIP
## Likely defects — code analysis suggests these will break
[AC004] Webhook handler updates subscription.status but not subscription.end_date
Expected: end_date extends by billing period on successful renewal
Prove will: Run integration test against Stripe webhook endpoint
[AC007] No signature verification — accepts any POST to /webhooks/stripe
Expected: Returns 401 for invalid Stripe-Signature header
## Needs proof — no evidence covers these behaviors
[AC009] Duplicate webhook handling — no idempotency key check
[AC011] Customer downgrades mid-renewal — old plan webhook fires
## Already covered by existing evidence
8 behaviors verified
Next: storyproof prove — settle all 6 unproven behaviors
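AC007 in the output above (accepting any POST without checking the Stripe-Signature header) is the kind of behavior an integration test settles. In production you would call the Stripe SDK's own helper (`stripe.Webhook.construct_event` in Python); the sketch below reimplements Stripe's documented v1 scheme (hex HMAC-SHA256 over `timestamp.payload`) with the standard library so it runs standalone. `handle_webhook`, the example secret, and the 401 mapping taken from the expected behavior above are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

# Stripe's SDKs reject signatures older than 300 seconds by default.
TOLERANCE_SECONDS = 300


def compute_signature(secret: str, timestamp: int, payload: bytes) -> str:
    """Stripe v1 scheme: hex HMAC-SHA256 of "<timestamp>.<raw body>"."""
    signed_payload = f"{timestamp}.".encode() + payload
    return hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()


def verify_stripe_signature(
    payload: bytes, header: str, secret: str, now: Optional[int] = None
) -> bool:
    """True only if some v1 signature in the header matches and is fresh."""
    timestamp, candidates = None, []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False
    ts = int(timestamp)
    now = int(time.time()) if now is None else now
    if now - ts > TOLERANCE_SECONDS:
        return False  # too old: treat as a possible replay
    expected = compute_signature(secret, ts, payload).encode()
    # Constant-time comparison avoids leaking how many characters matched.
    return any(hmac.compare_digest(expected, c.encode()) for c in candidates)


def handle_webhook(payload: bytes, header: str, secret: str) -> int:
    """Hypothetical handler: the HTTP status returned for POST /webhooks/stripe."""
    if not verify_stripe_signature(payload, header, secret):
        return 401  # status the spec expects for an invalid Stripe-Signature
    return 200


secret = "whsec_example"  # hypothetical endpoint secret
body = b'{"type": "invoice.paid"}'
t = int(time.time())
valid_header = f"t={t},v1={compute_signature(secret, t, body)}"
print(handle_webhook(body, valid_header, secret))             # 200
print(handle_webhook(b'{"forged": 1}', valid_header, secret))  # 401
```

The signature must be computed over the raw request body; parsing and re-serializing the JSON first will change the bytes and make valid requests fail.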
$ storyproof check --spec "your change"
Finds the gaps.
$ storyproof prove
Fills them with evidence.
check → prove → fix → prove → ship