// preview room / claude code lab

Your Agent Graded Its Own Exam: Why CI Passes but Production Breaks

packaged

0:00/0:00

DLDownload thumbnail

Cost ledger

trends$0.109

idea$0.041

hook$0.131

script$0.049

storyboard$0.160

factcheck$0.186

qa$0.061

package$0.019

total$0.757

QA Council

8.0/10

specificity

utility

technical validity

visual clarity

brand fit

anti slop

platform safety

Hook 'graded its own exam' is original and sticky — earns attention without hype
Core mechanism is precisely named: same context window → mocks reflect the code, not the contract. This is the rare insight that feels earned, not summarized from a blog post
TypeScript jest.mock block is syntactically valid and the focusLines (3,4,10) correctly target the fabricated return value — technical credibility holds
s4 before/after is the visual anchor and it works: 'fetchUser(validId)' vs 'null, expired token, 404, rate limit' is a concrete contrast, not a vague claim
CTA is actionable but slightly underspecified — 'write tests before the agent touches implementation' is the right directive but doesn't say what that looks like in Cursor/Claude Code (failing test file first, then agent fills impl). One more sentence would close the loop for the target audience
s6 warning scene is strong on copy but broll 'server rack red alert blinking' is generic stock-footage territory — a real Sentry trace or error log screenshot would be harder and better
No banned content, no hype, no fake benchmarks — clean

Storyboard / 30.180000000000003s

1hook_text3.66s2terminal5.15s3code_block4.96s4before_after4.59s5terminal3.85s6warning4.17s7cta3.8s

Publish

Disclose AI-generated content (AIGC)on

TikToksandbox · private onlyConfigure platforms ↗

Not posted yet. “Publish” uploads to every configured platform (YouTube live + IG/TikTok when a host + tokens are set), and always writes a paste-ready bundle. Runs in the background — this card updates as platforms complete.

Ready to post

No per-platform captions yet. Click “Generate captions” to research hashtags + write a tailored caption for each platform.

Script

hookYour agent graded its own exam. CI passed.

"Cursor writes the function. Then it writes the tests. Same context, same assumptions."

"The mocks reflect the code it just wrote, not the actual API contract."

"Edge cases it never modeled don't exist in the test file."

"The assertion passes. You commit. CI is green."

"Production is the first thing that runs against reality."

Run log

finished / packaged

packaged

14:47:33Trend scan$0.109

14:47:53Idea selectedarchitecture warning / Your AI agent wrote the tests and the code in the same context, that's why CI is green and prod is broken

14:49:03Hook chosenYour agent graded its own exam. CI passed.

14:49:25Script drafted5 narration beats

14:51:20Storyboard built7 scenes

14:53:27Fact check$0.186

14:54:03QA scored8.0/10 pass

14:58:50PackagingYour Agent Graded Its Own Exam: Why CI Passes but Production Breaks

15:19:54Render assetMP4 ready

14:58:50packaged: "Your Agent Graded Its Own Exam: Why CI Passes but Production Breaks" — total $0.757

14:58:35derivatives: thumbnail + 2 aspect ratios

14:58:21rendered → /Users/lexaplus/development/Socheli/data/renders/claude_20260605144641.mp4

14:57:04beat-sync: 70 beats; sfx: 7 cues

14:57:04b-roll: 7/7 scenes

14:56:58music generated

14:54:33voice: 28.2s via elevenlabs+whisper, 75 words synced

14:54:03QA passed: 8/10

BKBack to queue