// preview room / claude code lab
Your Agent Graded Its Own Exam: Why CI Passes but Production Breaks
packagedCost ledger
trends$0.109
idea$0.041
hook$0.131
script$0.049
storyboard$0.160
factcheck$0.186
qa$0.061
package$0.019
total$0.757
QA Council
8.0/10specificity
8
utility
8
technical validity
9
visual clarity
7
brand fit
9
anti slop
8
platform safety
9
- Hook 'graded its own exam' is original and sticky — earns attention without hype
- Core mechanism is precisely named: same context window → mocks reflect the code, not the contract. This is the rare insight that feels earned, not summarized from a blog post
- TypeScript jest.mock block is syntactically valid and the focusLines (3,4,10) correctly target the fabricated return value — technical credibility holds
- s4 before/after is the visual anchor and it works: 'fetchUser(validId)' vs 'null, expired token, 404, rate limit' is a concrete contrast, not a vague claim
- CTA is actionable but slightly underspecified — 'write tests before the agent touches implementation' is the right directive but doesn't say what that looks like in Cursor/Claude Code (failing test file first, then agent fills impl). One more sentence would close the loop for the target audience
- s6 warning scene is strong on copy but broll 'server rack red alert blinking' is generic stock-footage territory — a real Sentry trace or error log screenshot would be harder and better
- No banned content, no hype, no fake benchmarks — clean
Storyboard / 30.180000000000003s
1hook_text3.66s2terminal5.15s3code_block4.96s4before_after4.59s5terminal3.85s6warning4.17s7cta3.8s
Publish
Not posted yet. “Publish” uploads to every configured platform (YouTube live + IG/TikTok when a host + tokens are set), and always writes a paste-ready bundle. Runs in the background — this card updates as platforms complete.
Ready to post
No per-platform captions yet. Click “Generate captions” to research hashtags + write a tailored caption for each platform.
Script
hookYour agent graded its own exam. CI passed.
"Cursor writes the function. Then it writes the tests. Same context, same assumptions."
"The mocks reflect the code it just wrote, not the actual API contract."
"Edge cases it never modeled don't exist in the test file."
"The assertion passes. You commit. CI is green."
"Production is the first thing that runs against reality."
Run log
finished / packaged
14:47:33Trend scan$0.109
14:47:53Idea selectedarchitecture warning / Your AI agent wrote the tests and the code in the same context, that's why CI is green and prod is broken
14:49:03Hook chosenYour agent graded its own exam. CI passed.
14:49:25Script drafted5 narration beats
14:51:20Storyboard built7 scenes
14:53:27Fact check$0.186
14:54:03QA scored8.0/10 pass
14:58:50PackagingYour Agent Graded Its Own Exam: Why CI Passes but Production Breaks
15:19:54Render assetMP4 ready
14:58:50packaged: "Your Agent Graded Its Own Exam: Why CI Passes but Production Breaks" — total $0.757
14:58:35derivatives: thumbnail + 2 aspect ratios
14:58:21rendered → /Users/lexaplus/development/Socheli/data/renders/claude_20260605144641.mp4
14:57:04beat-sync: 70 beats; sfx: 7 cues
14:57:04b-roll: 7/7 scenes
14:56:58music generated
14:54:33voice: 28.2s via elevenlabs+whisper, 75 words synced
14:54:03QA passed: 8/10