New post
// preview room / claude code lab

Your Agent Graded Its Own Exam: Why CI Passes but Production Breaks

packaged
0:00/0:00
DLDownload thumbnail
Cost ledger
trends$0.109
idea$0.041
hook$0.131
script$0.049
storyboard$0.160
factcheck$0.186
qa$0.061
package$0.019
total$0.757

QA Council

8.0/10
specificity
8
utility
8
technical validity
9
visual clarity
7
brand fit
9
anti slop
8
platform safety
9
  • Hook 'graded its own exam' is original and sticky — earns attention without hype
  • Core mechanism is precisely named: same context window → mocks reflect the code, not the contract. This is the rare insight that feels earned, not summarized from a blog post
  • TypeScript jest.mock block is syntactically valid and the focusLines (3,4,10) correctly target the fabricated return value — technical credibility holds
  • s4 before/after is the visual anchor and it works: 'fetchUser(validId)' vs 'null, expired token, 404, rate limit' is a concrete contrast, not a vague claim
  • CTA is actionable but slightly underspecified — 'write tests before the agent touches implementation' is the right directive but doesn't say what that looks like in Cursor/Claude Code (failing test file first, then agent fills impl). One more sentence would close the loop for the target audience
  • s6 warning scene is strong on copy but broll 'server rack red alert blinking' is generic stock-footage territory — a real Sentry trace or error log screenshot would be harder and better
  • No banned content, no hype, no fake benchmarks — clean

Storyboard / 30.180000000000003s

1hook_text3.66s2terminal5.15s3code_block4.96s4before_after4.59s5terminal3.85s6warning4.17s7cta3.8s

Publish

TikToksandbox · private onlyConfigure platforms ↗
Not posted yet. “Publish” uploads to every configured platform (YouTube live + IG/TikTok when a host + tokens are set), and always writes a paste-ready bundle. Runs in the background — this card updates as platforms complete.

Ready to post

No per-platform captions yet. Click “Generate captions” to research hashtags + write a tailored caption for each platform.

Script

hookYour agent graded its own exam. CI passed.
"Cursor writes the function. Then it writes the tests. Same context, same assumptions."
"The mocks reflect the code it just wrote, not the actual API contract."
"Edge cases it never modeled don't exist in the test file."
"The assertion passes. You commit. CI is green."
"Production is the first thing that runs against reality."

Run log

finished / packaged
packaged
14:47:33Trend scan$0.109
14:47:53Idea selectedarchitecture warning / Your AI agent wrote the tests and the code in the same context, that's why CI is green and prod is broken
14:49:03Hook chosenYour agent graded its own exam. CI passed.
14:49:25Script drafted5 narration beats
14:51:20Storyboard built7 scenes
14:53:27Fact check$0.186
14:54:03QA scored8.0/10 pass
14:58:50PackagingYour Agent Graded Its Own Exam: Why CI Passes but Production Breaks
15:19:54Render assetMP4 ready
14:58:50packaged: "Your Agent Graded Its Own Exam: Why CI Passes but Production Breaks" — total $0.757
14:58:35derivatives: thumbnail + 2 aspect ratios
14:58:21rendered → /Users/lexaplus/development/Socheli/data/renders/claude_20260605144641.mp4
14:57:04beat-sync: 70 beats; sfx: 7 cues
14:57:04b-roll: 7/7 scenes
14:56:58music generated
14:54:33voice: 28.2s via elevenlabs+whisper, 75 words synced
14:54:03QA passed: 8/10
BKBack to queue