This model is kind of a disaster.


Summary based on the YouTube transcript and episode description.

Theo (t3.gg) tests Opus 4.7 for a full day and concludes that its regressions stem from Claude Code’s degraded harness, not from the model itself.

  • Opus 4.7 scores worse than Opus 4.6 on the Agentic Search benchmark and slightly worse on cybersecurity vulnerability reproduction.
  • Cyber safeguards are so aggressive that the model hard-locked a DEF CON cryptography-puzzle chat, refusing to continue unless the session was downgraded to Sonnet 4.
  • A malware-prevention system prompt leaked into normal Claude Code Desktop sessions, flagging Theo’s own personal site as malware.
  • Vision input limit raised to 2576px long edge (~4MP), roughly 3x previous Claude models.
  • Opus 4.7 never searched for the latest package versions, repeatedly planning Next.js 15 upgrades even though Next.js 16 was available; GPT-5.4 fetched live docs and correctly targeted Next.js 16.
  • Theo’s core thesis: Anthropic engineers use a different internal stack than public Claude Code, so they ship models that perform well internally but land in a broken harness externally.
  • New Claude Code features shipped: an X-high effort level, an /ultrareview command, and an auto-mode permission classifier; auto mode, however, broke the existing bypass-permissions flag during Theo’s testing.
  • Pricing unchanged from Opus 4.6: $5 per million input tokens, $25 per million output tokens.

2026-04-17 · Watch on YouTube