Did Claude really get dumber again?
Theo (t3.gg) argues Claude’s regressions are measurable and stem from bad Claude Code engineering, a botched tokenizer change, and forced 1M context routing — not just user perception.
- Opus performs 15% worse inside Claude Code than inside Cursor on the same benchmark — harness engineering is the primary culprit, not the model weights.
- Claude Code scores 58% on Terminal Bench; competing harnesses Forge Code and Cappy score 75–82% using the same Anthropic models.
- Opus 4.7’s new tokenizer inflates token counts 1.35–1.47x on real codebases, bloating context and accelerating context rot.
- Anthropic’s own September postmortem confirmed that routing requests to the 1M-context model version degrades quality; that version is now the forced default for all Claude Code users.
- AMD’s AI director analyzed 6,800 Claude Code sessions: thinking depth fell 73%, read-to-edit ratio collapsed from 6.6:1 to 2:1, and API requests grew 80x with the same number of human prompts.
- Thinking redaction went from 0% to 100% between Jan 30 and Mar 12, 2025; quality regression reports peaked precisely when redaction crossed 50% on Mar 8.
- OpenAI's Codex shows no comparable sustained regression pattern, per broad community polling and a public statement from Tibo (Cursor) that they do not adjust thinking budgets post-release.
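
The 1.35–1.47x inflation figure above is just a ratio of token counts for identical input under two tokenizer versions. A minimal sketch of how such a factor could be measured across a codebase, using made-up per-file counts in place of real tokenizer calls (the function name and numbers here are illustrative, not from the video):

```python
# Sketch: aggregate tokenizer inflation across a set of files.
# counts_old / counts_new stand in for per-file token counts
# produced by the old and new tokenizer versions respectively.

def inflation_factor(counts_old, counts_new):
    """Aggregate inflation = total new tokens / total old tokens."""
    return sum(counts_new) / sum(counts_old)

# Hypothetical per-file token counts for the same three files:
old = [1000, 2400, 800]
new = [1400, 3300, 1150]
print(round(inflation_factor(old, new), 2))  # 1.39
```

An aggregate ratio like this understates the practical impact: a flat ~1.4x inflation means the same codebase consumes the context window ~40% faster, which is why the bullet links it to accelerated context rot.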
2026-04-20 · Watch on YouTube