Did Claude really get dumber again?


Summary based on the YouTube transcript and episode description.

Theo (t3.gg) argues that Claude’s regressions are measurable and stem from poor Claude Code harness engineering, a botched tokenizer change, and forced 1M-context routing, not merely user perception.

  • Opus performs 15% worse inside Claude Code than inside Cursor on the same benchmark — harness engineering is the primary culprit, not the model weights.
  • Claude Code scores 58% on Terminal Bench; competing harnesses Forge Code and Cappy score 75–82% using the same Anthropic models.
  • Opus 4.7’s new tokenizer inflates token counts 1.35–1.47x on real codebases, bloating context and accelerating context rot.
  • Anthropic’s own September postmortem confirmed that routing requests to the 1M-context model version degrades quality; that version is now the forced default for all Claude Code users.
  • AMD’s AI director analyzed 6,800 Claude Code sessions: thinking depth fell 73%, read-to-edit ratio collapsed from 6.6:1 to 2:1, and API requests grew 80x with the same number of human prompts.
  • Thinking redaction went from 0% to 100% between Jan 30 and Mar 12, 2025; quality regression reports peaked precisely when redaction crossed 50% on Mar 8.
  • OpenAI’s Codex shows no comparable sustained regression pattern, per broad community polling and a public statement from Tibo (Cursor) that they do not adjust thinking budgets post-release.
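The session-analysis figures above reduce to simple ratio metrics over tool-call logs. A minimal sketch of how such metrics could be computed, assuming a hypothetical log schema (the `Session` fields and the sample numbers are invented for illustration, not AMD’s or Anthropic’s actual telemetry):

```python
from dataclasses import dataclass

@dataclass
class Session:
    reads: int          # file-read tool calls
    edits: int          # file-edit tool calls
    api_requests: int   # model API requests issued by the harness
    human_prompts: int  # prompts typed by the user

def metrics(sessions: list[Session]) -> dict[str, float]:
    """Aggregate the two headline ratios across a batch of sessions."""
    reads = sum(s.reads for s in sessions)
    edits = sum(s.edits for s in sessions)
    requests = sum(s.api_requests for s in sessions)
    prompts = sum(s.human_prompts for s in sessions)
    return {
        "read_to_edit": reads / edits,
        "requests_per_prompt": requests / prompts,
    }

# Toy before/after data chosen to reproduce the reported shape:
# read-to-edit falls from 6.6:1 to 2:1 while requests per prompt grow 80x.
before = [Session(reads=66, edits=10, api_requests=20, human_prompts=10)]
after = [Session(reads=20, edits=10, api_requests=1600, human_prompts=10)]

print(metrics(before))  # {'read_to_edit': 6.6, 'requests_per_prompt': 2.0}
print(metrics(after))   # {'read_to_edit': 2.0, 'requests_per_prompt': 160.0}
```

The point of the ratio framing is that the prompt count is held fixed, so any blow-up in `requests_per_prompt` is attributable to the harness, not to users asking for more.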

2026-04-20 · Watch on YouTube