An update on recent Claude Code quality reports


TLDR

  • Three separate harness bugs in Claude Code caused real degradation over two months; the underlying models were not at fault.

Key Takeaways

  • Anthropic published a postmortem confirming the complaints about Claude Code quality were legitimate, not user error or model regression.
  • One bug shipped March 26: a change intended to clear a session's stale thinking state once on resume instead fired on every turn, making Claude appear forgetful and repetitive.
  • The root cause was in the Claude Code harness layer, not the model weights, showing agentic infrastructure can fail independently of model quality.
  • Simon Willison notes he runs 11+ long-lived Claude Code sessions and estimates he prompts more in stale sessions than fresh ones, making this bug class high-impact for power users.
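The bug class in the March 26 incident can be sketched in a few lines. This is a hypothetical illustration, not Claude Code's actual implementation (which is not public); all names here are invented. The shape of the failure is a cleanup step meant to run once per resumed session that lacks a guard, so it wipes state on every turn:

```python
# Hypothetical sketch of the bug class described above: a cleanup step
# intended to run once per resumed session instead fires on every turn.
# All names are illustrative; this is not Claude Code's real code.

class Session:
    def __init__(self):
        self.thinking = []                 # accumulated "thinking" context
        self.resume_cleanup_done = False   # guard flag for one-time cleanup

    def handle_turn_buggy(self, prompt):
        # BUG: the guard flag is never checked, so the stale-thinking
        # cleanup runs on EVERY turn, discarding all prior context.
        self.thinking.clear()
        self.thinking.append(prompt)
        return len(self.thinking)

    def handle_turn_fixed(self, prompt):
        # FIX: clear stale thinking only once, on the first turn after
        # the session is resumed; later turns keep accumulating context.
        if not self.resume_cleanup_done:
            self.thinking.clear()
            self.resume_cleanup_done = True
        self.thinking.append(prompt)
        return len(self.thinking)


buggy, fixed = Session(), Session()
for p in ["turn 1", "turn 2", "turn 3"]:
    buggy_len = buggy.handle_turn_buggy(p)
    fixed_len = fixed.handle_turn_fixed(p)

# The buggy session never retains more than the current turn, which
# from the user's side looks exactly like model forgetfulness.
print(buggy_len, fixed_len)
```

Note that short sessions would never distinguish the two paths, which matches the observation below that long-lived sessions expose bugs short ones cannot.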

Why It Matters

  • Harness-level bugs in agentic systems are hard to diagnose because symptoms look like model degradation, not infrastructure failure.
  • Builders of agentic systems should treat the orchestration layer as a first-class failure surface, separate from model non-determinism.
  • Long-lived or resumed sessions are a distinct usage pattern that exposes timing and state-management bugs that short sessions never trigger.

Simon Willison, Simon Willison’s Weblog · 2026-04-24