An update on recent Claude Code quality reports
TLDR
- Three separate harness bugs in Claude Code caused real degradation over two months; the underlying models were not at fault.
Key Takeaways
- Anthropic published a postmortem confirming the complaints about Claude Code quality were legitimate, not user error or model regression.
- One bug shipped March 26: a change to clear idle-session thinking (meant to run once) instead fired every turn, making Claude appear forgetful and repetitive.
- The root cause was in the Claude Code harness layer, not the model weights, showing agentic infrastructure can fail independently of model quality.
- Simon Willison notes he runs 11+ long-lived Claude Code sessions and estimates he prompts more in stale sessions than fresh ones, making this bug class high-impact for power users.
Why It Matters
- Harness-level bugs in agentic systems are hard to diagnose because symptoms look like model degradation, not infrastructure failure.
- Builders of agentic systems should treat the orchestration layer as a first-class failure surface, separate from model non-determinism.
- Long-lived or resumed sessions are a distinct usage pattern that exposes timing and state-management bugs that short sessions never trigger.
Simon Willison, Simon Willison’s Weblog · 2026-04-24 · Read the original