An update on recent Claude Code quality reports


TLDR

  • Three separate harness bugs in Claude Code caused real degradation over two months; the underlying models were not at fault.

Key Takeaways

  • Anthropic published a postmortem confirming the complaints about Claude Code quality were legitimate, not user error or model regression.
  • One bug shipped March 26: a change intended to clear a session's stale thinking state once on resume instead fired on every turn, making Claude appear forgetful and repetitive.
  • The root cause was in the Claude Code harness layer, not the model weights, showing agentic infrastructure can fail independently of model quality.
  • Simon Willison notes he runs 11+ long-lived Claude Code sessions and estimates he prompts more in stale sessions than fresh ones, making this bug class high-impact for power users.
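The bug class in the March 26 incident can be sketched in a few lines. This is a hypothetical illustration, not Claude Code's actual implementation (which is not public); all names here are invented. The shape of the failure is a cleanup step meant to run once per resumed session that lacks a guard, so it wipes state on every turn:

```python
# Hypothetical sketch of the bug class described above: a cleanup step
# intended to run once per resumed session instead fires on every turn.
# All names are illustrative; this is not Claude Code's real code.

class Session:
    def __init__(self):
        self.thinking = []                 # accumulated "thinking" context
        self.resume_cleanup_done = False   # guard flag for one-time cleanup

    def handle_turn_buggy(self, prompt):
        # BUG: the guard flag is never checked, so the stale-thinking
        # cleanup runs on EVERY turn, discarding all prior context.
        self.thinking.clear()
        self.thinking.append(prompt)
        return len(self.thinking)

    def handle_turn_fixed(self, prompt):
        # FIX: clear stale thinking only once, on the first turn after
        # the session is resumed; later turns keep accumulating context.
        if not self.resume_cleanup_done:
            self.thinking.clear()
            self.resume_cleanup_done = True
        self.thinking.append(prompt)
        return len(self.thinking)


buggy, fixed = Session(), Session()
for p in ["turn 1", "turn 2", "turn 3"]:
    buggy_len = buggy.handle_turn_buggy(p)
    fixed_len = fixed.handle_turn_fixed(p)

# The buggy session never retains more than the current turn, which
# from the user's side looks exactly like model forgetfulness.
print(buggy_len, fixed_len)
```

Note that short sessions would never distinguish the two paths, which matches the observation below that long-lived sessions expose bugs short ones cannot.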

Why It Matters

  • Harness-level bugs in agentic systems are hard to diagnose because symptoms look like model degradation, not infrastructure failure.
  • Builders of agentic systems should treat the orchestration layer as a first-class failure surface, separate from model non-determinism.
  • Long-lived or resumed sessions are a distinct usage pattern that exposes timing and state-management bugs that short sessions never trigger.

Simon Willison, Simon Willison’s Weblog · 2026-04-24