An update on recent Claude Code quality reports

· ai design coding ·

TLDR

  • Anthropic discloses three separate Claude Code bugs between March 4 and April 20 that degraded output quality: silently reduced reasoning effort, thinking history cleared every turn, and a system prompt that capped response verbosity.

Key Takeaways

  • March 4: reasoning effort silently dropped from high to medium for Sonnet 4.6 and Opus 4.6 to cut latency; reverted April 7 after user complaints.
  • March 26 caching bug cleared thinking history every turn instead of once, causing forgetfulness, repetition, and faster usage limit drain.
  • April 16 system prompt instructed the model to keep text between tool calls under 25 words; ablations later found a 3% coding quality drop across Opus 4.6 and 4.7.
  • Three changes hit different traffic slices on different dates, making aggregate degradation look like normal variation and delaying diagnosis by weeks.
  • Going forward: xhigh effort is now the default for Opus 4.7, per-model evals run before every system prompt change, and affected usage limits have been reset.
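
The March 26 caching bug is a classic invalidate-every-time pattern: state that should be cleared once on session resume gets cleared on every turn. A minimal hypothetical sketch (names and structure invented, not Anthropic's actual code) of how such a bug behaves versus the fix:

```python
# Hypothetical illustration of the March 26 bug pattern (not Anthropic's code):
# thinking history should be trimmed once when a session resumes from cache,
# but a stale "just resumed" flag triggers the trim on every turn.

class Session:
    def __init__(self):
        self.thinking_history = []
        self.resumed = True  # set when a session is restored from cache

    def on_turn_buggy(self, thought):
        # Bug: the resume flag is never cleared, so history is wiped each
        # turn -- the model forgets prior reasoning, repeats itself, and
        # burns usage limits re-deriving old context.
        if self.resumed:
            self.thinking_history.clear()
        self.thinking_history.append(thought)

    def on_turn_fixed(self, thought):
        # Fix: clear once on resume, then reset the flag.
        if self.resumed:
            self.thinking_history.clear()
            self.resumed = False
        self.thinking_history.append(thought)

buggy, fixed = Session(), Session()
for t in ["plan", "step 1", "step 2"]:
    buggy.on_turn_buggy(t)
    fixed.on_turn_fixed(t)

print(len(buggy.thinking_history))  # 1 -- only the latest thought survives
print(len(fixed.thinking_history))  # 3 -- full history retained
```

The sketch also shows why the symptom was forgetfulness rather than an outright error: each turn still works in isolation, so nothing crashes; only accumulated context disappears.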

Hacker News Comment Review

  • Consensus: session state changes, context management behavior, and effort-level defaults were all modified without documentation; users discovered problems through degraded output, not changelogs.
  • Commenters pushed back on process: experimental gates, soak periods, or gradual rollouts for system prompt changes could have caught each regression earlier.
  • The 25-words-between-tool-calls instruction became a meme; one commenter called it “Claude caveman,” capturing frustration that a single prompt line tanked coding quality across all affected models.

Notable Comments

  • @everdrive: Reports Claude narrating its own internal prompt-injection defenses aloud in responses, a distinct issue not covered by any of the three disclosed bugs.
  • @natdempk: Session-resumption thinking truncation was undisclosed; “I don’t think that’s even documented behavior anywhere.”

Original | Discuss on HN