I benchmarked Claude Code's caveman plugin against "be brief."


TLDR

  • A solo dev ran 24 prompts across 5 arms (baseline, “be brief.”, caveman lite/full/ultra) and found that the two-word prompt matched the plugin on both tokens and quality.

Key Takeaways

  • “Be brief.” cut tokens 34% vs baseline (mean 636 → 419); caveman lite and full landed close, while ultra averaged 449 – longer than brief.
  • All 5 arms scored within 1.5% of each other on correctness; 100% key-point coverage, zero must_avoid triggers across 120 scored responses.
  • Caveman’s Auto-Clarity escape deliberately suspends compression for destructive ops and multi-step sequences – that design choice, not a bug, explains the token variance in the setup and security categories.
  • Ultra triggered tool-use behavior on a Dockerfile prompt (it issued a Write tool call, got blocked, then dumped the file inline), adding ~1300 tokens to its setup-category mean – a side effect of the compression style.
  • Caveman’s real differentiator is consistent output structure via SessionStart/UserPromptSubmit hooks and a slash-command intensity dial – neither is replicable with two words in CLAUDE.md.
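The percentage claims above follow directly from the reported means; a quick sanity check, using only the numbers quoted in the takeaways (not recomputed from the raw benchmark data):

```python
# Mean output tokens per arm, as reported in the takeaways above.
baseline_mean = 636
brief_mean = 419
ultra_mean = 449

# "Be brief." reduction vs baseline -- should come out to ~34%.
brief_cut = (baseline_mean - brief_mean) / baseline_mean
print(f"brief cut vs baseline: {brief_cut:.0%}")  # → brief cut vs baseline: 34%

# Ultra's mean exceeds brief's, matching the "longer than brief" note.
print(ultra_mean > brief_mean)  # → True
```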

Hacker News Comment Review

  • Author confirmed the benchmark design: 24 prompts, 5 arms, rubric-judged by claude-sonnet-4-6 against required facts, required terms, and must_avoid traps – methodology is open source.
  • One commenter dismissed caveman outright on UX grounds, arguing compressed robot-speak degrades the interaction regardless of token savings.
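The rubric shape the author describes (required facts, required terms, must_avoid traps) can be sketched mechanically. This is an illustrative assumption, not the benchmark's actual code: the real harness uses claude-sonnet-4-6 as a judge rather than substring matching, and all names below are hypothetical.

```python
# Illustrative sketch only: the open-source benchmark judges with an LLM
# (claude-sonnet-4-6), not substring checks. Field names follow the
# article's terminology (required facts/terms, must_avoid); everything
# else -- function name, return shape, sample rubric -- is assumed.

def score_response(response: str, rubric: dict) -> dict:
    text = response.lower()
    facts_hit = [f for f in rubric["required_facts"] if f.lower() in text]
    terms_hit = [t for t in rubric["required_terms"] if t.lower() in text]
    tripped = [a for a in rubric["must_avoid"] if a.lower() in text]
    return {
        "fact_coverage": len(facts_hit) / len(rubric["required_facts"]),
        "term_coverage": len(terms_hit) / len(rubric["required_terms"]),
        "must_avoid_ok": not tripped,  # True when no trap phrase appears
    }

# Hypothetical rubric for a single prompt.
rubric = {
    "required_facts": ["restart the daemon"],
    "required_terms": ["systemctl"],
    "must_avoid": ["rm -rf"],
}
print(score_response("Run systemctl to restart the daemon.", rubric))
```

A full-coverage, trap-free response under this sketch would score 1.0 on both coverage fields with `must_avoid_ok` true, matching the article's "100% key-point coverage, zero must_avoid triggers" result.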

Notable Comments

  • @lofaszvanitt: “Caveman is useless for me” – frames the tradeoff as comfort vs. efficiency, not tokens vs. quality.
