A solo dev ran 24 prompts across 5 arms (baseline, "be brief.", caveman lite/full/ultra) and found that the two-word instruction matched the plugin on both tokens and quality.
Key Takeaways
"Be brief." cut tokens 34% vs baseline (636 to 419 mean); caveman lite and full landed close, while ultra averaged 449, longer than brief.
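The headline reduction checks out arithmetically. A quick sanity check using the mean token counts reported in the post:

```python
# Sanity-check the reported token reduction: "be brief." vs baseline.
baseline_mean = 636   # mean output tokens, baseline arm (from the post)
brief_mean = 419      # mean output tokens, "be brief." arm
ultra_mean = 449      # caveman ultra arm

reduction = (baseline_mean - brief_mean) / baseline_mean
print(f"'be brief.' reduction vs baseline: {reduction:.0%}")   # 34%
print(f"ultra vs brief delta: {ultra_mean - brief_mean} tokens")  # +30
```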
All five arms scored within 1.5% of each other on correctness, with 100% key-point coverage and zero must_avoid triggers across the 120 scored responses.
Caveman's Auto-Clarity escape deliberately suspends compression for destructive ops and multi-step sequences; that design choice, not a bug, explains the token variance in the setup and security categories.
Ultra triggered tool-use behavior on a Dockerfile prompt (it issued a Write tool call, got blocked, then dumped the file inline), adding ~1300 tokens to its setup-category mean; a side effect of the compression style rather than the task.
Caveman's real differentiator is consistent output structure, enforced via SessionStart/UserPromptSubmit hooks and a slash-command intensity dial; neither is replicable with two words in CLAUDE.md.
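The hook mechanism itself is ordinary Claude Code configuration. A minimal illustrative shape is sketched below; the event names come from the post, but the schema details and command paths are hypothetical placeholders, not the plugin's actual files:

```json
{
  "hooks": {
    "SessionStart": [
      {"hooks": [{"type": "command", "command": "cat ~/.claude/caveman-style.txt"}]}
    ],
    "UserPromptSubmit": [
      {"hooks": [{"type": "command", "command": "~/.claude/inject-intensity.sh"}]}
    ]
  }
}
```

The point of the design: a hook fires on every session start and every prompt, so the style instruction is re-injected mechanically rather than relying on the model remembering a line from CLAUDE.md.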
Hacker News Comment Review
The author confirmed the benchmark design: 24 prompts, 5 arms, responses rubric-judged by claude-sonnet-4-6 against required facts, required terms, and must_avoid traps; the methodology is open source.
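The rubric structure described above can be sketched as a simple checker. This is a minimal Python sketch using plain substring matching; the real judge is an LLM, and the function and field names here are hypothetical, not taken from the benchmark repo:

```python
def score_response(response: str, required_facts: list[str],
                   required_terms: list[str], must_avoid: list[str]) -> dict:
    """Score one response against a rubric of required and forbidden content."""
    text = response.lower()
    facts_hit = [f for f in required_facts if f.lower() in text]
    terms_hit = [t for t in required_terms if t.lower() in text]
    triggered = [a for a in must_avoid if a.lower() in text]
    return {
        "fact_coverage": len(facts_hit) / len(required_facts) if required_facts else 1.0,
        "term_coverage": len(terms_hit) / len(required_terms) if required_terms else 1.0,
        "must_avoid_triggered": len(triggered),
    }

# Example: a heavily compressed answer that still covers the key points.
result = score_response(
    "Use docker build -t app . then docker run -p 8080:80 app",
    required_facts=["docker build", "docker run"],
    required_terms=["-t"],
    must_avoid=["sudo rm"],
)
print(result)  # {'fact_coverage': 1.0, 'term_coverage': 1.0, 'must_avoid_triggered': 0}
```

A checker like this makes the "100% coverage, zero must_avoid triggers" claim concrete: every arm hit every required item and tripped no trap.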
One commenter dismissed caveman outright on UX grounds, arguing compressed robot-speak degrades the interaction regardless of token savings.
Notable Comments
@lofaszvanitt: "Caveman is useless for me", framing the tradeoff as comfort vs. efficiency rather than tokens vs. quality.