A viral GitHub repository for Claude Code called ‘Caveman’ operates on a simple, humorous premise: forcing Large Language Models (LLMs) to speak as concisely as a Neanderthal. While it initially seems like a meme, the approach highlights a critical theme in AI optimization: reducing verbosity not only saves tokens but can dramatically improve a model’s technical performance and accuracy.
The Reality of Token Savings
The Caveman repository claims massive token reductions, such as cutting 75% of output tokens and 45% of input tokens. However, the video clarifies that these figures apply only to specific portions of a session, chiefly the conversational prose; code generation and tool calls are unaffected. In practice, users can expect overall token savings of around 4% to 5% per session. That is far more modest than the repo’s headline numbers, but it is still a meaningful cost reduction for heavy developers.
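The gap between the headline numbers and the ~4–5% figure follows from weighting the prose-only cuts by how little of a session is actually prose. The token counts and prose shares below are illustrative assumptions for the sketch, not figures from the video or the repo:

```python
# Back-of-envelope: how 75%/45% prose-only cuts shrink to ~5% overall.
# All token counts and prose shares here are hypothetical examples.

def overall_savings(input_tokens, output_tokens,
                    prose_share_in, prose_share_out,
                    cut_in=0.45, cut_out=0.75):
    """Fraction of total session tokens saved when the claimed cuts
    apply only to the conversational-prose portion of each side."""
    saved = (input_tokens * prose_share_in * cut_in
             + output_tokens * prose_share_out * cut_out)
    return saved / (input_tokens + output_tokens)

# Hypothetical coding session: 90k input / 10k output tokens,
# with prose making up 8% of input and 25% of output.
rate = overall_savings(90_000, 10_000, 0.08, 0.25)
print(f"{rate:.1%}")  # roughly 5% of the whole session
```

Because code and tool calls dominate a typical session's token count, even aggressive cuts to the prose fraction move the overall total only a few percent.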
The Science: How Brevity Boosts Accuracy
The core argument for brevity is backed by a recent research paper titled Brevity Constraints, Reverse Performance Hierarchies, and Language Models. The study evaluated 31 models across 1,500 problems and found surprising results:
- The ‘Overthinking’ Problem: Larger models often underperform smaller ones because of ‘scale-dependent verbosity.’ By trying to be too thorough, they spin in circles, obscure correct reasoning, and accumulate errors.
- Training Flaws: This behavior likely stems from Reinforcement Learning with Human Feedback (RLHF), where human graders traditionally reward longer, more elaborate answers, inadvertently training the AI to over-explain rather than just be correct.
- Reversing the Hierarchy: By forcing large models to produce brief, ‘caveman-like’ responses, researchers improved accuracy by 26 percentage points, letting large models outperform smaller ones once again.
Conclusions and Practical Takeaways
The overarching conclusion is that excessive verbosity actively harms LLM reasoning. Implementing the Caveman skill—or simply updating your system prompts (e.g., in a claude.md file) with instructions like ‘be concise, no filler, straight to the point’—is a highly recommended practice. It requires zero underlying changes to how the model codes, yet it prevents AI ‘overthinking,’ delivers better technical answers, and slightly reduces API costs.
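A minimal snippet for a claude.md file along these lines might look like the following; the exact wording is a suggestion, not the Caveman repo's actual prompt:

```
## Response style
- Be concise. No filler, no preamble, no recap of what you just did.
- Answer first; explain only if asked.
- Brevity applies to prose only — do not change how you write code.
```

Keeping the last rule explicit matters, since the goal is to trim conversational output without constraining code generation or tool use.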
Mentoring question
How might you adjust your current LLM system prompts to prioritize brevity and reduce ‘overthinking’ in your AI-generated outputs?
Source: https://youtube.com/watch?v=4FO1Liu-ttk&is=yDChiwFxicaB78uV