DeepSeek-V4-Flash is the first local model strong enough to make activation steering practically worth attempting for working engineers.
Key Takeaways
Steering extracts a concept as the difference between a model’s activations on contrasting prompt pairs, then adds that vector back into the activations during inference.
Anthropic’s sparse autoencoder approach pursues the same goal more rigorously, but at a much higher cost in compute and expertise.
The author is skeptical steering beats prompting for most use cases; ambitious goals like “intelligence” or “knows my codebase” likely require fine-tuning instead.
DwarfStar 4 (antirez’s standalone, llama.cpp-inspired runtime) ships steering as a first-class feature and runs on MacBooks with 96-128 GB of memory.
Steering’s practical niche may be concepts that can’t be prompted for but are compact enough not to require retraining the model.
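The extract-then-add procedure described in the takeaways can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not DwarfStar 4's implementation: the activations are random arrays standing in for hidden states captured at one layer, and the names `steering_vector`, `apply_steering`, and `strength` are hypothetical.

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Mean activation difference between two contrasting prompt sets.

    pos_acts, neg_acts: arrays of shape (n_prompts, d_model), standing in
    for hidden states captured at a single layer (an assumption here).
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def apply_steering(hidden, vector, strength=1.0):
    """Add the scaled steering vector to every token position."""
    return hidden + strength * vector

# Toy data: the "concept" lives on axis 2 of an 8-dimensional space.
rng = np.random.default_rng(0)
d_model = 8
concept = np.zeros(d_model)
concept[2] = 1.0

neg = rng.normal(size=(16, d_model))                  # prompts without the concept
pos = rng.normal(size=(16, d_model)) + 3.0 * concept  # prompts with the concept

v = steering_vector(pos, neg)

hidden = rng.normal(size=(5, d_model))  # hidden states for 5 token positions
steered = apply_steering(hidden, v, strength=2.0)
```

In a real runtime the addition would happen inside the forward pass at a chosen layer, and `strength` would need tuning: too small has no visible effect, too large tends to degrade output.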
Hacker News Comment Review
The article omits the most-discussed real-world use: abliteration, where refusal behavior is localized to a single direction in activation space that can be nulled out. Antirez confirmed that DwarfStar 4 achieves this with steering, not just the toy verbosity demo.
Commenters clarified that DwarfStar 4 is its own project, not a stripped-down llama.cpp fork, though it drew heavily on llama.cpp for implementation details.
The technique overlaps directly with “control vectors” / representation engineering (vgel.me); the DwarfStar direction-subtraction code matches that prior art exactly.
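Nulling out a direction, as in abliteration, is commonly described as projecting each hidden state onto the subspace orthogonal to that direction. The sketch below shows that projection in NumPy; it is not taken from DwarfStar 4 or from the control-vector code the commenters cite, and `ablate_direction` and `refusal_dir` are hypothetical names applied to random toy data.

```python
import numpy as np

def ablate_direction(hidden, direction):
    """Remove the component of each hidden state along `direction`.

    hidden: array of shape (n_tokens, d_model)
    direction: array of shape (d_model,), need not be normalized
    """
    unit = direction / np.linalg.norm(direction)
    # (hidden @ unit) gives each row's coefficient along the direction;
    # the outer product rebuilds that component so it can be subtracted.
    return hidden - np.outer(hidden @ unit, unit)

rng = np.random.default_rng(1)
refusal_dir = rng.normal(size=8)   # stand-in for an extracted refusal direction
hidden = rng.normal(size=(4, 8))   # hidden states for 4 token positions

ablated = ablate_direction(hidden, refusal_dir)
```

After the projection every row of `ablated` has zero dot product with `refusal_dir`, and applying the function a second time changes nothing. Steering by addition shifts activations along a direction, whereas this removes the direction entirely.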
Notable Comments
@micahwhite: shipped a live app using steering to shift a model’s political stance, citing the technique as having broad practical potential.