whichllm auto-detects GPU/CPU/RAM and ranks HuggingFace models by merged benchmark scores, not just VRAM fit, in one CLI command.
Key Takeaways
Scoring merges LiveBench, Artificial Analysis, Aider, Chatbot Arena ELO, and Open LLM Leaderboard with confidence-weighted dampening across five evidence tiers.
Benchmark evidence is graded: a direct ID match scores full confidence; self-reported uploader claims are discounted to 0.55x; cross-family inheritance is blocked when parameter counts diverge by more than 2x.
VRAM estimation accounts for weights, GQA KV cache, activations, and framework overhead; MoE models are ranked for speed on active params and for quality on total params.
whichllm plan "llama 3 70b" does a reverse lookup for hardware planning; --json output enables pipe-to-Ollama workflows.
Recency demotion penalizes stale leaderboard scores along model lineage so older-generation models cannot outrank current ones on outdated evals.
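The VRAM accounting in the takeaways above can be sketched roughly as follows. This is a minimal illustration and not whichllm's actual code: the function name estimate_vram_gib, the 10% overhead fraction (standing in for activations plus framework overhead), and the FP16 KV cache are all assumptions made for the example. The layer and head counts in the usage call follow the publicly documented Llama 3 70B configuration.

```python
def estimate_vram_gib(
    total_params_b: float,      # total parameters, in billions
    bits_per_weight: float,     # e.g. 16 for FP16, 4 for a 4-bit quant
    n_layers: int,
    n_kv_heads: int,            # GQA: number of key/value heads, not query heads
    head_dim: int,
    context_len: int,
    kv_bytes: int = 2,          # assumption: FP16 KV cache
    overhead_frac: float = 0.10,  # assumption: activations + framework overhead
) -> dict:
    """Rough VRAM estimate in GiB; every default here is illustrative."""
    gib = 1024 ** 3

    # Weights: all parameters must be resident, even for MoE models.
    weights = total_params_b * 1e9 * bits_per_weight / 8

    # KV cache: one K and one V tensor per layer, sized by the KV heads only.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes

    # Activations and framework overhead folded into one flat fraction.
    overhead = overhead_frac * (weights + kv_cache)

    return {
        "weights_gib": weights / gib,
        "kv_cache_gib": kv_cache / gib,
        "overhead_gib": overhead / gib,
        "total_gib": (weights + kv_cache + overhead) / gib,
    }


# Llama 3 70B shape (80 layers, 8 KV heads, head dim 128) at 4-bit, 8k context.
est = estimate_vram_gib(70, 4, n_layers=80, n_kv_heads=8,
                        head_dim=128, context_len=8192)
for name, value in est.items():
    print(f"{name}: {value:.2f}")
```

Because GQA stores only the key/value heads, the cache term is far smaller than it would be with one KV pair per query head, which is why the weights dominate the total at moderate context lengths.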
Hacker News Comment Review
Commenters broadly reported stale recommendations: the tool surfaces Qwen2.5-series models while users are already running Qwen3.5 and Qwen3.6 models, undermining the core recency-aware claim.
Strong consensus that this should be a static web page rather than an installed CLI; security concerns about running unknown local tooling were raised directly.
A deleted marketing.md file was surfaced in the commit history, raising credibility questions about benchmark sourcing and overall trustworthiness of the project.
Notable Comments
@wren6991: “I also have a script… echo 'Qwen3.6-27B'” – sharp critique of how stable the top answer actually is.
@zambelli: Suggests adding support for user-supplied benchmarks to fuzzy-match model names, addressing task-specific evaluation gaps.