Show HN: Find the best local LLM for your hardware, ranked by benchmarks


TLDR

  • whichllm auto-detects GPU/CPU/RAM and ranks Hugging Face models by merged benchmark scores, not just VRAM fit, with a single CLI command.

Key Takeaways

  • Scoring merges LiveBench, Artificial Analysis, Aider, Chatbot Arena ELO, and Open LLM Leaderboard with confidence-weighted dampening across five evidence tiers.
  • Benchmark evidence is graded: a direct model-ID match gets full confidence, self-reported uploader claims are discounted to 0.55x, and cross-family score inheritance is blocked when parameter counts diverge by more than 2x.
  • VRAM estimation accounts for weights, GQA KV cache, activations, and framework overhead; for MoE models, speed is ranked on active parameters and quality on total parameters.
  • whichllm plan "llama 3 70b" performs a reverse lookup for hardware planning, and --json output enables piping results into Ollama workflows.
  • Recency demotion penalizes stale leaderboard scores along model lineage so older-generation models cannot outrank current ones on outdated evals.
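The VRAM and MoE rules above can be illustrated with a rough estimator. This is a minimal sketch, not whichllm's actual formula: the flat activation and overhead allowances, the FP16 KV cache default, and the bandwidth-bound decode-speed heuristic are all assumptions, and ModelSpec, estimate_vram_gb, and decode_tokens_per_s are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    """Hypothetical model description; field names are illustrative."""
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # parameters used per token (equals total for dense models)
    n_layers: int
    n_kv_heads: int         # under GQA, fewer KV heads than query heads
    head_dim: int


def estimate_vram_gb(spec: ModelSpec, weight_bits: float, context_len: int,
                     kv_bytes: int = 2, activation_gb: float = 0.5,
                     overhead_gb: float = 1.0) -> float:
    """Weights + GQA KV cache + activations + framework overhead, in GB.

    activation_gb and overhead_gb are assumed flat allowances.
    """
    # Every MoE expert must be resident, so weight memory uses *total* params.
    weights_gb = spec.total_params_b * weight_bits / 8
    # K and V per layer, sized by KV heads (not query heads) under GQA.
    kv_gb = (2 * spec.n_layers * spec.n_kv_heads * spec.head_dim
             * context_len * kv_bytes) / 1e9
    return weights_gb + kv_gb + activation_gb + overhead_gb


def decode_tokens_per_s(spec: ModelSpec, weight_bits: float,
                        bandwidth_gb_s: float) -> float:
    """Assumed bandwidth-bound heuristic: each token reads the active weights once."""
    active_gb = spec.active_params_b * weight_bits / 8
    return bandwidth_gb_s / active_gb


# Llama-3-70B-shaped dense model: 80 layers, 8 KV heads, head dim 128.
llama70 = ModelSpec(70.0, 70.0, n_layers=80, n_kv_heads=8, head_dim=128)
print(round(estimate_vram_gb(llama70, weight_bits=4, context_len=8192), 2))  # → 39.18
```

Under these assumptions a 4-bit 70B model at an 8K context needs about 39 GB, of which the FP16 KV cache is about 2.7 GB. A 30B-total, 3B-active MoE model is charged for all 30B in memory, but its decode-speed estimate scales with the 3B active parameters.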
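The evidence grading can be sketched as a confidence-weighted merge. Only the 1.0 weight for a direct ID match, the 0.55x discount for self-reported claims, and the 2x parameter-ratio cutoff come from the description. The 0.8 weight for inherited scores, the neutral prior of 50, and the shrink-toward-prior form of "dampening" are assumptions, and the project's five tiers are collapsed to three here. Evidence, TIER_CONFIDENCE, and merged_score are hypothetical names.

```python
from dataclasses import dataclass

# Tier weights: "direct" and "self_reported" follow the description;
# "inherited" (a score borrowed from a related model) is an assumed value.
TIER_CONFIDENCE = {"direct": 1.0, "inherited": 0.8, "self_reported": 0.55}
MAX_PARAM_RATIO = 2.0  # inheritance blocked when param counts diverge more than 2x
NEUTRAL_PRIOR = 50.0   # assumed score for a model with no usable evidence


@dataclass
class Evidence:
    score: float            # benchmark score normalized to 0-100
    tier: str               # key into TIER_CONFIDENCE
    source_params_b: float  # param count (billions) of the model actually measured


def usable(e: Evidence, target_params_b: float) -> bool:
    """Drop inherited scores from models whose size differs by more than 2x."""
    if e.tier != "inherited":
        return True
    big = max(e.source_params_b, target_params_b)
    small = min(e.source_params_b, target_params_b)
    return big / small <= MAX_PARAM_RATIO


def merged_score(evidence: list[Evidence], target_params_b: float) -> float:
    kept = [e for e in evidence if usable(e, target_params_b)]
    if not kept:
        return NEUTRAL_PRIOR
    weights = [TIER_CONFIDENCE[e.tier] for e in kept]
    total = sum(weights)
    mean = sum(w * e.score for w, e in zip(weights, kept)) / total
    # Dampening (assumed form): weak total confidence pulls the score toward the prior.
    trust = min(total, 1.0)
    return NEUTRAL_PRIOR + trust * (mean - NEUTRAL_PRIOR)


print(merged_score([Evidence(80.0, "direct", 8.0)], 8.0))  # → 80.0
```

With these assumptions a lone self-reported score of 80 lands at 66.5, while a score of 80 inherited from a 70B sibling is ignored for an 8B target, so that model falls back to the prior.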

Hacker News Comment Review

  • Commenters broadly reported stale recommendations: the tool surfaces Qwen2.5 series while users are already running Qwen3.5 and Qwen3.6 models, undermining the core recency-aware claim.
  • Strong consensus that this should be a static web page rather than an installed CLI; some commenters also raised security concerns about running unknown local tooling.
  • A deleted marketing.md file was surfaced in the commit history, raising credibility questions about benchmark sourcing and overall trustworthiness of the project.

Notable Comments

  • @wren6991: “I also have a script… echo 'Qwen3.6-27B'” – sharp critique of how stable the top answer actually is.
  • @zambelli: Suggests adding support for user-supplied benchmarks to fuzzy-match model names, addressing task-specific evaluation gaps.
