Semble is a CPU-only code search library for AI agents. It combines Model2Vec embeddings with BM25 via reciprocal rank fusion (RRF), indexing an average repo in ~250ms and answering queries in ~1.5ms.
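The fusion step named above, reciprocal rank fusion (RRF), can be sketched in a few lines. This is a minimal illustration of the general technique, not Semble's actual implementation: the function name `rrf_fuse`, the chunk identifiers, and the constant `k=60` (the value commonly used in the RRF literature) are all assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each id receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); ids are returned by descending total score.
    k=60 is a conventional default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk ids from a lexical (BM25) and a semantic (embedding) ranker.
bm25_ranking = ["auth.py:login", "db.py:connect", "utils.py:hash_password"]
embedding_ranking = ["utils.py:hash_password", "auth.py:login", "session.py:refresh"]

fused = rrf_fuse([bm25_ranking, embedding_ranking])
print(fused)
```

Because RRF uses only rank positions, it needs no score normalization between BM25 and embedding similarity, which is one plausible reason to pair it with two rankers whose raw scores live on different scales.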
Key Takeaways
Achieves NDCG@10 of 0.854 on ~1,250 queries across 63 repos in 19 languages, which is 99% of the quality of CodeRankEmbed Hybrid while indexing 218x faster.
Returns only matched chunks, using ~98% fewer tokens than grep+read; reaches 94% recall within 2k tokens, whereas grep+read needs 100k tokens to reach 85% recall.
Runs entirely on CPU with no API keys or GPU; indexes average repos in ~250ms, queries in ~1.5ms using static Model2Vec potion-code-16M embeddings.
Deploys as an MCP server (Claude Code, Cursor, Codex, OpenCode) or via bash/AGENTS.md for sub-agents that cannot call MCP tools directly.
Ranking layer adds adaptive lexical/semantic weighting, definition boosts, identifier stemming, file coherence bonuses, and noise penalties for test/legacy files.
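For readers unfamiliar with the NDCG@10 metric cited in the takeaways above, the sketch below shows one standard way to compute it for a single query. It is illustrative only: the helper names and the sample relevance labels are hypothetical, the linear-gain form of DCG is assumed (the benchmark may use a different variant), and the ideal ordering is taken from the returned results alone rather than from every relevant chunk in the repo.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the actual ordering divided by DCG of the ideal ordering.

    `relevances` lists the graded relevance of each returned result,
    in the order the search tool returned them.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical query: relevant chunks were returned at positions 1 and 3.
score = ndcg_at_k([1, 0, 1])
print(round(score, 4))
```

A benchmark-level figure such as 0.854 would then be the mean of this per-query value over all queries, assuming the usual averaging convention.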
Hacker News Comment Review
Core skepticism: the 98% token reduction claim is measured against reading entire matched files. That is real agent behavior, but it is not how an experienced developer would use grep, so the baseline is somewhat inflated.
Commenters with direct eval experience report that models heavily RL-trained on grep often distrust novel tool output, retrying or re-reading files and erasing token savings; one workaround is a global CLAUDE.md instruction to prefer LSP or semantic tools over grep.
Commenters are split on whether semantic search tools make agents “dumber” by replacing reasoning with retrieval: some see a regression, while others note that Claude Code defaults to slow line-range sed reads anyway, which makes pre-indexed semantic search a strict improvement.
Notable Comments
@jerezzprime: No agent benchmarks (e.g. Claude Code or Copilot CLI with grep replaced) are provided; RL-trained tool preference may silently negate all savings.
@aadishv: A live test on the browser-use/browsercode repo showed Semble returning useful results in practice.