Semble is a CPU-only code search library for agents that combines Model2Vec embeddings and BM25 via RRF fusion, indexing repos in ~250ms and answering queries in ~1.5ms.
Key Takeaways
Achieves an NDCG@10 of 0.854 (99% of CodeRankEmbed Hybrid's quality) while indexing 218x faster and querying 11x faster.
Returns only relevant chunks, using 98% fewer tokens than grep+read; reaches 94% recall at 2k tokens vs the ~100k-token context needed by grep+read.
Runs entirely on CPU with no API keys, GPU, or external services; installable via pip or uv, works as MCP server or bash tool via AGENTS.md.
Ranking uses adaptive lexical/semantic weighting, definition boosts, identifier stemming, file coherence scoring, and noise penalties for test/legacy files.
Supports Claude Code, Cursor, Codex, and OpenCode via MCP or bash integration; the semble savings command tracks token savings over time.
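The reciprocal rank fusion (RRF) step named above can be sketched as follows. This is an illustrative implementation of the standard RRF formula (score each document by summing 1/(k + rank) across rankers, k=60 by convention), not Semble's actual code; the rrf_fuse name and the chunk lists are hypothetical.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranking via RRF.

    Each document's fused score is the sum of 1/(k + rank) over every
    list it appears in, so items ranked highly by multiple rankers win.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["chunk_a", "chunk_b", "chunk_c"]                # lexical ranking
semantic = ["chunk_a", "chunk_d", "chunk_b"]            # embedding ranking
fused = rrf_fuse([bm25, semantic])
# → ['chunk_a', 'chunk_b', 'chunk_d', 'chunk_c']
```

Because RRF uses only ranks, not raw scores, it needs no score normalization between the BM25 and embedding rankers, which is why it is a common choice for hybrid search.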
Hacker News Comment Review
The core open question is whether agents actually trust Semble’s results in practice: models heavily RL’d on grep may ignore non-grep outputs and retry, erasing token savings entirely.
Benchmarks measure only retrieval accuracy (NDCG@10), not end-to-end agent task performance; authors acknowledge this gap and say agent benchmarks are on the roadmap.
The “98% fewer tokens than grep” framing drew skepticism since grep alone returns no context tokens; the valid baseline is grep+readfile, which the authors confirmed is the intended comparison.
Notable Comments
@jerezzprime: cites evidence from RTK/LSP experiments showing that agents retry or re-read when they distrust non-grep tool results, nullifying the token savings.