OpenData Vector: MIT-Licensed Vector Search on Object Storage


TLDR

  • MIT-licensed stateless vector search built on SlateDB, running entirely on object storage, serving 100M vectors for ~$350/mo.

Key Takeaways

  • Uses IVF (SPFresh-based) indexing instead of HNSW so that a query's reads can be issued as batched S3 GETs; HNSW's sequential graph traversal would pay S3's first-byte latency, up to 100ms, on every dependent hop.
  • All state lives in SlateDB on S3; nodes share everything and never coordinate directly, enabling single-pod production deployments with failover within seconds.
  • Warm query latency is low single-digit ms for smaller datasets; cold P90 stays under 1 second across all tested ANN datasets at 90%+ recall.
  • Write acknowledgement latency is up to 1 second because writes are batched; OpenData Buffer can cut this to ~100ms, but gives up read-your-writes consistency.
  • Claims to be the only open-source Gen-3 stateless online vector database (the comparable Turbopuffer is proprietary); roadmap includes quantization, smaller dtypes, and full-text search.
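
The IVF-over-HNSW takeaway is easier to see in code. The sketch below is illustrative only and is not OpenData Vector's implementation: `FAKE_STORE`, `CENTROIDS`, `fetch`, and `ivf_search` are hypothetical names, and an in-memory dict stands in for S3. It shows the general IVF query shape: rank the centroids, fetch the chosen clusters' vectors concurrently, then scan them. Because the GETs are independent, their round trips overlap, whereas a graph traversal must wait for each hop before it knows the next key to fetch.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for an object store: key -> [(id, vector), ...].
FAKE_STORE = {
    "cluster/0": [(0, (0.0, 0.1)), (1, (0.2, 0.0))],
    "cluster/1": [(2, (5.0, 5.1)), (3, (4.8, 5.0))],
    "cluster/2": [(4, (9.0, 0.0)), (5, (9.2, 0.3))],
}
# Centroids are small, so this sketch assumes they are already cached in memory.
CENTROIDS = {
    "cluster/0": (0.1, 0.05),
    "cluster/1": (4.9, 5.05),
    "cluster/2": (9.1, 0.15),
}


def fetch(key):
    """Stand-in for one object-store GET; a real GET pays first-byte latency."""
    return FAKE_STORE[key]


def ivf_search(query, k=2, nprobe=2):
    # 1. Rank clusters by centroid distance and keep the nprobe closest.
    nearest = sorted(CENTROIDS, key=lambda c: math.dist(query, CENTROIDS[c]))[:nprobe]
    # 2. Issue every GET at once so the round trips overlap instead of chaining.
    with ThreadPoolExecutor(max_workers=nprobe) as pool:
        fetched = list(pool.map(fetch, nearest))
    # 3. Scan the fetched vectors and return the ids of the k closest.
    candidates = [item for cluster in fetched for item in cluster]
    candidates.sort(key=lambda item: math.dist(query, item[1]))
    return [vec_id for vec_id, _ in candidates[:k]]


print(ivf_search((5.0, 5.0)))
```

As a rough model, a chain of h dependent fetches at up to 100ms each costs up to h × 100ms, while nprobe concurrent fetches cost about one round trip regardless of nprobe, bandwidth permitting.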

Hacker News Comment Review

  • No substantive HN discussion yet.
