Vector databases: know when to build vs rent.
“Just use Pinecone/Weaviate”
Until you need:
100M+ vectors indexed
<10ms p95 search latency
$50/month (not $500/month)
Then you build your own vector database.
Here’s what that actually means:
Most engineers think vector DB =
Install FAISS
Wrap with Flask
Add some metadata filtering
Done
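In practice, “done” looks something like this minimal sketch (FAISS behind a Flask endpoint; the route names and payload fields are made up for illustration):

```python
# A minimal sketch of the "install FAISS, wrap with Flask" approach.
# Route names and payload fields are illustrative, not from any product.
import faiss
import numpy as np
from flask import Flask, jsonify, request

dim = 768
index = faiss.IndexFlatIP(dim)   # brute-force inner-product index
metadata = []                    # metadata[i] belongs to vector i

app = Flask(__name__)

@app.route("/upsert", methods=["POST"])
def upsert():
    body = request.get_json()
    index.add(np.asarray(body["vectors"], dtype="float32"))
    metadata.extend(body["metadata"])
    return jsonify({"count": index.ntotal})

@app.route("/search", methods=["POST"])
def search():
    body = request.get_json()
    query = np.asarray([body["vector"]], dtype="float32")
    scores, ids = index.search(query, body.get("k", 10))
    hits = [{"id": int(i), "score": float(s), "metadata": metadata[int(i)]}
            for s, i in zip(scores[0], ids[0]) if i != -1]
    # "metadata filtering" = throw away non-matching hits after the fact
    if "filter" in body:
        hits = [h for h in hits
                if all(h["metadata"].get(k) == v for k, v in body["filter"].items())]
    return jsonify(hits)

if __name__ == "__main__":
    app.run(port=8080)
```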
Reality hits around 10M vectors.
You’re not building a system to search ONE index for ONE user.
You’re building a system that handles THOUSANDS of concurrent searches, with filters, hybrid search, and real-time updates.
Completely different beast.
What you actually need:
HNSW index builder that doesn’t block writes
Metadata filtering that scales with cardinality
Distributed sharding based on index size
Real-time upsert pipeline without rebuild
And that’s just the foundation.
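To make one of those concrete, here’s a minimal sketch of the last item, real-time upserts without a full rebuild: writes land in a small delta buffer that is searched brute-force, and a background job folds it into the HNSW graph incrementally. This assumes hnswlib; the design and names are illustrative, not any product’s internals.

```python
# Real-time upserts without a blocking rebuild: a delta buffer absorbs writes,
# a background merge folds it into the HNSW graph. Assumes hnswlib.
import threading
import numpy as np
import hnswlib

class LiveIndex:
    def __init__(self, dim, max_elements=1_000_000):
        self.lock = threading.RLock()
        self.index = hnswlib.Index(space="cosine", dim=dim)
        self.index.init_index(max_elements=max_elements, ef_construction=200, M=16)
        self.index.set_ef(50)
        self.delta_ids, self.delta_vecs = [], []   # recent writes, not yet in the graph

    def upsert(self, ids, vectors):
        # Hot path: append to the delta, never touch the graph here.
        with self.lock:
            self.delta_ids.extend(ids)
            self.delta_vecs.extend(np.asarray(vectors, dtype="float32"))

    def merge_delta(self):
        # Background job: fold the delta into the graph incrementally.
        with self.lock:
            ids, vecs = self.delta_ids, np.asarray(self.delta_vecs, dtype="float32")
            self.delta_ids, self.delta_vecs = [], []
        if len(ids):
            self.index.add_items(vecs, ids)

    def search(self, query, k=10):
        q = np.asarray(query, dtype="float32").reshape(1, -1)
        results = []
        count = self.index.get_current_count()
        if count:
            labels, dists = self.index.knn_query(q, k=min(k, count))
            results = list(zip(labels[0].tolist(), dists[0].tolist()))
        with self.lock:   # brute-force the (small) delta so fresh writes are visible
            for i, v in zip(self.delta_ids, self.delta_vecs):
                cos = float(np.dot(q[0], v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
                results.append((i, 1.0 - cos))   # hnswlib "cosine" space is 1 - cos
        results.sort(key=lambda r: r[1])
        return results[:k]
```

Call merge_delta() from a timer thread; neither writes nor searches ever block on a rebuild.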
Your <10ms p95 search breaks down as:
Network: 2-3ms (fixed)
Metadata pre-filter: 1-3ms (explodes with complex filters)
ANN search: 3-8ms (depends on ef_search)
Post-filtering: 1-2ms
That leaves a couple of milliseconds of buffer at best, and a typical query is already over budget. “Just scale horizontally” doesn’t work: replicas buy throughput, not lower per-query latency.
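Summed against the budget (component ranges from above):

```python
# The p95 budget from above, summed. Component ranges are the post's.
budget_ms = 10
components = {
    "network": (2, 3),
    "metadata pre-filter": (1, 3),
    "ANN search": (3, 8),
    "post-filtering": (1, 2),
}
best = sum(lo for lo, _ in components.values())    # 7 ms
worst = sum(hi for _, hi in components.values())   # 16 ms
print(f"best case: {best} ms ({budget_ms - best} ms of slack)")
print(f"worst case: {worst} ms ({worst - budget_ms} ms over budget)")
```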
The first principle of vector search:
Recall@10 ≠ Recall@100
HNSW with ef_search=50 gives 95% recall@10 but 78% recall@100.
Your users want top-100 results with metadata filters.
Now your recall drops to 60%.
This is why “FAISS works fine” fails in production.
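You can measure this on your own data by comparing ANN results against exact search. A sketch with hnswlib (dataset size, dim, and parameters are placeholders; the recall figures above are the post’s, not reproduced here):

```python
# Measure recall@k of an HNSW index against exact (brute-force) search.
import numpy as np
import hnswlib

dim, n, n_queries = 128, 100_000, 200
rng = np.random.default_rng(0)
data = rng.standard_normal((n, dim)).astype("float32")
queries = rng.standard_normal((n_queries, dim)).astype("float32")

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))

def exact_topk(q, k):
    dists = ((data - q) ** 2).sum(axis=1)
    return np.argpartition(dists, k)[:k]

def recall_at_k(k, ef_search):
    index.set_ef(max(ef_search, k))   # HNSW needs ef >= k to return k candidates
    labels, _ = index.knn_query(queries, k=k)
    hits = sum(len(set(exact_topk(q, k).tolist()) & set(approx.tolist()))
               for q, approx in zip(queries, labels))
    return hits / (n_queries * k)

print("recall@10 :", recall_at_k(10, ef_search=50))
print("recall@100:", recall_at_k(100, ef_search=50))
```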
Index memory is the silent killer.
100M vectors × 768 dims × 4 bytes = 307GB just for vectors.
The HNSW graph, metadata, and serving headroom push the total footprint to 2-3x that.
You’re at 900GB memory for ONE index.
And you have 20 different embedding models.
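The arithmetic, spelled out (the 2-3x overhead multiplier is the post’s estimate, not a measurement):

```python
# Memory footprint at the post's scale.
n_vectors = 100_000_000
dim = 768
bytes_per_float = 4

raw = n_vectors * dim * bytes_per_float
print(f"raw vectors: {raw / 1e9:.0f} GB")           # ~307 GB

for multiplier in (2, 3):                            # graph + metadata + headroom
    print(f"{multiplier}x footprint: {raw * multiplier / 1e9:.0f} GB")

# And multiply again by however many embedding models you serve.
```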
“We need hybrid search with BM25 + vector + metadata filters”
Now your platform needs:
Inverted index alongside HNSW
Score fusion that doesn’t kill latency
Query planning for filter pushdown
Cross-encoder reranking in <5ms
This is where 80% of custom vector DBs fail.
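One common way to do the score-fusion piece without blowing the latency budget is reciprocal rank fusion. A minimal sketch, assuming you already have ranked ids from the BM25 side and the (metadata-filtered) vector side; the k=60 constant is the usual RRF default, not something from the post:

```python
# Reciprocal rank fusion: merge ranked lists using ranks, not raw scores,
# so BM25 scores and cosine similarities never need to be calibrated.
from collections import defaultdict

def rrf_fuse(result_lists, k=60, top_n=10):
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

bm25_hits = ["d7", "d2", "d9", "d1"]     # from the inverted index
vector_hits = ["d2", "d5", "d7", "d3"]   # from the HNSW index
print(rrf_fuse([bm25_hits, vector_hits]))
```

Cross-encoder reranking then only has to touch the fused top-N.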
Use Pinecone when you’re under 10M vectors, using standard embeddings, can tolerate 50ms+ latency, and can live with a cost per query that’s 100x raw compute.
Build your own when you have 50M+ vectors, custom embeddings, need sub-15ms p95, or when you’re spending $500+/month.
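The same thresholds, written down as an explicit rule of thumb:

```python
# The post's build-vs-rent thresholds as a function.
def should_build(n_vectors, p95_target_ms, monthly_spend_usd, custom_embeddings=False):
    rent_is_fine = (n_vectors < 10_000_000
                    and p95_target_ms >= 50
                    and not custom_embeddings)
    build_signal = (n_vectors >= 50_000_000
                    or custom_embeddings
                    or p95_target_ms <= 15
                    or monthly_spend_usd >= 500)
    return build_signal and not rent_is_fine

print(should_build(100_000_000, 10, 8000))   # True
print(should_build(2_000_000, 80, 150))      # False
```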
Let’s do the math:
A hosted vector DB at $70 per million vectors/month plus $0.10 per 1K queries:
100M vectors + 10M queries/month = $7,000 + $1,000 = $8,000/month.
Your self-hosted setup, a 2TB-RAM machine at $1,000/month: $1,000 in compute.
But add roughly $80K of engineering and $5K/month of maintenance, and at exactly this scale you’re saving $2,000/month, so the build takes years, not months, to pay back. Keep growing past it and the picture changes fast.
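The break-even, spelled out (the 300M-vector note at the end is an illustrative extrapolation using the same hosted pricing and a guess at hardware cost, not a figure from the post):

```python
# Hosted vs self-hosted at the post's scale and pricing.
hosted_per_million_vectors = 70      # $/month
hosted_per_1k_queries = 0.10         # $
n_vectors_m, queries_m = 100, 10     # 100M vectors, 10M queries/month

hosted = (n_vectors_m * hosted_per_million_vectors
          + queries_m * 1000 * hosted_per_1k_queries)   # $7,000 + $1,000
self_hosted = 1000 + 5000            # compute + maintenance, $/month
engineering = 80_000                 # one-time build cost

savings = hosted - self_hosted       # $2,000/month at exactly this scale
print(f"break-even: {engineering / savings:.0f} months")   # ~40

# At 300M vectors / 30M queries the hosted bill is ~$24,000/month; even at
# ~$8,000/month self-hosted, the $80K pays back in about half a year.
```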
Production vector DBs have four layers:
Query parsing (filter optimization, query planning, type checking).
Search execution (HNSW navigator, hybrid fusion, distributed scatter-gather).
Index management (real-time updates, compaction, shard rebalancing).
Observability (latency per component, recall metrics, memory pressure).
Most build layer 2 only.
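For concreteness, those layers as interfaces (names and signatures are illustrative; the layering is the post’s, the code shape is not):

```python
# Skeleton of the four layers of a production vector DB.
from dataclasses import dataclass, field
from typing import Any, Protocol

@dataclass
class Query:
    vector: list[float]
    filters: dict[str, Any] = field(default_factory=dict)
    k: int = 10

class QueryPlanner(Protocol):     # layer 1: parsing, filter optimization, type checks
    def plan(self, raw_request: dict) -> Query: ...

class SearchExecutor(Protocol):   # layer 2: HNSW traversal, hybrid fusion, scatter-gather
    def search(self, query: Query) -> list[tuple[str, float]]: ...

class IndexManager(Protocol):     # layer 3: real-time upserts, compaction, shard rebalancing
    def upsert(self, ids: list[str], vectors: list[list[float]]) -> None: ...
    def compact(self) -> None: ...

class Observability(Protocol):    # layer 4: per-component latency, recall, memory pressure
    def record_latency(self, component: str, ms: float) -> None: ...
    def record_recall(self, k: int, recall: float) -> None: ...
```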
The production checklist:
Use HNSW, not flat index
Implement filter pushdown from day one
Monitor recall AND latency
Auto-shard based on index size, not CPU
Track $/query AND queries/sec
Have hot-reload for index updates
Plan for 50M+ vector growth
That’s it.
Building a vector database is an 8-month project with memory costs everywhere.
But keep growing past 100M vectors and the hosted bill climbs linearly while your hardware bill doesn’t. That’s when it pays for itself.
Know when to build vs rent.

