Turbopuffer rebuilds vector search on object storage (S3/GCS) instead of attached SSDs, dropping storage cost from ~$0.33/GB/month on Pinecone to roughly $0.02/GB/month plus pay-per-query reads. The tradeoff is cold-query latency: 300-500ms p50 the first time a namespace is touched, dropping to sub-10ms once warm. Migrate to Turbopuffer if you have many low-traffic namespaces (per-tenant SaaS, codebase indexes), can tolerate a one-time cold read, and need hybrid BM25+vector search without an enterprise contract. Stay on Pinecone if you have a single hot index with hard sub-50ms p99 requirements, lean ops capacity, or already-paid SOC 2 / HIPAA workflows you don't want to redo.
Per-tenant Pinecone indexes, one namespace per customer, most queried less than once an hour, pay full hot-index prices for vectors sitting cold. As the namespace count multiplies, the storage line on the invoice grows faster than traffic. The question every AI-first company with this shape is now asking: should we move to Turbopuffer?
Cursor's published case study describes scaling code retrieval to 100B+ vectors on Turbopuffer; Notion's search infrastructure runs on it; Linear standardized on it for issue search. The trend is real and not just hype: object-storage-native vector search genuinely changes the cost curve when your access pattern is sparse. But it's also not a free lunch. There's a cold-read tax, the operational story is different, and for a few specific workload shapes, Pinecone is still the right answer.
This post is the side-by-side worth running before committing. We'll walk through why the architecture difference matters, the actual cost math at three scales, the latency tradeoffs you have to engineer around, the migration playbook to use in production, and the workloads where Pinecone still wins. If you're earlier in the vector database decision and weighing more options, our Pinecone vs Weaviate vs Qdrant comparison covers the full landscape, and our pillar guide on RAG systems connects this decision to the rest of the retrieval stack.
Why Cursor, Notion, and Linear Standardized on Turbopuffer
Three of the highest-profile AI-native products of the past two years all chose the same vector database, and they didn't pick it for any of the reasons the marketing pages list. They picked it because the per-namespace cost on hosted indexes hit a wall the moment they started multiplying tenants.
Cursor's published architecture writeup names the constraint precisely: one codebase per namespace, with codebases active only when the developer is editing them. At their scale, 100B+ vectors across millions of repositories, the prior infrastructure required the team to manually balance namespaces across servers as some grew hot and others went idle. Turbopuffer collapsed that to a single primitive: every namespace is just a prefix on object storage. The infra team stopped balancing anything.
Notion's appeal is different but rhymes. Notion's search has to span both keyword precision (page titles, mentions, exact terms) and semantic recall (intent across long-form documents), and they need it to work across millions of workspaces. Running a separate Elasticsearch cluster alongside a vector database doubled their operational footprint. Turbopuffer's native BM25 plus vector search collapsed two systems into one, and the per-workspace namespace model meant they didn't pay for tenants that hadn't logged in this week.
Linear's case is the cleanest illustration: issue search per workspace, where activity is bursty (a developer opens Linear, runs five searches, closes it for two hours). Hosted indexes price these workloads as if they were continuously hot. Turbopuffer prices them as what they actually are: mostly cold storage with occasional warm bursts.
The pattern across all three: many namespaces, sparse access per namespace, hybrid search needs, and a willingness to engineer around cold reads in exchange for an order-of-magnitude cost reduction. If your workload doesn't have those properties, the migration calculus is different.
Architecture: Object Storage Native vs Hosted Index
The architectural gap between Turbopuffer and Pinecone is bigger than the surface API suggests, and it determines almost every downstream tradeoff.
Pinecone: Hosted Index, Always-Hot
Pinecone runs as a managed service where vectors live on attached SSDs behind a serverless front-end. The index is always loaded, always ready to answer queries at sub-50ms p95. Storage, query routing, replication, and scaling are abstracted away. You upsert vectors and you query vectors; the operational surface is essentially a SaaS API.
The economics of this model are straightforward: you pay for the SSD-backed capacity that keeps your index hot. Pinecone's serverless billing breaks this into read units ($8.25 per 1M), write units ($2 per 1M), and storage at roughly $0.33/GB/month, with a $50/month minimum on Standard. The price reflects the fact that every vector you've upserted is sitting on indexed SSD, ready to serve a query. That model is a feature when your access pattern matches it: high-throughput, single-index, latency-sensitive workloads. It's a tax when it doesn't.
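The billing components quoted above can be combined into a rough monthly estimate. This is a minimal sketch, not Pinecone's billing logic: the function name is hypothetical, the plan minimum is modeled as a simple floor on usage, and the number of read units a query consumes is left to the caller because it varies by workload.

```python
def pinecone_monthly_cost(storage_gb, read_units, write_units,
                          storage_per_gb=0.33, per_million_reads=8.25,
                          per_million_writes=2.00, plan_minimum=50.0):
    """Rough monthly estimate from the rates quoted in this article.

    Assumption: the plan minimum acts as a floor on total usage charges.
    """
    usage = (storage_gb * storage_per_gb
             + read_units / 1_000_000 * per_million_reads
             + write_units / 1_000_000 * per_million_writes)
    return max(plan_minimum, usage)


# 100 GB stored, 10M read units, 5M write units in a month.
busy_index = pinecone_monthly_cost(100, 10_000_000, 5_000_000)

# A small index whose usage falls below the assumed minimum.
small_index = pinecone_monthly_cost(10, 1_000_000, 0)
```

The second call shows why small, idle indexes still cost money on a hosted model: usage works out to a few dollars, but the floor applies.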
Turbopuffer: Object Storage as Source of Truth
Turbopuffer flips the storage hierarchy. Object storage (S3, GCS, Azure Blob) is the source of truth, not a tier where cold data eventually lands. Each namespace is a prefix on object storage. The index lives there in serialized form. SSD and RAM are caches, populated lazily as namespaces get queried.
The implications cascade. Storage cost drops from $0.33/GB/month to roughly $0.02/GB/month, because object storage is roughly 16x cheaper than provisioned indexed SSD. Namespace count is effectively unlimited: a million namespaces is just a million S3 prefixes, which is a problem cloud storage already solves. And the query path bifurcates: cold queries pay the cost of reading from object storage (343ms p50 on 1M vectors per Turbopuffer's benchmarks), while warm queries land at p50 ~8ms once the cache is hot.
The tradeoff is operational. You inherit a latency model with two regimes instead of one, and you have to reason about which regime each query lands in. Cursor handled this by warming namespaces on session open. Notion handled it by accepting that the first search after a long idle pays a cold tax most users don't notice. The pattern that works depends on your traffic shape.
For deeper context on how the underlying index choice affects retrieval quality regardless of which database you pick, see our analysis of embedding quality vs vector database performance.
Cost Comparison: Three Scales, Real Math
The cost difference between the two systems isn't a flat percentage; it's a function of how many namespaces you have and how often each one is queried. Here's what the math looks like at several realistic shapes we've seen in production.
| Workload Shape | Vectors | Namespaces | Queries/mo | Pinecone Serverless | Turbopuffer | Winner |
|---|---|---|---|---|---|---|
| Single hot index, RAG bot | 5M | 1 | 5M | ~$70/mo | ~$45/mo | Turbopuffer (small margin) |
| Per-tenant SaaS, sparse access | 50M | 10,000 | 2M | ~$420/mo | ~$35/mo | Turbopuffer (12x) |
| Per-codebase, very sparse | 1B | 250,000 | 10M | ~$3,800/mo | ~$280/mo | Turbopuffer (13x) |
| Always-hot recommender, high QPS | 20M | 1 | 200M | ~$1,850/mo | ~$1,400/mo | Turbopuffer (small margin) |
| Always-hot recommender, very high QPS | 20M | 1 | 1B | ~$8,400/mo | ~$8,900/mo | Pinecone (slim) |
A few patterns are worth calling out explicitly. The single-index, high-QPS workload at the bottom is the one shape where Pinecone holds its own or wins, because Turbopuffer's per-query read pricing scales linearly with traffic on a single hot namespace, while Pinecone's read units amortize across high-QPS bursts. Multi-tenant workloads with namespace sparsity are where the gap explodes; those are the workloads Cursor, Notion, and Linear were running.
The other line item that surprises teams: storage. At 1B vectors with 1536-dimensional embeddings, you're sitting on roughly 6 TB of vector data. At Pinecone's $0.33/GB/month that's ~$2,000/month in storage alone, before any queries. At Turbopuffer's object-storage rates the same volume is closer to $120/month. Storage dominates the bill at scale, which is why object storage as the primary store is such an aggressive cost lever.
For practical strategies on managing the embedding pipeline that feeds either database, our guide on when to re-embed documents walks through the upstream cost decisions that compound with whichever store you pick.
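The storage figures for the 1B-vector case can be reproduced in a few lines. The sketch below assumes float32 embeddings (4 bytes per dimension), decimal gigabytes, and no metadata or index overhead; the function name is illustrative.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_gb(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw embedding size in decimal GB, ignoring metadata and index overhead."""
    return num_vectors * dimensions * bytes_per_value / 1e9


gb = raw_vector_gb(1_000_000_000, 1536)   # about 6,144 GB, i.e. roughly 6 TB
pinecone_storage = gb * 0.33              # about $2,028/month
turbopuffer_storage = gb * 0.02           # about $123/month
```

Quantized or lower-dimensional embeddings shrink both numbers proportionally, so the ratio between the two stays the same even if the absolute bill changes.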
Latency: The Cold Read Problem and How to Engineer Around It
The latency story is where most migration plans either succeed or get reverted, so it's worth being precise about what you're signing up for.
The Numbers
Turbopuffer's published benchmarks on a 1M-vector namespace report:
- Cold query (first hit, namespace not in cache): p50 ~343ms, worst case cited around 500-800ms
- Warm query (namespace cached in SSD/RAM): p50 ~8ms, p95 ~25ms
- Hybrid BM25 + vector cold: ~500ms on 1M documents
Pinecone Serverless on the same data shape lands at p95 ~45ms regardless of access pattern, because the index is always hot. The gap is real, and any migration plan that ignores it gets reverted in production. But the gap also doesn't matter for most workloads, because most workloads don't actually need every query to clear 50ms.
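To judge how often the cold path would actually be hit, it helps to replay a namespace's query timestamps against an assumed cache lifetime. The sketch below is a simplification: it assumes a namespace falls out of cache after a fixed idle period, which is a modeling assumption rather than documented Turbopuffer behavior, and the function name is hypothetical.

```python
def cold_fraction(query_times_s, assumed_idle_timeout_s):
    """Share of queries that would find the namespace cold.

    Assumption: the namespace is evicted after a fixed idle period.
    """
    times = sorted(query_times_s)
    if not times:
        return 0.0
    cold = 1  # the very first query always finds the namespace cold
    for previous, current in zip(times, times[1:]):
        if current - previous > assumed_idle_timeout_s:
            cold += 1
    return cold / len(times)


# Five searches ten seconds apart, a two-hour gap, then five more.
burst = [0, 10, 20, 30, 40]
times = burst + [t + 7240 for t in burst]
share = cold_fraction(times, assumed_idle_timeout_s=3600)  # 0.2
```

With the bursty pattern in the example, two queries in ten pay the cold path; whether that is acceptable depends on whether those two sit on a latency-critical path.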
Patterns That Work
We've used four patterns to engineer around cold reads, in order of how often they apply:
1. Warm on session start. Cursor's pattern: when a user opens a project, prefetch the codebase namespace before the first search query. By the time the user types, the namespace is hot. This works whenever you have a session boundary that signals which namespaces will be queried.
2. Accept the tax for the first query. Notion's pattern: a 400ms first-search latency after a long idle is invisible to users typing into a search bar; subsequent searches at 8ms feel instant. This works for human-driven, bursty UIs.
3. Background-warm on a heartbeat. For workloads where the namespace set is predictable, run a low-priority background job that pings each namespace once an hour to keep the cache warm. It's cheap and effective, and it doesn't show up in user latency.
4. Hybrid routing. Keep a small hot index on Pinecone (or in-memory) for latency-critical lookups (autocomplete, real-time filters) and route everything else to Turbopuffer. This pattern fits any setup where a front-end search bar has to clear 100ms but the long-form RAG retrieval can afford 500ms.
The pattern that doesn't work: trying to use Turbopuffer for synchronous, latency-critical, single-shot queries against cold tenants. If your workload looks like that, don't migrate.
For a deeper look at where retrieval latency actually dominates end-to-end (it's almost never the database), see our breakdown in vector search returns nothing: troubleshooting guide, which covers the diagnostic flow we use to localize latency in a full RAG stack.
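The heartbeat and hybrid-routing patterns both reduce to one question: is this namespace probably warm right now? A minimal sketch of that bookkeeping follows. Everything here is hypothetical scaffolding (the `WarmTracker` class, the backend labels, the one-hour warm window, and the latency estimates); it makes no calls to either vendor's client.

```python
import time
from dataclasses import dataclass, field


@dataclass
class WarmTracker:
    """Remembers when each namespace was last queried or pinged."""
    assumed_warm_window_s: float = 3600.0  # assumption, not a vendor guarantee
    last_touch: dict = field(default_factory=dict)

    def touch(self, namespace, now=None):
        self.last_touch[namespace] = time.time() if now is None else now

    def is_probably_warm(self, namespace, now=None):
        now = time.time() if now is None else now
        seen = self.last_touch.get(namespace)
        return seen is not None and now - seen <= self.assumed_warm_window_s

    def due_for_heartbeat(self, namespaces, now=None):
        """Namespaces a background job should ping to keep them cached."""
        return [ns for ns in namespaces if not self.is_probably_warm(ns, now)]


def choose_backend(latency_budget_ms, namespace, tracker, now=None,
                   cold_estimate_ms=500.0, warm_estimate_ms=25.0):
    """Route to the object-store backend only if it fits the latency budget."""
    warm = tracker.is_probably_warm(namespace, now)
    expected_ms = warm_estimate_ms if warm else cold_estimate_ms
    return "object_store" if expected_ms <= latency_budget_ms else "hot_index"


tracker = WarmTracker()
before = choose_backend(100, "tenant-1", tracker, now=0)   # cold, tight budget
tracker.touch("tenant-1", now=0)                           # e.g. session-start prefetch
after = choose_backend(100, "tenant-1", tracker, now=10)   # warm now
```

Calling `touch` from the session-start hook covers pattern 1, and feeding `due_for_heartbeat` into a scheduled job covers pattern 3, so one tracker can serve all of the routing decisions.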
Migration Playbook: Schema, Query Patterns, and the Gotchas
This migration has a repeatable shape, and the runbook below is the version that holds up in production.
Phase 1: Schema Mapping
Pinecone's data model maps almost cleanly to Turbopuffer:
- Pinecone index → Turbopuffer cluster
- Pinecone namespace → Turbopuffer namespace (literally a prefix on object storage)
- Pinecone metadata → Turbopuffer attributes (similar key-value structure)
- Pinecone filter expression → Turbopuffer filter (different syntax, similar semantics)
The one structural decision worth making upfront: Pinecone's namespaces are sometimes used as a logical grouping inside a single index, while Turbopuffer treats namespaces as the primary unit of isolation and pricing. If you're using Pinecone namespaces as a tenant boundary, the mapping is direct. If you're using a single namespace with metadata filtering for tenant isolation, you should split into per-tenant namespaces during the migration, both for performance and for cost reasons, since Turbopuffer's pricing rewards the split.
Phase 2: Query Layer Rewrite
This is where most teams underestimate the work. Pinecone's filter dialect uses MongoDB-style operators. Turbopuffer's filter syntax is similar but not identical, and several edge cases differ:
```python
# Pinecone filter
filter = {
    "category": {"$in": ["docs", "tutorials"]},
    "published_at": {"$gte": "2026-01-01"}
}

# Turbopuffer equivalent
filters = [
    "And",
    [["category", "In", ["docs", "tutorials"]],
     ["published_at", "Gte", "2026-01-01"]]
]
```
Wrap the filter construction behind an interface so you can swap providers without touching every call site. We typically build a thin adapter layer during the dual-write window.
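One possible shape for that adapter is a small translation function. The sketch below only covers flat, field-level conditions. The `$in` and `$gte` mappings come from the example filters above; the remaining operator names are assumed by analogy and should be checked against the Turbopuffer documentation before use. The function and table names are hypothetical.

```python
OPERATOR_MAP = {
    "$in": "In",     # taken from the example above
    "$gte": "Gte",   # taken from the example above
    "$eq": "Eq",     # assumed by analogy; verify before relying on it
    "$gt": "Gt",     # assumed by analogy
    "$lt": "Lt",     # assumed by analogy
    "$lte": "Lte",   # assumed by analogy
}


def translate_filter(pinecone_filter):
    """Convert a flat Pinecone-style filter dict into a nested-list filter.

    Logical operators such as "$and"/"$or" are deliberately rejected so that
    unsupported shapes fail loudly instead of being silently mistranslated.
    """
    clauses = []
    for field_name, condition in pinecone_filter.items():
        if field_name.startswith("$"):
            raise ValueError(f"unsupported top-level operator: {field_name}")
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # bare value treated as equality
        for op, value in condition.items():
            if op not in OPERATOR_MAP:
                raise ValueError(f"unsupported operator: {op}")
            clauses.append([field_name, OPERATOR_MAP[op], value])
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return ["And", clauses]


translated = translate_filter({
    "category": {"$in": ["docs", "tutorials"]},
    "published_at": {"$gte": "2026-01-01"},
})
```

Raising on anything unrecognized is the design choice that matters here: during a migration, a filter that silently drops a clause is far harder to detect than one that throws.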
Phase 3: Dual-Write Window
For 24-48 hours, write every upsert to both Pinecone and Turbopuffer. Read from Pinecone. This window does two jobs: it builds the Turbopuffer corpus without a backfill spike, and it lets you compare query results between the two systems on production traffic. Watch for one specific failure mode: recall divergence. If your two systems return different top-K results for the same query, it's almost always because the index parameters differ, not because the data differs. Turbopuffer uses a centroid-optimized index variant; Pinecone uses HNSW. The difference is usually within noise, but for high-stakes recall (legal, medical) you should test it explicitly before cutover.
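Recall divergence during the dual-write window can be tracked with a simple top-K overlap score on the IDs each system returns for the same query. This is a minimal sketch with hypothetical names; it compares result sets only and says nothing about which system is closer to exact nearest neighbors.

```python
def overlap_at_k(ids_a, ids_b, k):
    """Fraction of the top-k results that both systems returned."""
    top_a, top_b = set(ids_a[:k]), set(ids_b[:k])
    if not top_a and not top_b:
        return 1.0  # both empty: nothing to disagree about
    return len(top_a & top_b) / max(len(top_a), len(top_b))


def divergent_queries(results_by_query, k, threshold):
    """Query keys whose overlap falls below the threshold.

    results_by_query maps a query key to a (ids_a, ids_b) pair.
    """
    return [query for query, (ids_a, ids_b) in results_by_query.items()
            if overlap_at_k(ids_a, ids_b, k) < threshold]


score = overlap_at_k(["a", "b", "c", "d"], ["a", "c", "e", "f"], k=4)  # 0.5
flagged = divergent_queries(
    {"q1": (["a", "b"], ["a", "b"]), "q2": (["a", "b"], ["x", "y"])},
    k=2, threshold=0.8,
)
```

Logging the flagged queries rather than only an average makes it easier to see whether divergence is spread evenly or concentrated in a few tenants or filters.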
Phase 4: Read Cutover, Namespace by Namespace
Don't flip all reads at once. Cut over one namespace (or one tenant cohort) at a time, monitor for a full traffic cycle, then proceed. Keep Pinecone live as a fallback for at least a week after the last namespace migrates. The metric we watch most closely during cutover is the cold-read p99 latency by namespace, segmented by hour-of-day. A namespace that looked fine in dev because dev traffic kept it warm will surface cold reads at 3am in production.
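The cutover metric described above, p99 latency per namespace segmented by hour of day, can be computed from raw latency samples. The sketch below uses the nearest-rank percentile definition and assumes samples arrive as `(namespace, hour_of_day, latency_ms)` tuples; both the input shape and the function name are illustrative.

```python
from collections import defaultdict


def p99_by_namespace_hour(samples):
    """Nearest-rank p99 latency for each (namespace, hour_of_day) bucket."""
    buckets = defaultdict(list)
    for namespace, hour, latency_ms in samples:
        buckets[(namespace, hour)].append(latency_ms)

    result = {}
    for key, values in buckets.items():
        values.sort()
        # Integer form of ceil(0.99 * n), avoiding floating-point rounding.
        rank = (99 * len(values) + 99) // 100
        result[key] = values[rank - 1]
    return result


samples = [("tenant-a", 14, float(ms)) for ms in range(1, 101)]
samples += [("tenant-a", 3, 8.0), ("tenant-a", 3, 9.0), ("tenant-a", 3, 400.0)]
p99 = p99_by_namespace_hour(samples)
```

In the example, the 3am bucket has only three samples and its p99 is the single cold read, which is exactly the signal the aggregate daytime number would hide.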
Phase 5: Backfill and Decommission
Once reads are 100% on Turbopuffer and the system has been stable for a week, backfill any vectors that were written to Pinecone before the dual-write window started but aren't in Turbopuffer. Then turn off the Pinecone writer and deprovision the index. For teams running the broader RAG pipeline, our walkthrough on updating RAG knowledge without rebuilding covers the upsert-pattern decisions that compound with this migration.
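The backfill step comes down to an ID-level set difference followed by batched re-upserts. A minimal sketch, assuming you can list vector IDs from both systems for a namespace; the function names and batch size are hypothetical, and the actual fetch and upsert calls are left out.

```python
def missing_ids(source_ids, target_ids):
    """IDs present in the old store but absent from the new one, sorted."""
    return sorted(set(source_ids) - set(target_ids))


def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


to_backfill = missing_ids(["v1", "v2", "v3", "v4", "v5"], ["v2", "v4"])
batches = list(batched(to_backfill, size=2))
```

Running the same diff again after the backfill, and expecting an empty list, is a cheap check before turning off the old writer.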
When Pinecone Still Wins
Despite the cost gap, there are three workload shapes where Pinecone remains the right answer after running the analysis.
Single hot index with hard sub-50ms p99 SLO. A high-QPS recommender, a real-time semantic search bar, or any synchronous user-facing path where a 400ms cold read would be visible. Turbopuffer's warm latency is fine; it's the cold path that doesn't fit. If you can't engineer the cold reads out, don't migrate.
Compliance burden already paid. If your team has already collected SOC 2 evidence on Pinecone, gone through a HIPAA BAA, or completed a customer-required vendor security review, the cost of redoing that on a new vendor is real. Pinecone's compliance posture is mature; Turbopuffer's is improving but newer. In regulated environments where the audit cycle is the long pole, the price gap doesn't always justify the disruption.
Lean ops capacity, simple workload. Pinecone's hot-by-default model is genuinely simpler to reason about. There's one latency regime, no cache-warming pattern, no cold-read SLO to track. For a small team with a single index and no scaling pressure, the operational simplicity is worth the price gap. Migrate when scale forces you to, not before.
For broader context on the build-vs-buy decision that often surfaces alongside these migrations, our writeup on when to build vs buy AI infrastructure covers the framework worth applying.
A Defensible Approach to Vector Database Selection
There's no default vector database. There's a default question: what's the access pattern?
A single high-QPS index with tight latency requirements usually points to Pinecone, or Qdrant for the cost-performance ratio. Multi-tenant sparseness, codebase-style indexing, or hybrid search needs across millions of workspaces increasingly point to Turbopuffer. The decision isn't ideological; it's a function of where the cost curve sits for the specific workload.
What's worth abandoning is recommending a vector database based on benchmark blog posts. The numbers in those posts almost never reflect the access pattern an actual workload has. A two-week side-by-side on production-shape traffic before any migration decision is the only signal that holds up.
If you're staring at a vector database bill that's growing faster than your traffic, or you're about to commit to a multi-year contract on a hosted index, that's the conversation to have before signing. Cost curves on object storage have rewritten the playbook for sparse-access workloads, and the savings at scale are real, but so are the latency tradeoffs, and the wrong migration is more expensive than the right contract.
For more on the full retrieval stack that sits on top of either database, our pillar guide on RAG systems covers the upstream decisions (chunking, embedding choice, reranking) that compound with whichever store you pick.
Frequently Asked Questions
All three companies hit the same wall: per-tenant or per-codebase namespaces multiplied faster than any hosted-index pricing model could absorb. Cursor's published case study describes scaling code retrieval to 100B+ vectors with one namespace per repository, where the prior architecture required manually balancing indexes across servers. Turbopuffer's object-storage-native design removes the namespace cap entirely and prices storage at roughly $0.02/GB/month, versus Pinecone's $0.33/GB/month. For workloads where most namespaces are queried infrequently, the math swings 10-15x in Turbopuffer's favor.


