Turbopuffer rebuilds vector search on object storage (S3/GCS) instead of attached SSDs, dropping storage cost from ~$0.33/GB/month on Pinecone to roughly $0.02/GB/month plus pay-per-query reads. The tradeoff is cold-query latency: 300-500ms p50 the first time a namespace is touched, dropping to sub-10ms once warm. Migrate to Turbopuffer if you have many low-traffic namespaces (per-tenant SaaS, codebase indexes), can tolerate a one-time cold read, and need hybrid BM25+vector search without an enterprise contract. Stay on Pinecone if you have a single hot index with hard sub-50ms p99 requirements, lean ops capacity, or already-paid SOC 2 / HIPAA workflows you don't want to redo.
A SaaS client called us in March with the kind of bill nobody wants to send to the board: their Pinecone spend had crossed $4,200 a month, growing 18% per quarter, and the line on the invoice that mattered most was storage. They were running a per-tenant index strategy—one namespace per customer—and most of those namespaces were getting queried less than once an hour. They were paying full hot-index prices for vectors that lived rent-free in cold tenants. The question they asked was the same one every AI-first company is asking right now: should we move to Turbopuffer?
We've now done five of these migrations and watched two more get aborted halfway through. Cursor's published case study describes scaling code retrieval to 100B+ vectors on Turbopuffer; Notion's search infrastructure runs on it; Linear standardized on it for issue search. The trend is real and it's not just hype—object-storage-native vector search genuinely changes the cost curve when your access pattern is sparse. But it's also not a free lunch. There's a cold-read tax, the operational story is different, and for one specific shape of workload, Pinecone is still the right answer.
This post is the side-by-side we hand clients before they commit. We'll walk through why the architecture difference matters, the actual cost math at three scales, the latency tradeoffs you have to engineer around, the migration playbook we use in production, and the workloads where Pinecone still wins. If you're earlier in the vector database decision and weighing more options, our Pinecone vs Weaviate vs Qdrant comparison covers the full landscape, and our pillar guide on RAG systems connects this decision to the rest of the retrieval stack.
Why Cursor, Notion, and Linear Standardized on Turbopuffer
Three of the highest-profile AI-native products of the past two years all chose the same vector database, and they didn't pick it for any of the reasons the marketing pages list. They picked it because the per-namespace cost on hosted indexes hit a wall the moment they started multiplying tenants.
Cursor's published architecture writeup names the constraint precisely: one codebase per namespace, with codebases active only when the developer is editing them. At their scale—100B+ vectors across millions of repositories—the prior infrastructure required the team to manually balance namespaces across servers as some grew hot and others went idle. Turbopuffer collapsed that to a single primitive: every namespace is just a prefix on object storage. The infra team stopped balancing anything.
Notion's appeal is different but rhymes. Notion's search has to span both keyword precision (page titles, mentions, exact terms) and semantic recall (intent across long-form documents), and they need it to work across millions of workspaces. Running a separate Elasticsearch cluster alongside a vector database doubled their operational footprint. Turbopuffer's native BM25 plus vector search collapsed two systems into one, and the per-workspace namespace model meant they didn't pay for tenants that hadn't logged in this week.
Linear's case is the cleanest illustration: issue search per workspace, where activity is bursty (a developer opens Linear, runs five searches, closes it for two hours). Hosted indexes price these workloads as if they were continuously hot. Turbopuffer prices them as what they actually are—mostly cold storage with occasional warm bursts.
The pattern across all three: many namespaces, sparse access per namespace, hybrid search needs, and a willingness to engineer around cold reads in exchange for an order-of-magnitude cost reduction. If your workload doesn't have those properties, the migration calculus is different.
Architecture: Object Storage Native vs Hosted Index
The architectural gap between Turbopuffer and Pinecone is bigger than the surface API suggests, and it determines almost every downstream tradeoff.
Pinecone: Hosted Index, Always-Hot
Pinecone runs as a managed service where vectors live on attached SSDs behind a serverless front-end. The index is always loaded, always ready to answer queries at sub-50ms p95. Storage, query routing, replication, and scaling are abstracted away. You upsert vectors, you query vectors, the operational surface is essentially a SaaS API. The economics of this model are straightforward: you pay for the SSD-backed capacity that keeps your index hot. Pinecone's serverless billing breaks this into read units ($8.25 per 1M), write units ($2 per 1M), and storage at roughly $0.33/GB/month, with a $50/month minimum on Standard. The price you pay reflects the fact that every vector you've upserted is sitting on indexed SSD ready to serve a query. That model is a feature when your access pattern matches it—high-throughput, single-index, latency-sensitive workloads. It's a tax when it doesn't.
Turbopuffer: Object Storage as Source of Truth
Turbopuffer flips the storage hierarchy. Object storage (S3, GCS, Azure Blob) is the source of truth, not a tier where cold data eventually lands. Each namespace is a prefix on object storage. The index lives there in serialized form. SSD and RAM are caches, populated lazily as namespaces get queried.
The implications cascade. Storage cost drops from $0.33/GB/month to roughly $0.02/GB/month, because object storage prices are roughly 15x cheaper than provisioned indexed SSD. Namespace count is effectively unlimited—a million namespaces is just a million S3 prefixes, which is a problem cloud storage already solves. And the query path bifurcates: cold queries pay the cost of reading from object storage (343ms p50 on 1M vectors per Turbopuffer's benchmarks), warm queries land at p50 ~8ms once the cache is hot.
The tradeoff is operational. You inherit a latency model with two regimes instead of one, and you have to reason about which regime each query lands in. Cursor handled this by warming namespaces on session open. Notion handled it by accepting that the first search after a long idle pays a cold tax most users don't notice. The pattern that works depends on your traffic shape. For deeper context on how the underlying index choice affects retrieval quality regardless of which database you pick, see our analysis of embedding quality vs vector database performance.
Cost Comparison: Three Scales, Real Math
The cost difference between the two systems isn't a flat percentage—it's a function of how many namespaces you have and how often each one is queried. Here's what the math looks like across five workload shapes we've seen in production, spanning three scales of corpus size.
| Workload Shape | Vectors | Namespaces | Queries/mo | Pinecone Serverless | Turbopuffer | Winner |
|---|---|---|---|---|---|---|
| Single hot index, RAG bot | 5M | 1 | 5M | ~$70/mo | ~$45/mo | Turbopuffer (small margin) |
| Per-tenant SaaS, sparse access | 50M | 10,000 | 2M | ~$420/mo | ~$35/mo | Turbopuffer (12x) |
| Per-codebase, very sparse | 1B | 250,000 | 10M | ~$3,800/mo | ~$280/mo | Turbopuffer (13x) |
| Always-hot recommender, high QPS | 20M | 1 | 200M | ~$1,850/mo | ~$1,400/mo | Turbopuffer (small margin) |
| Always-hot recommender, very high QPS | 20M | 1 | 1B | ~$8,400/mo | ~$8,900/mo | Pinecone (slim margin) |
A few patterns are worth calling out explicitly. The single-index, high-QPS workload at the bottom is the one shape where Pinecone holds its own or wins—because Turbopuffer's per-query read pricing scales linearly with traffic on a single hot namespace, while Pinecone's read units amortize across high-QPS bursts. Multi-tenant workloads with namespace sparsity are where the gap explodes; those are the workloads Cursor, Notion, and Linear were running.
The other line item that surprises teams: storage. At 1B vectors with 1536-dimensional embeddings, you're sitting on roughly 6 TB of vector data. At Pinecone's $0.33/GB/month that's ~$2,000/month in storage alone, before any queries. At Turbopuffer's object-storage rates the same volume is closer to $120/month. Storage dominates the bill at scale, which is why object storage as the primary store is such an aggressive cost lever.
For practical strategies on managing the embedding pipeline that feeds either database, our guide on when to re-embed documents walks through the upstream cost decisions that compound with whichever store you pick.
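If you want to sanity-check the storage line against your own corpus, the arithmetic is simple enough to keep in a scratch script. This sketch assumes float32 embeddings and the per-GB list prices cited in this post; it models storage only, since reads and writes are metered differently on each platform and depend on your query shape:

```python
# Back-of-envelope storage math for the 1B-vector example above.
# Assumptions: float32 embeddings, $0.33/GB/mo (Pinecone) and $0.02/GB/mo
# (Turbopuffer object storage) as cited in the post. Query costs not modeled.

vectors = 1_000_000_000
dims = 1536
bytes_per_float = 4

storage_gb = vectors * dims * bytes_per_float / 1e9    # ~6,144 GB of raw vectors
pinecone_storage = storage_gb * 0.33                    # ~$2,000/month
turbopuffer_storage = storage_gb * 0.02                 # ~$120/month

print(f"{storage_gb:,.0f} GB -> Pinecone ~${pinecone_storage:,.0f}/mo, "
      f"Turbopuffer ~${turbopuffer_storage:,.0f}/mo")
```

Swap in your own vector count and dimensionality; the ratio between the two storage lines stays roughly 15x regardless of scale, which is why the gap widens as the corpus grows.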
Latency: The Cold Read Problem and How to Engineer Around It
The latency story is where most migration plans either succeed or get reverted, so it's worth being precise about what you're signing up for.
The Numbers
Turbopuffer's published benchmarks on a 1M-vector namespace report:
- Cold query (first hit, namespace not in cache): p50 ~343ms, worst case cited around 500-800ms
- Warm query (namespace cached in SSD/RAM): p50 ~8ms, p95 ~25ms
- Hybrid BM25 + vector cold: ~500ms on 1M documents
Pinecone Serverless on the same data shape lands at p95 ~45ms regardless of access pattern, because the index is always hot. The gap is real, and any migration plan that ignores it gets reverted in production. But the gap also doesn't matter for most workloads, because most workloads don't actually need every query to clear 50ms.
Patterns That Work
We've used four patterns to engineer around cold reads, in order of how often they apply:
1. Warm on session start. Cursor's pattern: when a user opens a project, prefetch the codebase namespace before the first search query. By the time the user types, the namespace is hot. This works whenever you have a session boundary that signals which namespaces will be queried.
2. Accept the tax for the first query. Notion's pattern: a 400ms first-search latency after a long idle is invisible to users typing into a search bar; subsequent searches at 8ms feel instant. This works for human-driven, bursty UIs.
3. Background-warm on a heartbeat. For workloads where the namespace set is predictable, run a low-priority background job that pings each namespace once an hour to keep the cache warm. Cheap, effective, doesn't show up in user latency. (A minimal sketch of this and pattern 1 follows at the end of this section.)
4. Hybrid routing. Keep a small hot index on Pinecone (or in-memory) for latency-critical lookups (autocomplete, real-time filters) and route everything else to Turbopuffer. We've used this for one client where the front-end search bar had to clear 100ms but the long-form RAG retrieval could afford 500ms.
The pattern that doesn't work: trying to use Turbopuffer for synchronous, latency-critical, single-shot queries against cold tenants. If your workload looks like that, don't migrate. For a deeper look at where retrieval latency actually dominates end-to-end—it's almost never the database—see our breakdown in vector search returns nothing: troubleshooting guide, which covers the diagnostic flow we use to localize latency in a full RAG stack.
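Here's a minimal sketch of patterns 1 and 3. It assumes a hypothetical `query_namespace` wrapper around whatever Turbopuffer client you already use; any cheap query is enough to pull the namespace from object storage into the SSD/RAM cache, which is all "warming" means here.

```python
import threading
import time

ZERO_VECTOR = [0.0] * 1536  # match your embedding dimension

def warm(namespace: str) -> None:
    # Throwaway top-1 query: we discard the result, we just want the
    # namespace pulled from object storage into cache.
    query_namespace(namespace, vector=ZERO_VECTOR, top_k=1)  # hypothetical wrapper

def on_session_open(project_namespace: str) -> None:
    # Pattern 1 (Cursor-style): fire-and-forget warm when the user opens
    # a project, before they can type their first search.
    threading.Thread(target=warm, args=(project_namespace,), daemon=True).start()

def heartbeat(namespaces: list[str], interval_s: int = 3600) -> None:
    # Pattern 3: keep a predictable namespace set warm with an hourly ping.
    while True:
        for ns in namespaces:
            warm(ns)
        time.sleep(interval_s)
```

The heartbeat job costs a handful of reads per namespace per day, which is noise next to the storage savings; the only thing to watch is that the namespace list stays in sync with tenant churn.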
Migration Playbook: Schema, Query Patterns, and the Gotchas
We've now done this migration enough times to have a runbook. Here's the version we ship to clients.
Phase 1: Schema Mapping
Pinecone's data model maps almost cleanly to Turbopuffer:
- Pinecone index → Turbopuffer cluster
- Pinecone namespace → Turbopuffer namespace (literally a prefix on object storage)
- Pinecone metadata → Turbopuffer attributes (similar key-value structure)
- Pinecone filter expression → Turbopuffer filter (different syntax, similar semantics)
The one structural decision worth making upfront: Pinecone's namespaces are sometimes used as a logical grouping inside a single index, while Turbopuffer treats namespaces as the primary unit of isolation and pricing. If you're using Pinecone namespaces as a tenant boundary, the mapping is direct. If you're using a single namespace with metadata filtering for tenant isolation, you should split into per-tenant namespaces during the migration—both for performance and for cost reasons, since Turbopuffer's pricing rewards the split.
Phase 2: Query Layer Rewrite
This is where most teams underestimate the work. Pinecone's filter dialect uses MongoDB-style operators. Turbopuffer's filter syntax is similar but not identical, and several edge cases differ:
```python
# Pinecone filter
filter = {
    "category": {"$in": ["docs", "tutorials"]},
    "published_at": {"$gte": "2026-01-01"}
}

# Turbopuffer equivalent
filters = [
    "And",
    [["category", "In", ["docs", "tutorials"]],
     ["published_at", "Gte", "2026-01-01"]]
]
```
Wrap the filter construction behind an interface so you can swap providers without touching every call site. We typically build a thin adapter layer during the dual-write window.
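Here's a minimal sketch of that adapter, assuming flat, AND-combined filters like the one above. The function name and operator map are illustrative, not a complete dialect translation; extend the map to cover only the operators your call sites actually use, and verify each one against both providers' filter docs.

```python
# Illustrative operator map -- not exhaustive. Anything unmapped fails loudly
# rather than silently changing query semantics.
PINECONE_TO_TPUF_OPS = {
    "$in": "In",
    "$eq": "Eq",
    "$gte": "Gte",
    "$lte": "Lte",
}

def to_turbopuffer_filter(pinecone_filter: dict) -> list:
    """Translate a flat, AND-combined Pinecone-style filter dict into
    Turbopuffer's ["And", [[field, Op, value], ...]] form."""
    clauses = []
    for field, condition in pinecone_filter.items():
        if isinstance(condition, dict):
            for op, value in condition.items():
                if op not in PINECONE_TO_TPUF_OPS:
                    raise ValueError(f"Unsupported operator: {op}")
                clauses.append([field, PINECONE_TO_TPUF_OPS[op], value])
        else:
            # A bare value means equality in Pinecone's dialect.
            clauses.append([field, "Eq", condition])
    return ["And", clauses]
```

Feeding it the Pinecone filter from the snippet above produces the nested-list form shown there; keeping the translation in one place is what makes the later cutover a configuration change rather than a refactor.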
Phase 3: Dual-Write Window
For 24-48 hours, write every upsert to both Pinecone and Turbopuffer. Read from Pinecone. This window does two jobs: it builds the Turbopuffer corpus without a backfill spike, and it lets you compare query results between the two systems on production traffic. Watch for one specific failure mode: recall divergence. If your two systems return different top-K results for the same query, it's almost always because the index parameters differ, not because the data differs. Turbopuffer uses a centroid-optimized index variant; Pinecone uses HNSW. The difference is usually within noise, but for high-stakes recall (legal, medical) you should test it explicitly before cutover.
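A sketch of the wrapper we typically ship for this window, assuming `pinecone_index` and `tpuf_ns` are whatever client handles your codebase already wraps (both names, and the logging helper, are placeholders), plus a simple overlap metric for the recall-divergence check:

```python
def dual_upsert(pinecone_index, tpuf_ns, vectors):
    # Pinecone remains the read path during the window, so write it first
    # and treat a Turbopuffer failure as a shadow-write miss, not an error.
    pinecone_index.upsert(vectors)
    try:
        tpuf_ns.upsert(vectors)
    except Exception as exc:
        record_shadow_miss(vectors, exc)  # hypothetical helper; backfilled in Phase 5

def topk_overlap(pinecone_ids: list[str], tpuf_ids: list[str]) -> float:
    # Recall-divergence check on sampled production queries: fraction of
    # Pinecone's top-K that Turbopuffer also returned for the same query.
    return len(set(pinecone_ids) & set(tpuf_ids)) / max(len(pinecone_ids), 1)
```

We sample a slice of production queries, run them against both systems, and track the overlap distribution; a drift on specific namespaces usually points at index parameters or a filter-translation bug, not the data itself.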
Phase 4: Read Cutover, Namespace by Namespace
Don't flip all reads at once. Cut over one namespace (or one tenant cohort) at a time, monitor for a full traffic cycle, then proceed. Keep Pinecone live as a fallback for at least a week after the last namespace migrates. The metric we watch most closely during cutover is the cold-read p99 latency by namespace, segmented by hour-of-day. A namespace that looked fine in dev because dev traffic kept it warm will surface cold reads at 3am in production.
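A sketch of the read router we use for this phase, assuming a feature-flag set of migrated namespaces; `load_migrated_namespaces`, `tpuf_search`, and `pinecone_search` are hypothetical stand-ins for your flag store and client wrappers:

```python
# Namespace-by-namespace read cutover. The migrated set lives in whatever
# flag store you already have (a config map, a DB table, LaunchDarkly).
MIGRATED: set[str] = load_migrated_namespaces()  # e.g. {"tenant_042", "tenant_107"}

def search(namespace: str, query_vector, top_k: int = 10, filters=None):
    if namespace in MIGRATED:
        try:
            return tpuf_search(namespace, query_vector, top_k, filters)
        except Exception:
            # Pinecone stays live as a fallback for at least a week
            # after the last namespace migrates.
            return pinecone_search(namespace, query_vector, top_k, filters)
    return pinecone_search(namespace, query_vector, top_k, filters)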
Phase 5: Backfill and Decommission
Once reads are 100% on Turbopuffer and the system has been stable for a week, backfill any vectors that were written to Pinecone before the dual-write window started but aren't in Turbopuffer. Then turn off the Pinecone writer and deprovision the index. For teams running the broader RAG pipeline, our walkthrough on updating RAG knowledge without rebuilding covers the upsert-pattern decisions that compound with this migration.
When Pinecone Still Wins
Despite the cost gap, there are three workload shapes where we've kept clients on Pinecone after running the analysis.
Single hot index with hard sub-50ms p99 SLO. A high-QPS recommender, a real-time semantic search bar, or any synchronous user-facing path where a 400ms cold read would be visible. Turbopuffer's warm latency is fine; it's the cold path that doesn't fit. If you can't engineer the cold reads out, don't migrate.
Compliance burden already paid. If your team has already collected SOC 2 evidence on Pinecone, gone through a HIPAA BAA, or completed a customer-required vendor security review, the cost of redoing that on a new vendor is real. Pinecone's compliance posture is mature; Turbopuffer's is improving but newer. For a regulated client where the audit cycle is the long pole, the price gap doesn't always justify the disruption.
Lean ops capacity, simple workload. Pinecone's hot-by-default model is genuinely simpler to reason about. There's one latency regime, no cache-warming pattern, no cold-read SLO to track. For a small team with a single index and no scaling pressure, the operational simplicity is worth the price gap. Migrate when scale forces you to, not before.
For broader context on the build-vs-buy decision that often surfaces alongside these migrations, our writeup on when to build vs buy AI infrastructure covers the framework we use with clients.
How Particula Tech Approaches Vector Database Selection
We don't have a default vector database. We have a default question: what's the access pattern?
If a client shows up with a single high-QPS index and tight latency requirements, we usually keep them on Pinecone or move them to Qdrant for the cost-performance ratio. If they show up with multi-tenant sparseness, codebase-style indexing, or hybrid search needs across millions of workspaces, Turbopuffer is increasingly the default. The decision isn't ideological; it's a function of where the cost curve sits for the specific workload.
What we've stopped doing is recommending a vector database based on benchmark blog posts. The numbers in those posts almost never reflect the access pattern the client actually has. We run a two-week side-by-side on production-shape traffic before any migration decision goes to the board.
If you're staring at a vector database bill that's growing faster than your traffic, or you're about to commit to a multi-year contract on a hosted index, that's the conversation to have before signing. Cost curves on object storage have rewritten the playbook for sparse-access workloads, and the savings at scale are real—but so are the latency tradeoffs, and the wrong migration is more expensive than the right contract.
For more on the full retrieval stack that sits on top of either database, our pillar guide on RAG systems covers the upstream decisions—chunking, embedding choice, reranking—that compound with whichever store you pick.
Frequently Asked Questions
Why did Cursor, Notion, and Linear all choose Turbopuffer?
All three companies hit the same wall: per-tenant or per-codebase namespaces multiplied faster than any hosted-index pricing model could absorb. Cursor's published case study describes scaling code retrieval to 100B+ vectors with one namespace per repository, where the prior architecture required manually balancing indexes across servers. Turbopuffer's object-storage-native design removes the namespace cap entirely and prices storage at roughly $0.02/GB/month, versus Pinecone's $0.33/GB/month. For workloads where most namespaces are queried infrequently, the math swings 10-15x in Turbopuffer's favor.
