Turbopuffer
Last reviewed
May 25, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,040 words
Turbopuffer is a serverless vector and full-text search database built from first principles on object storage such as Amazon S3 and Google Cloud Storage.[1] Founded in 2023 by Simon Hørup Eskildsen and Justin Li, two former Shopify infrastructure engineers, the company markets the system as roughly an order of magnitude cheaper than memory-resident vector databases by treating object storage as the source of truth and caching warm namespaces on local NVMe SSDs.[1][2] Turbopuffer became publicly available in July 2024 and is used in production by Cursor, Notion, Linear, Suno, Anthropic, Atlassian, and others.[3][4][5] By early 2026 the platform reported managing over a trillion vectors across more than 80 million namespaces for Cursor alone.[5]
Turbopuffer's design grew out of an engineering project at Readwise, the read-later and highlight-management startup. In late 2022, Simon Eskildsen, later a Turbopuffer co-founder, who had spent roughly a decade on Shopify's infrastructure team scaling stateful services such as MySQL, Redis, and Elasticsearch, was consulting for Readwise as it prepared to launch its Reader product.[2][6] The team wanted to add semantic search and article recommendations across more than 100 million documents, but pricing on the leading hosted vector databases came out to roughly $20,000 per month, several times Readwise's entire database budget at the time.[1][6] Eskildsen could not find a way to make the feature economically viable using off-the-shelf services.[1]
The cost gap prompted Eskildsen to sketch what a vector and search engine would look like if it used object storage rather than provisioned SSD as its primary storage medium.[1] Cloud object storage is roughly fifteen times cheaper per gigabyte than replicated block storage, but its read latency is approximately ten to twenty milliseconds rather than the sub-millisecond response times typical of in-memory databases.[7] Eskildsen concluded that, with appropriate caching and an LSM-tree storage layout, the latency tradeoff was acceptable for the workloads people actually wanted to run.[1]
Before founding Turbopuffer, Eskildsen spent roughly a year on what he has described as "angel engineering," a model in which he worked in three-month engagements with early startups including Readwise, Replicate, and Causal.[2] That experience surfaced the Readwise search problem and led him to focus full time on the storage architecture he had been sketching.[2] Eskildsen and Justin Li, also a former senior staff software engineer at Shopify, started Turbopuffer in 2023.[4] The product was used in private beta starting that year. Cursor, the AI code editor, reached out to Eskildsen by email after experimenting with several alternatives and migrated all of its code embeddings to Turbopuffer in November 2023; Eskildsen has said in interviews that he flew to meet the Cursor team shortly after their first contact.[2][5] Turbopuffer was launched publicly with a detailed engineering post on July 8, 2024, which described the architecture, pricing model, and the Readwise origin story.[1] The launch announcement was discussed on Hacker News the same week.[8]
On December 19, 2025, BetaKit reported that Turbopuffer had raised a seed round of undisclosed size led by Thrive Capital, with continued participation from existing backer Lachy Groom.[4] Earlier reports describe small angel funding in 2024 prior to the Thrive round.[4] No round led by Sequoia Capital has been publicly confirmed as of 2026.[4] The same coverage reported that Turbopuffer had reached annual recurring revenue in the tens of millions of dollars, grown headcount roughly fivefold during 2025, and was storing tens of petabytes of data and "trillions of vectors" for its customers.[4]
Turbopuffer's defining design choice is that object storage is the source of truth for every byte of customer data rather than a tier underneath a primary database.[1][7] Writes commit durably to object storage before they are acknowledged, and any frontend node in the fleet can serve queries for any namespace by lazily loading data from object storage on demand.[7] Local memory and NVMe SSDs serve only as caches, not as authoritative replicas.[1]
The storage engine uses a log-structured merge (LSM) layout on top of object storage.[1] Writes append to namespace-specific write-ahead-log objects, and a background compaction process merges them into larger sorted segments that hold both the vector embeddings and any associated attributes.[1] Vectors are indexed using approximate nearest neighbor structures including a Turbopuffer-specific variant the company has described in successive blog posts as ANN v1, v2, and v3.[9] Full-text indexes use an inverted-index design adapted for a key-value layout on object storage, with companion blog posts on the BM25 scoring path published in late 2025 and early 2026.[10][11]
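The write path described above can be illustrated with a toy model. This is a minimal sketch, assuming a Python dict stands in for the object store; the class name `ToyNamespace`, the key layout, and the use of `None` as a delete marker are all hypothetical and are not Turbopuffer's actual format.

```python
from typing import Dict, List, Optional, Tuple

# (record id, vector); a vector of None marks a delete (hypothetical convention).
Record = Tuple[str, Optional[list]]


class ToyNamespace:
    """Toy LSM-style namespace over a dict that stands in for an object store."""

    def __init__(self, store: Dict[str, list], name: str) -> None:
        self.store = store
        self.name = name
        self.next_wal = 0
        self.next_segment = 0

    def _keys(self, kind: str) -> List[str]:
        # Zero-padded sequence numbers make lexical order equal write order.
        prefix = f"{self.name}/{kind}/"
        return sorted(k for k in self.store if k.startswith(prefix))

    def write(self, batch: List[Record]) -> str:
        # Each write batch becomes one new, immutable WAL object.
        key = f"{self.name}/wal/{self.next_wal:08d}"
        self.store[key] = list(batch)
        self.next_wal += 1
        return key

    def compact(self) -> str:
        # Replay older data first so that newer writes overwrite it.
        old_keys = self._keys("segment") + self._keys("wal")
        merged: Dict[str, Optional[list]] = {}
        for key in old_keys:
            for rec_id, vector in self.store[key]:
                merged[rec_id] = vector
        # Drop deleted records and keep the segment sorted by id.
        segment = sorted((i, v) for i, v in merged.items() if v is not None)
        for key in old_keys:
            del self.store[key]
        seg_key = f"{self.name}/segment/{self.next_segment:08d}"
        self.store[seg_key] = segment
        self.next_segment += 1
        return seg_key

    def read(self, rec_id: str) -> Optional[list]:
        # Newest data wins: scan WAL objects newest-first, then segments.
        for key in reversed(self._keys("wal")):
            for i, v in reversed(self.store[key]):
                if i == rec_id:
                    return v
        for key in reversed(self._keys("segment")):
            for i, v in self.store[key]:
                if i == rec_id:
                    return v
        return None
```

A real engine would build vector and attribute indexes during compaction and would avoid scanning every object on read; the sketch only shows the append-then-merge shape of an LSM layout.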
The single-source-of-truth approach has two latency consequences. Because durability comes from the underlying object store, writes incur a commit latency that Turbopuffer documents as "up to roughly 200 ms" per request rather than the sub-millisecond commit latencies of memory-resident systems, and consistent reads carry a latency floor of approximately ten milliseconds because every read must perform at least one object-store metadata check.[7] In exchange, the system scales horizontally without coordinated replication, requires no ZooKeeper or Raft cluster for state management, and inherits the durability guarantees of the underlying cloud object store.[2][7]
Query traffic is served by stateless frontend nodes that pull namespaces from object storage into a tiered cache.[1] Hot namespaces sit in memory, warm namespaces on local NVMe SSDs, and cold namespaces remain only in object storage.[7] On a cache miss the frontend reads the relevant segments directly from S3 or GCS; Turbopuffer measures S3 metadata latency at roughly 10 ms at the 50th percentile and 17 ms at the 90th percentile.[7]
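The three tiers can be sketched as two small LRU caches in front of an authoritative store. This is an illustrative model only: the class `TieredCache`, its slot-count capacities, and the least-recently-used eviction policy are assumptions, since the source does not describe Turbopuffer's actual cache policy.

```python
from collections import OrderedDict
from typing import Any, Dict, Tuple


class TieredCache:
    """Toy memory -> SSD -> object storage lookup with LRU demotion."""

    def __init__(self, object_store: Dict[str, Any], memory_slots: int, ssd_slots: int) -> None:
        self.object_store = object_store  # authoritative copy of every namespace
        self.memory: "OrderedDict[str, Any]" = OrderedDict()
        self.ssd: "OrderedDict[str, Any]" = OrderedDict()
        self.memory_slots = memory_slots
        self.ssd_slots = ssd_slots

    def get(self, namespace: str) -> Tuple[Any, str]:
        """Return the namespace data and the tier that served it."""
        if namespace in self.memory:
            self.memory.move_to_end(namespace)  # mark as most recently used
            return self.memory[namespace], "memory"
        if namespace in self.ssd:
            data = self.ssd.pop(namespace)
            tier = "ssd"
        else:
            data = self.object_store[namespace]  # cold read, slowest path
            tier = "object_storage"
        self._promote(namespace, data)
        return data, tier

    def _promote(self, namespace: str, data: Any) -> None:
        self.memory[namespace] = data
        if len(self.memory) > self.memory_slots:
            # Demote the least recently used namespace from memory to SSD.
            old_ns, old_data = self.memory.popitem(last=False)
            self.ssd[old_ns] = old_data
            if len(self.ssd) > self.ssd_slots:
                # Safe to drop: object storage still holds the data.
                self.ssd.popitem(last=False)
```

Because eviction never destroys the only copy, any node can rebuild its cache from object storage, which is the property the stateless design relies on.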
Because nothing is permanently bound to a specific machine, a node failure does not cause a namespace to become unavailable. A new frontend can take over for any namespace by reading its log and segments from object storage.[1][7] Turbopuffer reports 99.99 percent uptime since launch as a consequence of this stateless design.[1]
Turbopuffer treats namespace loading as a runtime cache-fill problem. The first cold query against a namespace pulls the relevant segments from object storage, which Turbopuffer documents at roughly 444 ms at the 90th percentile for one million 768-dimensional vectors; subsequent warm queries against the same namespace run in approximately 10 ms at the 90th percentile.[1] In April 2026 the company introduced "namespace pinning," which lets customers pay for guaranteed residency in the warm cache, billed in GB-hours instead of the standard per-query model.[12]
For very large indexes, Turbopuffer's "ANN v3" design published in January 2026 targets approximately 200 ms p99 query latency over one hundred billion vectors per namespace at greater than one thousand queries per second, well above the scale most in-memory systems are designed to handle in a single index.[9] The ANN v3 design is based on a centroid-based index in the SPFresh family: vectors are grouped into clusters represented by centroid vectors, queries first compare against centroids to pick promising clusters, then search only within those clusters.[9] Vectors are stored at 1024 dimensions with 2 bytes per dimension (f16), and the index uses binary quantization with reranking to compress storage 16 to 32 times while preserving recall around 92 percent on the benchmarks Turbopuffer reports.[9] Each ANN v3 cluster spreads the index across storage-optimized shards by randomly assigning vectors at ingest time.[9]
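The two ideas in this paragraph, centroid routing and binary quantization with reranking, can be sketched in a few lines. This is a toy under stated assumptions, not the ANN v3 implementation: centroids are supplied by the caller rather than learned, the one-bit code is a simple sign test per dimension, and the names `ToyCentroidIndex`, `nprobe`, and `shortlist` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sign_bits(v: Vector) -> int:
    """Binary quantization (assumed scheme): one bit per dimension, set if positive."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


class ToyCentroidIndex:
    def __init__(self, centroids: List[Vector]) -> None:
        self.centroids = centroids
        self.clusters: Dict[int, List[int]] = {i: [] for i in range(len(centroids))}
        self.vectors: List[Vector] = []
        self.codes: List[int] = []

    def _nearest_centroids(self, v: Vector, n: int) -> List[int]:
        order = sorted(range(len(self.centroids)),
                       key=lambda c: math.dist(v, self.centroids[c]))
        return order[:n]

    def add(self, v: Vector) -> int:
        # Assign the vector to the cluster of its nearest centroid.
        vec_id = len(self.vectors)
        self.vectors.append(v)
        self.codes.append(sign_bits(v))
        self.clusters[self._nearest_centroids(v, 1)[0]].append(vec_id)
        return vec_id

    def search(self, q: Vector, k: int, nprobe: int = 1, shortlist: int = 4) -> List[int]:
        # Step 1: compare against centroids and keep the most promising clusters.
        candidates = [i for c in self._nearest_centroids(q, nprobe)
                      for i in self.clusters[c]]
        # Step 2: cheap pass over compressed codes using Hamming distance.
        q_code = sign_bits(q)
        candidates.sort(key=lambda i: hamming(q_code, self.codes[i]))
        # Step 3: rerank the shortlist with exact distances on full vectors.
        top = sorted(candidates[:max(k, shortlist)],
                     key=lambda i: math.dist(q, self.vectors[i]))
        return top[:k]
```

At 1024 dimensions, f16 storage takes 2,048 bytes per vector and a one-bit-per-dimension code takes 128 bytes, which matches the 16-fold end of the compression range quoted above; the 32-fold figure would be consistent with a 4-byte-per-dimension baseline, though the source does not spell this out.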
Turbopuffer supports approximate nearest neighbor search over dense vectors with configurable recall.[1] The official documentation reports 90 to 100 percent recall depending on configuration.[3] Distance metrics include cosine similarity and Euclidean distance, and vectors can range from low-dimensional embeddings to thousands of dimensions per record.[3]
The system implements an inverted index with BM25 ranking ("FTS v2"), released in December 2025 and reportedly up to twenty times faster than the company's earlier full-text path.[13] A January 2026 follow-up post described scaling surprises in BM25 query evaluation, including cases where queries with more terms can run faster on object storage than queries with fewer.[11] A December 2025 post described a vectorized MAXSCORE-over-WAND implementation aimed at long, LLM-generated query strings.[14]
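For readers unfamiliar with BM25, the scoring function can be sketched over a small in-memory inverted index. This uses the widely published BM25 formula with the common defaults k1 = 1.2 and b = 0.75 and a Lucene-style IDF; those parameter values and the function name `bm25_scores` are assumptions, and the sketch says nothing about how FTS v2 lays out postings on object storage or applies MAXSCORE.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(query: List[str], docs: Dict[str, List[str]],
                k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score every document against the query terms with textbook BM25."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs

    # Inverted index: term -> {doc_id: term frequency in that document}.
    postings: Dict[str, Dict[str, int]] = {}
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            postings.setdefault(term, {})[doc_id] = tf

    scores = {doc_id: 0.0 for doc_id in docs}
    for term in set(query):
        plist = postings.get(term, {})
        if not plist:
            continue
        # Rare terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - len(plist) + 0.5) / (len(plist) + 0.5))
        for doc_id, tf in plist.items():
            # Length normalization penalizes longer-than-average documents.
            norm = k1 * (1 - b + b * len(docs[doc_id]) / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return scores
```

Only documents that appear in a query term's posting list are touched, which is why the cost of a query depends on posting-list lengths rather than on the total number of documents.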
Vector and BM25 results can be combined in a single request. In April 2026 the company published a "rank by attribute" feature that mixes numeric attributes such as freshness or popularity into the first-stage relevance score.[15]
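The source does not say how the two result lists are merged. One common technique for combining a vector ranking with a BM25 ranking is reciprocal rank fusion, sketched below purely as an illustration; it is not presented as Turbopuffer's method, and the function name and the constant k = 60 are conventional choices rather than anything taken from the company's documentation.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked id lists; each list contributes 1 / (k + rank) per id."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a vector ranking of a, b, c with a BM25 ranking of b, d, a places b first, because it ranks near the top of both lists.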
Records carry user-defined attributes that can be filtered at query time, including string equality, numeric ranges, and array membership.[16] In January 2025 Turbopuffer published a "native filtering" post describing how filters interact with the vector index to preserve recall on high-selectivity queries.[16]
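Why filters and the vector index need to cooperate can be seen with a brute-force toy. The sketch below contrasts filtering after a top-k search with filtering before it; both functions are hypothetical illustrations that scan every record, and neither reflects how Turbopuffer's native filtering is implemented.

```python
import math
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]


def post_filter_search(query: Sequence[float], records: List[Record], k: int,
                       predicate: Callable[[Record], bool]) -> List[object]:
    # Take the k nearest records first, then drop those that fail the filter.
    top = sorted(records, key=lambda r: math.dist(query, r["vector"]))[:k]
    return [r["id"] for r in top if predicate(r)]


def filter_aware_search(query: Sequence[float], records: List[Record], k: int,
                        predicate: Callable[[Record], bool]) -> List[object]:
    # Restrict to matching records first, then take the k nearest of those.
    matching = [r for r in records if predicate(r)]
    top = sorted(matching, key=lambda r: math.dist(query, r["vector"]))[:k]
    return [r["id"] for r in top]
```

When few records pass the filter, the post-filter variant can return fewer than k results even though enough matching records exist, which is the recall loss a filter-aware index is meant to avoid.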
A namespace in Turbopuffer is a logically isolated collection of records, billed and indexed independently. Customers commonly use one namespace per end-user, per repository, or per document tenant. Cursor uses more than 80 million namespaces in production, one per codebase, with active indexes in the warm cache and inactive ones in cold storage.[5] Turbopuffer reports having served more than 4 trillion documents and over 10 million writes per second across all customers as of its public marketing page.[3]
Writes are accepted at thousands of records per second per namespace and tens of gigabytes per second across the fleet.[5][7] Turbopuffer's namespace-copy feature, which clones an existing namespace at a discounted write rate, has been used by Cursor to handle write peaks of 10 GB/s and over a million writes per second during onboarding waves.[5]
Turbopuffer publishes a small set of headline latency and throughput numbers on its product page and in its launch post. For a workload of one million 768-dimensional vectors, the company reports cold query p90 of approximately 444 ms and warm query p90 of approximately 10 ms.[1] On larger ANN v3 indexes of one hundred billion vectors, the company targets p99 latency in the 200 ms range.[9]
Cursor's public case study reports that semantic search served by Turbopuffer "immediately" reduced retrieval costs by 95 percent compared to their prior infrastructure and delivered a 20-fold reduction in cost per semantic search operation.[5] Cursor also reports that Turbopuffer-backed semantic search improved code retention on large codebases by 2.6 percent and reduced dissatisfied user requests by 2.2 percent in internal evaluation, and that adding semantic search lifted their Composer agent's accuracy by up to 23.5 percent compared to a grep-only retrieval baseline.[5] After the namespace-copying feature shipped in January 2026, the median repository's time-to-first-query dropped from 7.87 seconds to 525 milliseconds, the 90th percentile from 2 minutes 49 seconds to 1.87 seconds, and the 99th percentile from 4 hours 3 minutes to 21 seconds.[5]
Turbopuffer publishes a "continuous recall" measurement post from September 2024 describing how the company tracks recall on live customer indexes rather than only on static benchmarks, with the stated goal of catching recall regressions as data distributions shift over time.[17]
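The underlying measurement is simple to state: for a sample of queries, compare the approximate results with exact brute-force results and report the overlap. The sketch below shows that calculation under stated assumptions; the function names and the `ann_search` callback signature are hypothetical, and how Turbopuffer samples live traffic is not described in the source.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def exact_top_k(query: Vector, vectors: Dict[str, Vector], k: int) -> List[str]:
    """Ground truth: brute-force nearest neighbours by Euclidean distance."""
    return sorted(vectors, key=lambda i: math.dist(query, vectors[i]))[:k]


def recall_at_k(approx_ids: List[str], exact_ids: List[str]) -> float:
    """Fraction of the true nearest neighbours that the approximate search found."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_recall(queries: List[Vector], vectors: Dict[str, Vector],
                ann_search: Callable[[Vector, int], List[str]], k: int) -> float:
    """Average recall@k of ann_search over a sample of queries."""
    recalls = [recall_at_k(ann_search(q, k), exact_top_k(q, vectors, k))
               for q in queries]
    return sum(recalls) / len(recalls)
```

Running this periodically against current data, rather than once against a fixed benchmark, is what allows a drop in recall to be noticed as the data distribution changes.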
Turbopuffer's pricing has shifted multiple times as the platform has matured, and the company publishes a public pricing changelog.[12] At the time of the July 2024 launch, the headline price was approximately $1 per month per million stored vectors and $4 per million queries.[1] The launch post described object-storage pricing of roughly $0.02 per GB per month for cold data, contrasted with about $0.33 per GB per month for the storage tier behind serverless Pinecone at the time.[1]
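The launch-era figures lend themselves to a quick back-of-the-envelope estimate. The sketch below uses only the prices quoted above; the function names are hypothetical, the query volume in the usage note is an invented example, and current pricing is structured differently.

```python
# Launch-era list prices quoted in the July 2024 post (USD).
LAUNCH_PRICE_PER_MILLION_VECTORS = 1.0  # per month of storage
LAUNCH_PRICE_PER_MILLION_QUERIES = 4.0


def launch_monthly_cost(stored_vectors: int, queries_per_month: int) -> float:
    """Estimated monthly bill at launch pricing, ignoring minimums and write costs."""
    storage = stored_vectors / 1_000_000 * LAUNCH_PRICE_PER_MILLION_VECTORS
    queries = queries_per_month / 1_000_000 * LAUNCH_PRICE_PER_MILLION_QUERIES
    return storage + queries


def storage_price_ratio(other_per_gb: float = 0.33,
                        object_storage_per_gb: float = 0.02) -> float:
    """How many times more the compared storage tier costs per GB per month."""
    return other_per_gb / object_storage_per_gb
```

Under these figures, 100 million stored vectors with a hypothetical 5 million queries per month would come to about $120 per month, and the quoted per-gigabyte storage prices differ by a factor of about 16.5.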
As of April 2026 the public plans listed on the company's pricing page are a Launch tier with a $64 per month minimum, a Scale tier with a $256 per month minimum, and an Enterprise tier starting at roughly $4,096 per month plus a 35 percent usage premium for elevated support and SLAs.[18] Query pricing is structured around a "TB Queried" base rate, with marginal discounts of up to 96 percent on very large namespaces and a base rate of $1 per PB of queried data after an April 2026 price cut.[12] Filterable attributes are billed once per vector column for both writes and storage.[12]
Pinned namespaces, introduced in April 2026, are billed in GB-hours of guaranteed cache residency, with minimums of 64 GB and 10 minutes per pin.[12]
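The effect of the two minimums on billable usage can be sketched as follows. The source gives no per-GB-hour price, so the function returns GB-hours only, and it assumes each minimum is applied independently; that interpretation and the function name are assumptions.

```python
MIN_PIN_GB = 64        # minimum billable size per pin, from the pricing changelog
MIN_PIN_MINUTES = 10   # minimum billable duration per pin


def billable_gb_hours(size_gb: float, pinned_minutes: float) -> float:
    """Billable GB-hours for one pin, assuming both minimums apply independently."""
    billed_gb = max(size_gb, MIN_PIN_GB)
    billed_minutes = max(pinned_minutes, MIN_PIN_MINUTES)
    return billed_gb * billed_minutes / 60
```

Under this reading, a 10 GB namespace pinned for five minutes would be billed as 64 GB for ten minutes, while a 128 GB namespace pinned for 90 minutes would be billed for 192 GB-hours.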
Cursor is the company's most publicly documented customer. Cursor migrated its code embeddings to Turbopuffer in November 2023 and now stores over a trillion vectors across more than 80 million namespaces, one per indexed repository.[5] Cursor has written publicly about its dependency on Turbopuffer in a joint case study published on the Turbopuffer website.[5]
Notion uses Turbopuffer to power document and workspace search at scale. According to Eskildsen's interview on the Latent Space podcast in 2024, Turbopuffer engineered dedicated dark-fiber links between cloud regions to meet Notion's latency targets.[2] Additional customers acknowledged on the company's product page or in dated press coverage include Linear, Superhuman, Suno, Ramp, New Computer, Anthropic, and Atlassian.[3][4]
Turbopuffer raised a small amount of angel and seed financing in 2024 from individual investors including Lachy Groom, the former head of Stripe Issuing and a co-founder of Physical Intelligence.[4] On December 19, 2025, BetaKit reported a follow-on seed round of undisclosed size led by Thrive Capital, the New York venture firm founded by Joshua Kushner, with Lachy Groom participating again.[4] BetaKit's coverage states that Turbopuffer's annual recurring revenue had grown roughly tenfold during 2025 and that the company had increased headcount fivefold over the same period, while serving "trillions of vectors and tens of petabytes" of data.[4] No publicly disclosed funding round led by Sequoia Capital has been reported as of mid-2026.[4]
Turbopuffer's positioning relative to other vector databases turns on where the canonical copy of data lives.
Pinecone is a fully managed vector database whose serverless tier was, at the time of Turbopuffer's launch, marketed at approximately $0.33 per GB per month for stored data plus separate read-unit and write-unit charges.[1] Pinecone's architecture keeps recent and frequently queried data on provisioned SSD and memory; Turbopuffer's launch post argued that its object-storage-native design was up to roughly fifty times cheaper for cold storage at equivalent throughput.[1] Independent comparisons in 2025 and 2026 have replicated the cost advantage on storage-heavy workloads while noting that Pinecone retains lower commit latency for very small indexes.[19]
Weaviate is an open-source vector database that runs primarily as a memory-resident service with persistent local disk. Its hosted offering uses provisioned memory and SSD per cluster rather than object storage, so the cost structure is closer to Pinecone's than to Turbopuffer's.[20] Weaviate supports hybrid vector and BM25 search, modular embedding integrations, and on-premises deployment; Turbopuffer is closed-source and runs only as a managed service on the cloud regions the company supports.[3][20]
LanceDB is a serverless vector database built on the Lance columnar format, which also targets object storage as the storage medium and shares the broad goal of decoupling compute from storage.[21] LanceDB places more emphasis on local embedded deployments and the open-source Lance format itself, while Turbopuffer is a hosted, closed-source service. Both differ from Qdrant and Chroma, which are open-source vector stores designed around always-on, locally attached storage.[22][23]
Across this group, Turbopuffer's distinctive position is that any namespace can be served by any node by lazy-loading from object storage, with no permanently provisioned replica per namespace and no ZooKeeper-style consensus layer to coordinate state.[1][7] The cost advantage is largest on workloads with many namespaces, modest per-namespace query traffic, and tolerance for occasional cold-start latency.[7]
Critics of the design have pushed back. A 2025 blog post from Zilliz, the company behind Milvus, argued that the headline cost advantage of object-storage-native systems narrows once a workload requires consistent low-latency reads across a large fraction of the index, because keeping the necessary working set warm pulls the effective cost closer to provisioned-SSD systems.[19] Particula and several independent benchmarks in 2026 reported that the gap remained large for storage-heavy workloads with sparse query patterns, the scenario most relevant to Cursor's per-repository code search and Notion's per-workspace document search.[20] Eskildsen has acknowledged in podcast interviews that workloads needing uniformly hot access to every namespace are not the best fit for Turbopuffer and that customers with such patterns are better served by traditional in-memory vector databases.[2]