Pinecone

Updated Jun 21, 2026

Pinecone is a managed vector database for artificial intelligence applications that lets developers store, index, and query high-dimensional vector embeddings at scale without managing infrastructure. Founded in June 2019 by Edo Liberty, Pinecone is widely regarded as the company that created the commercial vector database category, and it remains one of the most prominent managed offerings in the space. As of mid-2026 the platform is used by more than 9,000 organizations, has raised approximately $138 million in disclosed venture funding, and was last priced at a $750 million valuation in its April 2023 Series B led by Andreessen Horowitz ^[3]^[8]^[9]. It competes with open-source alternatives such as Weaviate, Chroma, Milvus, and Qdrant, and with adjacent extensions like pgvector. The company is headquartered at 1375 Broadway, New York City, with engineering teams distributed across multiple continents.

Pinecone's product roadmap has been defined by three architectural generations: the original pod-based service launched commercially in October 2021, the serverless re-architecture introduced in January 2024, and the agent-oriented Nexus knowledge engine announced in May 2026. Together these phases reflect the company's shift from a basic vector store toward what it calls knowledge infrastructure for AI agents and retrieval-augmented systems.

What is Pinecone used for?

Pinecone is used to power retrieval-augmented generation (RAG), semantic search, recommendation engines, anomaly and fraud detection, and agent-grounded knowledge retrieval. Developers convert text, images, audio, or other data into vector embeddings using a model, store those vectors in Pinecone, and then query for the records most similar to a given query vector. Because Pinecone returns nearest-neighbor matches in milliseconds across billions of vectors, it is commonly used to give large language models long-term memory and to ground their answers in proprietary data, reducing hallucination. Andreessen Horowitz general partner Peter Levine, who led the Series B, described the role of the database succinctly: "think of the LLM as an application almost that sits on top of this database, and what the database will do is it will hold information and feed it into LLM for more precise answers for long-term storage of results" ^[3].
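
The store-then-query workflow described above can be illustrated with a minimal, dependency-free sketch. It is not Pinecone's implementation: it scans every record by brute force, and the three-dimensional vectors, record ids, and function names are hypothetical stand-ins for real model embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, records, k=2):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(rec_id, cosine_similarity(query, vec)) for rec_id, vec in records.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy "embeddings"; real models emit hundreds or thousands of dimensions.
records = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-dogs": [0.8, 0.3, 0.1],
    "doc-tax": [0.0, 0.2, 0.9],
}
matches = top_k([1.0, 0.2, 0.0], records, k=2)
```

A managed vector database replaces the exhaustive scan in `top_k` with approximate indexes so that the same kind of query stays fast at much larger scale.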

History and founding

Edo Liberty founded Pinecone in June 2019 after spending years working on large-scale machine learning systems at major technology companies. Before starting the company, Liberty served as Director of Research at Amazon Web Services and Head of Amazon AI Labs from October 2017 to March 2019, where his team developed core technologies behind SageMaker, OpenSearch, and Kinesis. He had previously joined AWS in August 2016 as Senior Manager of Research for SageMaker. Before Amazon, Liberty held research positions at Yahoo Research from 2009 to 2016, including leading Yahoo's Scalable Machine Learning Research Group in New York from 2012 onward at the request of Yahoo Chief Scientist Ron Brachman. Liberty holds a B.Sc. in Physics and Computer Science from Tel Aviv University and a Ph.D. in Computer Science from Yale University, followed by a postdoctoral fellowship at Yale's Program in Applied Mathematics. He has authored more than 75 academic papers and patents on streaming algorithms, sketching, and approximate nearest-neighbor search ^[1].

Liberty's experience at AWS and Yahoo showed him the practical power of combining AI models with vector search to improve applications such as spam detection, content ranking, and recommendation systems. He saw that custom vector search systems were being built at enormous scales inside companies like Amazon, Google, and Facebook, and assumed there must already be packaged software for organizations without similar resources. To his surprise, no such product existed at production quality. He concluded that conventional databases were poorly suited to similarity search over high-dimensional data, which led him to start a company purpose-built for the workload. The phrase "vector database" itself was popularized by Pinecone's marketing during this period ^[2].

The company spent its first 18 months in stealth mode developing the initial managed service. Pinecone left stealth in January 2021 when it announced both its public service and a $10 million seed round led by Wing Venture Capital. The commercial product became generally available in October 2021, after which adoption accelerated rapidly. Series A funding of $28 million followed in March 2022, and a $100 million Series B at a $750 million valuation closed in April 2023, led by Andreessen Horowitz. Reports during 2025 indicated an additional later-stage round at a substantially higher valuation, although Pinecone has continued to publicly cite total funding of approximately $138 million across its primary venture rounds ^[3]^[4]^[5].

Who is the CEO of Pinecone?

The CEO of Pinecone is Ash Ashutosh, who took the role on September 8, 2025, when founder Edo Liberty transitioned from CEO to Chief Scientist ^[6]^[7]. Ashutosh is a three-time founder of storage and data infrastructure companies (Serano Systems, AppIQ, and Actifio, the last of which was acquired by Google in 2020), and previously served as CTO of HP's storage division, a partner at Greylock Partners, and most recently Global Director of Solution Sales at Google. He holds an electrical engineering degree and a computer science master's from Penn State. The leadership transition was framed by the company as a way to scale enterprise go-to-market while letting Liberty focus on the technical roadmap, particularly the company's pivot toward knowledge infrastructure for agents. Announcing the change, Liberty wrote that "the convergence of AI and data represents the defining technology transition of our lifetime" and that "this powerful intersection is precisely where Pinecone thrives," adding that he "could not have hoped for a better person than Ash to partner with" on growing the company ^[7].

Funding

Pinecone has publicly disclosed approximately $138 million in venture funding across three primary rounds, with additional later-stage capital reported but not always confirmed by the company ^[3]^[4]^[5]^[8].

Round	Date	Amount	Lead Investor	Valuation
Seed (announced)	January 27, 2021	$10M	Wing Venture Capital	Not disclosed
Series A	March 29, 2022	$28M	Menlo Ventures	Not disclosed
Series B	April 27, 2023	$100M	Andreessen Horowitz	$750M post-money

Wing Venture Capital's founding partner Peter Wagner joined the board at the seed round. Tiger Global Management also entered the cap table at the Series A. The Series B included participation from ICONIQ Growth, Menlo Ventures, and Wing Venture Capital. Andreessen Horowitz general partner Peter Levine wrote at the time that vector databases would become "a fundamental component in the new AI data stack," particularly for reducing hallucination and providing long-term memory in large language models ^[3].

At the time of the Series B announcement, Pinecone reported expanding from a handful of customers to roughly 1,500, with about 100 employees and plans to scale to 150 to 200 by year-end 2023 ^[3]. By late 2025, headcount stood at roughly 130 across four continents and the company stated its customer base had grown beyond 5,000; by mid-2026 it cited more than 9,000 organizations using the platform across its various tiers. In 2024 the company reportedly generated about $26.6 million in revenue, although third-party trackers have published different figures and the company itself does not disclose ARR publicly ^[8]^[9].

Reports from late 2025 indicated that Pinecone was exploring strategic options, including a potential sale, with speculation suggesting a valuation "north of $2 billion," well above the last priced round. Oracle, IBM, MongoDB, and Snowflake were named as potential acquirers in coverage by VentureBeat, Calcalist, and other outlets, which framed the move against rising competition and the loss of high-profile customers such as Notion ^[6]^[45]. CEO Ashutosh publicly stated that an acquisition was not the goal, while founder Liberty said the company was "doubling or tripling down" on its mission to build knowledgeable AI ^[6].

Product evolution

Pinecone's product has gone through three distinct architectural eras, each reflecting the changing requirements of vector workloads at scale.

Era	Period	Key product	Defining trait
Pod-based	2021 to 2024	Pinecone v1	Pre-provisioned compute units sized by workload
Serverless	January 2024 to present	Pinecone Serverless	Object-storage-backed indexing with read/write separation
Knowledge engine	May 2026 to present	Pinecone Nexus	Compiled context artifacts and KnowQL retrieval for agents

Major product milestones, in chronological order, include the launch of pod-based clusters in 2021, a re-platform to serverless announced January 16, 2024, multi-cloud general availability on Microsoft Azure and Google Cloud on August 27, 2024, the September 2024 public preview of Pinecone Assistant, the December 2024 general availability of Pinecone Inference for embeddings and reranking, the December 2025 launch of Dedicated Read Nodes, and the May 2026 announcement of Pinecone Nexus along with KnowQL, a marketplace of agent applications, and a new Builder pricing tier ^[10]^[11]^[12]^[13]^[14].

Architecture

Pinecone's architecture has evolved through three deployment models: pod-based (legacy), serverless (the current default), and Bring Your Own Cloud (BYOC) for regulated deployments.

Pod-based architecture

The original Pinecone deployment model used pods, which are pre-configured units of hardware (including CPU, memory, and storage). Users selected a pod type and size based on their workload requirements, then scaled by adding replicas for throughput or increasing pod size for capacity. This model was straightforward but required users to make provisioning decisions upfront and could lead to over-provisioning or under-provisioning.

Pinecone offered three pod types, each optimized for different workload characteristics:

Pod type	Optimized for	Key characteristic
p1	Low-latency search	Fastest query performance, higher cost per vector
p2	Cost-effective search	Balanced latency and storage density
s1	Storage-optimized	Highest vector capacity per pod, slightly higher latency

Each pod type was available in multiple sizes (x1, x2, x4, x8), with each size doubling the capacity of the previous one. Users could also add read replicas to increase query throughput without changing the pod type or size. Pod-based indexes remain accessible for legacy customers but Pinecone has steered new workloads toward the serverless tier, including providing free migration tooling for indexes with up to 25 million records and 20,000 namespaces ^[10]^[15].

How does Pinecone Serverless work?

In January 2024, Pinecone launched Pinecone Serverless, which separates reads, writes, and storage into independent components. The design eliminates the need to provision compute, configure replicas, or manage node health. Users create an index, specify a cloud computing provider and region, and Pinecone handles everything else behind the scenes ^[10].

At launch, Pinecone said the new architecture could deliver a 10x to 100x cost reduction versus pod-based deployments, and cited savings of up to 50 times for many workloads, since users pay only for the resources they actually consume rather than for idle capacity ^[10]. Independent customer reports have cited both savings and, in some cases, increased costs depending on access pattern: workloads with bursty traffic and large idle periods see the largest savings, while workloads running near 100 percent utilization can sometimes be cheaper on dedicated infrastructure ^[10]^[16].

Internally, the serverless architecture consists of several key components:

Blob storage: All index data is persisted in cloud object storage (such as Amazon S3), which serves as the source of truth. This separation of storage from compute allows each layer to scale independently.
Mutation log: Every write operation (upsert, update, delete) is recorded in an ordered log. This log provides sequencing for all mutations applied to the index and enables the system to replay changes if needed.
Index builder: A background process reads from the mutation log and constructs geometrically partitioned indexes. Vectors are divided into partitions based on proximity in the vector space, with each partition represented by a centroid. The builder commits immutable index segments (called "slabs") to blob storage and updates a manifest that tracks which segments belong to the current version of the index.
Memtable and freshness layer: An in-memory buffer called the memtable holds newly written records along with their metadata and log sequence number. Read operations check the memtable first, ensuring that newly upserted vectors become searchable within seconds without waiting for the slab indexer to complete a cycle. The freshness layer wraps the memtable and presents a brute-force searchable view over the most recent data.
Query routers and executors: Stateless routers validate queries and identify which slabs are likely to contain relevant vectors. Stateless executors load relevant index slabs on demand, cache them in memory or local SSD, and search them. Results from cached slabs and the freshness layer are merged and returned to the client ^[17]^[18].

This log-structured indexing approach, similar in concept to log-structured merge trees used in key-value stores like LevelDB and RocksDB, allows Pinecone to handle continuous writes without locking or blocking read operations.
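
The write path described above (ordered mutation log, in-memory memtable, immutable slabs) can be sketched as a toy data structure. This is only an illustration of the log-structured idea, not Pinecone's code: the class name, the `flush_threshold` value, and the use of `None` as a delete tombstone are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class ToyLogStructuredIndex:
    """Illustrative only: mutation log -> memtable -> immutable slabs."""
    flush_threshold: int = 2                       # hypothetical flush trigger
    log: list = field(default_factory=list)        # ordered (lsn, op, id, vector)
    memtable: dict = field(default_factory=dict)   # id -> (lsn, vector or None)
    slabs: list = field(default_factory=list)      # frozen snapshots, oldest first
    next_lsn: int = 0

    def _append(self, op, rec_id, vector):
        # Every mutation gets a log sequence number and is recorded in order.
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log.append((lsn, op, rec_id, vector))
        self.memtable[rec_id] = (lsn, vector)
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def upsert(self, rec_id, vector):
        self._append("upsert", rec_id, vector)

    def delete(self, rec_id):
        self._append("delete", rec_id, None)  # None acts as a tombstone

    def _flush(self):
        # Freeze the memtable into a new slab; existing slabs are never edited.
        self.slabs.append(dict(self.memtable))
        self.memtable = {}

    def get(self, rec_id):
        """Newest version wins: memtable first, then slabs newest to oldest."""
        for layer in [self.memtable] + self.slabs[::-1]:
            if rec_id in layer:
                return layer[rec_id][1]
        return None

idx = ToyLogStructuredIndex()
idx.upsert("a", [1.0, 0.0])
idx.upsert("b", [0.0, 1.0])   # second record triggers a flush into slab 0
idx.upsert("a", [0.5, 0.5])   # newer version of "a" lives in the memtable
```

Because readers consult the memtable before any slab, the updated `"a"` is visible immediately even though the older copy still sits in a frozen slab.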

Slab architecture and adaptive indexing

Within a serverless namespace, Pinecone organizes records into immutable files called slabs that progress through hierarchical levels. New writes from the memtable produce L0 slabs. When enough L0 slabs accumulate, a background compaction process merges them into a smaller number of L1 slabs. L1 slabs in turn compact into L2, and in very large deployments, L2 into L3. The architecture scales horizontally without resharding because slabs are immutable and individually addressable ^[18].
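
The level-by-level compaction described above can be sketched as follows. The merge trigger (`fan_in`), the slab representation (a dict of id to `(lsn, vector)`), and the function name are assumptions chosen for illustration; the source does not state the actual compaction policy.

```python
def compact(levels, fan_in=4):
    """Merge slabs upward whenever a level holds `fan_in` or more slabs.

    `levels` maps level number -> list of slabs. Each slab is a dict of
    id -> (lsn, vector). Input slabs are never modified; merges build new ones.
    """
    level = 0
    while level in levels:
        while len(levels[level]) >= fan_in:
            batch, levels[level] = levels[level][:fan_in], levels[level][fan_in:]
            merged = {}
            for slab in batch:
                for rec_id, (lsn, vector) in slab.items():
                    # Keep only the newest version of each record.
                    if rec_id not in merged or lsn > merged[rec_id][0]:
                        merged[rec_id] = (lsn, vector)
            levels.setdefault(level + 1, []).append(merged)
        level += 1
    return levels

# Eight single-record L0 slabs; the last one rewrites "doc-0" with a newer lsn.
l0 = [{f"doc-{i % 7}": (i, [float(i)])} for i in range(8)]
levels = compact({0: l0}, fan_in=2)
```

With `fan_in=2` the eight L0 slabs cascade through L1 and L2 into a single L3 slab, and the duplicate `"doc-0"` collapses to its newest version along the way.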

Different slab levels use different indexing techniques chosen automatically based on size:

Layer	Vector count (approx.)	Index technique
Memtable	Up to ~10,000	Brute-force scan
L0 slabs	Tens of thousands	Ananas, Pinecone's proprietary FJLT-based index
L1 slabs (small)	Hundreds of thousands	Product-quantized full scan (PQFS) or scalar quantization
L1 slabs (large) and L2+	Millions to billions	IVF (inverted file) clustering with Ananas indexes per cluster

Ananas is Pinecone's in-house indexing algorithm based on Fast Johnson Lindenstrauss Transforms, designed to deliver consistent recall while keeping memory and compute footprints low. Hot slabs are cached in memory and on local SSDs on query executors, while cold slabs are pulled from object storage on demand. This tiered approach lets Pinecone amortize indexing cost while keeping latency predictable for the slabs being actively queried ^[18].
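
Ananas itself is proprietary, so the following is not its implementation. As a hedged illustration of the general idea behind fast Johnson-Lindenstrauss-style projections, the sketch below uses a subsampled randomized Hadamard transform, one well-known construction in that family: flip signs at random, mix coordinates with a Walsh-Hadamard transform, then keep a random subset of coordinates. All function names are hypothetical.

```python
import math
import random

def fwht(values):
    """Fast Walsh-Hadamard transform with orthonormal scaling.

    The input length must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]

def make_projection(dim, target_dim, seed=0):
    """Build a function that projects `dim`-dimensional vectors to `target_dim`."""
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]   # random sign flips
    kept = rng.sample(range(dim), target_dim)               # coordinates to keep
    rescale = math.sqrt(dim / target_dim)                   # preserves norm in expectation

    def project(vector):
        mixed = fwht([s * x for s, x in zip(signs, vector)])
        return [mixed[i] * rescale for i in kept]

    return project

project = make_projection(dim=8, target_dim=4, seed=1)
reduced = project([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
```

The sign flips and Hadamard mixing spread each vector's energy across all coordinates, which is why a random subset of them can approximately preserve distances at a fraction of the original dimensionality.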

Geometric partitioning

Rather than traditional hash-based or range-based sharding, Pinecone serverless uses geometric partitioning. The vector space is divided into regions, each represented by a centroid. New vectors are assigned to the partition whose centroid is closest. This approach has two advantages: it enables fine-grained data isolation (queries only need to search relevant partitions rather than scanning all shards), and it adapts dynamically as the data distribution evolves over time ^[17].
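
A minimal sketch of centroid-based partitioning, under the assumption of Euclidean distance and fixed centroids, shows why a query only needs to visit nearby partitions. The centroid ids, the `n_probe` parameter, and the helper names are hypothetical and are not taken from Pinecone's documentation.

```python
import math

def nearest_centroids(vector, centroids, n=1):
    """Return the ids of the n centroids closest to `vector`."""
    ranked = sorted(centroids, key=lambda cid: math.dist(vector, centroids[cid]))
    return ranked[:n]

def assign(records, centroids):
    """Place each record in the partition whose centroid is closest."""
    partitions = {cid: {} for cid in centroids}
    for rec_id, vec in records.items():
        cid = nearest_centroids(vec, centroids, 1)[0]
        partitions[cid][rec_id] = vec
    return partitions

def search(query, partitions, centroids, n_probe=1, k=1):
    """Search only the n_probe partitions nearest the query."""
    candidates = {}
    for cid in nearest_centroids(query, centroids, n_probe):
        candidates.update(partitions[cid])
    ranked = sorted(candidates, key=lambda rid: math.dist(query, candidates[rid]))
    return ranked[:k]

centroids = {"left": [0.0, 0.0], "right": [10.0, 10.0]}
records = {"a": [1.0, 1.0], "b": [0.0, 2.0], "c": [9.0, 9.0], "d": [11.0, 10.0]}
partitions = assign(records, centroids)
hits = search([8.0, 8.0], partitions, centroids, n_probe=1, k=1)
```

Raising `n_probe` trades extra work for a lower chance of missing a neighbor that sits just across a partition boundary.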

Dedicated Read Nodes

Dedicated Read Nodes (DRNs) entered public preview on December 1, 2025, and reached general availability shortly afterward. DRNs provide exclusive infrastructure for queries: each provisioned node is reserved for a single customer's index, eliminating noisy-neighbor effects and per-account read rate limits. The architecture scales along two dimensions, replicas (for query throughput and availability) and shards (for storage capacity), with data kept warm in memory and on local SSD to avoid cold-storage latency spikes ^[13]^[19].

Reported customer benchmarks include:

Customer scenario	Vector count	QPS	P50 latency	P99 latency
Design platform (sustained)	135 million	600	45 ms	96 ms
Design platform (load test)	135 million	2,200	60 ms	99 ms
E-commerce marketplace	1.4 billion	5,700	26 ms	60 ms

DRNs use hourly per-node pricing rather than per-request pricing, which can make them more economical than serverless pricing for sustained, high-throughput production workloads where utilization is consistently high ^[13]^[19].
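
The trade-off between per-request and hourly per-node pricing comes down to a break-even throughput. The sketch below works through that arithmetic with entirely hypothetical prices; it also simplifies per-request billing to a flat price per million queries, which is an assumption and not Pinecone's actual billing unit.

```python
def monthly_cost_per_request(queries_per_second, price_per_million_queries, hours=730):
    """Cost when every query is billed individually."""
    queries = queries_per_second * 3600 * hours
    return queries / 1_000_000 * price_per_million_queries

def monthly_cost_per_node(nodes, price_per_node_hour, hours=730):
    """Cost when nodes are billed by the hour regardless of traffic."""
    return nodes * price_per_node_hour * hours

def break_even_qps(nodes, price_per_node_hour, price_per_million_queries):
    """Sustained QPS above which hourly-priced nodes become the cheaper option."""
    return nodes * price_per_node_hour * 1_000_000 / (3600 * price_per_million_queries)

# Hypothetical figures for illustration only.
NODES = 2
NODE_HOUR_PRICE = 2.00          # dollars per node-hour (assumed)
PRICE_PER_MILLION = 10.00       # dollars per million queries (assumed)

threshold = break_even_qps(NODES, NODE_HOUR_PRICE, PRICE_PER_MILLION)
```

Below the threshold, idle node-hours are wasted money and per-request billing wins; above it, the fixed hourly charge is spread over enough queries to come out ahead.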

Bring Your Own Cloud

For organizations with strict data sovereignty, network isolation, or data residency requirements, Pinecone offers a Bring Your Own Cloud (BYOC) deployment model. BYOC keeps the data plane inside the customer's own cloud account (AWS, GCP, or Azure) while preserving the managed Pinecone control plane experience. The customer's cluster pulls queued operations from the control plane over TLS and executes them locally, meaning Pinecone does not need interactive (SSH or VPN) access to operate the deployment. Private connectivity options including AWS PrivateLink, GCP Private Service Connect, and Azure Private Link can be layered on top for additional network isolation. BYOC entered public preview on AWS, GCP, and Azure for Enterprise tier customers in 2025 ^[20].

Indexing algorithms

Pinecone's indexing layer is proprietary and not exposed to users for direct tuning, in contrast to open-source alternatives where developers must select index types (HNSW, IVF, DiskANN, ScaNN) and tune parameters such as M, ef_construction, and ef_search themselves. Pinecone's documentation and engineering blogs describe a layered design that combines several well-known building blocks rather than a single algorithm.

For smaller slabs (up to roughly one million vectors), the system uses Ananas, a graph-light approximate nearest-neighbor approach based on Fast Johnson Lindenstrauss Transforms (FJLT) for dimensionality projection followed by candidate scoring with quantization. For larger slabs (above one million vectors), Pinecone uses IVF clustering on top, with Ananas indexes built per IVF cluster. This hybrid is conceptually similar to IVF with a residual quantizer, but Pinecone's specific implementation is closed source ^[18].

While HNSW (Hierarchical Navigable Small World) is the most common index in open-source vector databases, Pinecone does not publicly describe HNSW as the primary structure in its serverless architecture. Pinecone has published research that compares HNSW against alternative structures, and the company has emphasized that for its workload mix (large datasets with many namespaces, frequent compaction, and high write throughput), graph-based indexes can be expensive in memory and difficult to update incrementally. The pod-based service did rely on HNSW-like graph indexes; the serverless system replaced these in favor of the slab-and-cluster design ^[10]^[18].

For reference, the key parameters that influence HNSW performance, which open-source users typically tune by hand and which Pinecone does not expose, include:

Parameter	Description	Effect of increasing
M	Maximum connections per node per layer	Higher recall, more memory usage, slower builds
ef_construction	Candidate list size during index building	Better graph quality, slower index construction
ef_search	Candidate list size during search	Higher recall, slower query latency

In Pinecone's managed environment, these and analogous parameters for IVF and product quantization are tuned automatically based on the index configuration and workload characteristics. Users do not set them directly, which is a deliberate simplification compared to open-source alternatives like Qdrant or Milvus where developers must tune these values themselves.

Key features

Real-time indexing

Pinecone supports real-time upserts: vectors can be inserted or updated and become queryable typically within seconds, because the memtable and freshness layer serve newly written records before the background slab indexer commits them. This is useful for applications where the underlying data changes frequently, such as real-time recommendation engines, continuously updated knowledge bases, or fraud-detection systems that ingest new event embeddings throughout the day ^[17].

Metadata filtering

Every vector in Pinecone can carry arbitrary key-value metadata. At query time, users can apply filters on this metadata alongside the vector similarity search. For example, a query might search for the most semantically similar product descriptions but restrict results to items in a particular category or price range ^[21].

Pinecone's metadata filtering query language is based on MongoDB's query and projection operators. The following operators are supported:

Operator	Description	Supported types
`$eq`	Equal to	Number, string, boolean
`$ne`	Not equal to	Number, string, boolean
`$gt`	Greater than	Number
`$gte`	Greater than or equal to	Number
`$lt`	Less than	Number
`$lte`	Less than or equal to	Number
`$in`	Value is in a specified array	String, number
`$nin`	Value is not in a specified array	String, number
`$exists`	Field exists on the vector	Number, string, boolean
`$and`	Logical AND of conditions	N/A (combinator)
`$or`	Logical OR of conditions	N/A (combinator)

A query with metadata filtering looks like this:

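# Assumes `index` is a connected Pinecone index and `query_embedding` is a
# list of floats whose length matches the index dimension.
results = index.query(
    vector=query_embedding,
    top_k=10,
    include_metadata=True,
    # Multiple top-level fields are combined with an implicit AND.
    filter={
        "genre": {"$eq": "fiction"},
        "year": {"$gte": 2020}
    }
)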

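Key limitations of metadata filtering include: each vector supports up to 40 KB of metadata; each $in or $nin operator accepts a maximum of 10,000 values; and $and and $or are the only operators allowed at the top level of a filter expression (field names may also appear at the top level, where multiple fields are combined with an implicit AND) ^[21].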
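
The semantics of the operators in the table above can be made concrete with a small evaluator that checks a single metadata dictionary against a filter expression. This is an explanatory sketch rather than Pinecone's filter engine: the function names are hypothetical, and the rule that a missing field fails every comparison except `$exists` is an assumption of the sketch.

```python
_COMPARATORS = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value > target,
    "$gte": lambda value, target: value >= target,
    "$lt": lambda value, target: value < target,
    "$lte": lambda value, target: value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}

def _field_matches(metadata, field, condition):
    """Check every operator attached to one field."""
    for operator, target in condition.items():
        if operator == "$exists":
            if (field in metadata) != target:
                return False
        elif field not in metadata:
            return False  # sketch assumption: missing fields fail comparisons
        elif not _COMPARATORS[operator](metadata[field], target):
            return False
    return True

def matches(metadata, flt):
    """Return True if `metadata` satisfies the filter expression `flt`."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif not _field_matches(metadata, key, condition):
            return False
    return True

record = {"genre": "fiction", "year": 2021}
keep = matches(record, {"genre": {"$eq": "fiction"}, "year": {"$gte": 2020}})
```

In a real query the filter is applied together with the similarity search, so only records for which an expression like this holds can appear in the results.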

In the serverless architecture, Pinecone uses disk-based bitmap indexes for metadata filtering. These indexes are adapted from techniques used in data warehouses and are designed to handle high-cardinality filtering scenarios such as access control lists efficiently. The indexes co-locate with slabs and are loaded on demand alongside vector data ^[17].
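
The bitmap-index technique mentioned above can be sketched in a few lines. The example keeps everything in memory and uses Python integers as bitsets, whereas the source describes disk-based indexes; the class, field names, and values are hypothetical.

```python
from collections import defaultdict

class BitmapIndex:
    """Maps (field, value) to an int whose set bits mark matching row positions."""

    def __init__(self):
        self.bitmaps = defaultdict(int)
        self.row_ids = []

    def add(self, row_id, metadata):
        position = len(self.row_ids)
        self.row_ids.append(row_id)
        for field, value in metadata.items():
            self.bitmaps[(field, value)] |= 1 << position

    def lookup(self, field, value):
        """Bitmap of rows where `field` equals `value` (0 if none)."""
        return self.bitmaps.get((field, value), 0)

    def rows(self, bitmap):
        """Translate a bitmap back into row ids."""
        return [rid for pos, rid in enumerate(self.row_ids) if (bitmap >> pos) & 1]

index = BitmapIndex()
index.add("v1", {"tenant": "acme", "lang": "en"})
index.add("v2", {"tenant": "acme", "lang": "de"})
index.add("v3", {"tenant": "globex", "lang": "en"})

# AND of two conditions is a single bitwise operation over the bitmaps.
allowed = index.lookup("tenant", "acme") & index.lookup("lang", "en")
```

Combining conditions with bitwise AND and OR is what makes this structure attractive for filters such as access control lists, where many values must be intersected quickly.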

Namespaces

Namespaces partition data within a single index. Vectors in different namespaces are isolated from each other, meaning a query in one namespace will never return results from another. This is useful for multi-tenant applications where each customer's data needs to remain separate without creating entirely separate indexes ^[21].

In the serverless architecture, namespaces function as hard partitions. The index builder creates geometric partitions only within namespace boundaries, meaning queries are automatically scoped to the specified namespace. This design supports cost-effective multi-tenant scenarios; for example, Notion used Pinecone's namespaces to maintain millions of isolated user-level indexes within a single Pinecone index. In its March 2025 launch week, Pinecone increased the per-index namespace limit and added optimizations specifically for high-namespace-count workloads, supporting millions of namespaces per index ^[17]^[22].

Sparse-dense hybrid search

Pinecone supports hybrid search by combining dense and sparse vector representations in a single query. Dense vectors (from models such as OpenAI's text-embedding-3-small or Cohere's embed-multilingual-v3) capture semantic meaning, while sparse vectors (from algorithms like BM25 or SPLADE, or from Pinecone's hosted pinecone-sparse-english-v0 model) capture keyword-level relevance. Each record can contain both a dense vector and a sparse vector, along with metadata. At query time, Pinecone blends the results from both representations to return more relevant matches ^[23].
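
One common way to control the balance between the two representations is to scale the query's dense and sparse parts by a convex weight before sending it. The sketch below assumes that convention and a sparse format of parallel `indices` and `values` lists; the function name and the `alpha` parameter are illustrative, and this is not presented as the blending Pinecone performs internally.

```python
def weight_hybrid_query(dense, sparse, alpha):
    """Scale a dense/sparse query pair by a convex weight.

    alpha=1.0 keeps only the dense signal; alpha=0.0 keeps only the sparse one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse

# Hypothetical query: a tiny dense vector plus two weighted keyword dimensions.
dense_query = [0.5, 1.0]
sparse_query = {"indices": [17, 4096], "values": [2.0, 4.0]}
dense_part, sparse_part = weight_hybrid_query(dense_query, sparse_query, alpha=0.75)
```

Under a dot-product metric, scaling the query this way weights the semantic and keyword contributions to the final score by `alpha` and `1 - alpha` respectively.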

Sparse vectors have a very large number of dimensions where only a small proportion of values are non-zero. Each dimension corresponds to a word from a dictionary, and the value represents the importance of that word in the document. Pinecone's BM25 encoder exposes the standard parameters b (document length normalization, default 0.75) and k1 (term frequency saturation, default 1.2). SPLADE encoders use a pretrained BERT-style model to learn term weights and term expansions, often producing higher recall than classical BM25 on the same corpus.
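
The roles of `k1` and `b` are easiest to see in the textbook Okapi BM25 weight for a term in a document. The sketch below implements that standard formula with the defaults quoted above; it is not claimed to match the exact formulation in Pinecone's encoder, and the corpus and function name are hypothetical.

```python
import math
from collections import Counter

def bm25_document_weights(doc_tokens, corpus, k1=1.2, b=0.75):
    """Return {term: BM25 weight} for one tokenized document.

    k1 controls how quickly repeated terms saturate; b controls how strongly
    long documents are penalized relative to the average length.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    doc_freq = Counter(term for d in corpus for term in set(d))
    weights = {}
    for term, tf in Counter(doc_tokens).items():
        idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        norm = tf + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        weights[term] = idf * tf * (k1 + 1) / norm
    return weights

corpus = [
    ["vector", "database"],
    ["vector", "search", "engine"],
    ["graph", "database"],
]
weights = bm25_document_weights(corpus[1], corpus)
```

Each `(term, weight)` pair corresponds to one non-zero dimension of the document's sparse vector, with rarer terms such as `"search"` receiving more weight than common ones such as `"vector"`.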

In 2025, Pinecone introduced sparse-only indexes alongside the existing combined sparse-dense indexes, and in May 2026 the company announced native full-text search inside the core Pinecone database as part of its Nexus launch week, partly addressing a long-standing gap relative to Weaviate and Milvus ^[22]^[24].

API usage

Pinecone provides client libraries for Python, Node.js (TypeScript), Java, Go, .NET, and Rust. The Python SDK is the most widely used and is distributed on PyPI as pinecone (with optional extras for asyncio and gRPC transports), while the Node.js client is published as @pinecone-database/pinecone. A typical workflow involves creating an index, upserting vectors, and querying:

from pinecone import Pinecone, ServerlessSpec

# Initialize
pc = Pinecone(api_key="your-api-key")

# Create a serverless index
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)

# Connect to the index
index = pc.Index("my-index")

# Upsert vectors with metadata
index.upsert(
    vectors=[
        {
            "id": "doc-1",
            "values": [0.1, 0.2, ...],  # 1536-dimensional vector
            "metadata": {"source": "wiki", "category": "science"}
        }
    ],
    namespace="my-namespace"
)

# Query with metadata filter
results = index.query(
    vector=[0.15, 0.22, ...],
    top_k=5,
    include_metadata=True,
    filter={"category": {"$eq": "science"}},
    namespace="my-namespace"
)

Pinecone also supports bulk operations for upserting, updating, and deleting vectors by metadata filter, which were introduced in 2025 to simplify data management at scale, and integrated inference operations that allow embedding and reranking to be performed inside the same API call as a query ^[25]^[26].

Pinecone Inference

Pinecone Inference provides hosted embedding and reranking models that run on Pinecone's own infrastructure, removing the need for a separate model-serving stack. Embeddings and reranking entered public preview during 2024 and reached general availability on December 2, 2024, alongside an integrated inference API that lets developers embed, store, and query in a single round trip ^[11]^[27].

The initial roster of hosted models included:

Model	Type	Provider	Notes
multilingual-e5-large	Dense embedding	Pinecone-hosted (Microsoft research)	Multilingual, strong on short queries against medium passages
llama-text-embed-v2	Dense embedding	Pinecone-hosted	Long-context embedding optimized for retrieval
pinecone-sparse-english-v0	Sparse embedding	Pinecone	Built on the DeepImpact architecture
cohere-embed-multilingual-v3.0	Dense embedding	Cohere via Pinecone	Multilingual semantic embedding
pinecone-rerank-v0	Reranker	Pinecone	First-party cross-encoder reranker
cohere-rerank-3.5	Reranker	Cohere via Pinecone	Enterprise reranking model
bge-reranker-v2-m3	Reranker	Open-source via Pinecone	Multilingual reranker

Pinecone reported that its pinecone-sparse-english-v0 model produced up to 44 percent and on average 23 percent better NDCG@10 than BM25 on standard retrieval benchmarks, while pinecone-rerank-v0 delivered up to 60 percent and on average 9 percent improvement over comparable rerankers on the BEIR benchmark suite ^[27].
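
For readers unfamiliar with the metric, NDCG@10 scores a ranked list by rewarding relevant results more when they appear near the top. The sketch below uses the linear-gain variant with a log2 rank discount; the relevance grades and the example scores passed to `relative_improvement` are made up for illustration and are not the figures behind Pinecone's reported results.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, judged_relevances=None, k=10):
    """DCG of the ranking divided by the DCG of the ideal ordering.

    `judged_relevances` is every known relevance grade for the query; when
    omitted, the ideal ordering is built from the ranked list itself.
    """
    pool = ranked_relevances if judged_relevances is None else judged_relevances
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

def relative_improvement(baseline, candidate):
    """Fractional gain of `candidate` over `baseline` (0.23 means 23 percent)."""
    return (candidate - baseline) / baseline

perfect = ndcg_at_k([3, 2, 1])     # already in ideal order
reversed_order = ndcg_at_k([1, 2, 3])
gain = relative_improvement(0.50, 0.615)   # hypothetical scores
```

A statement such as "23 percent better NDCG@10" refers to this kind of relative gain over the baseline's score, not to an absolute difference of 0.23.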

Pinecone Assistant

Pinecone Assistant is a higher-level service that lets developers build production-grade chat and agent-based applications without manually constructing retrieval-augmented generation (RAG) pipelines. Users upload documents to the Assistant, which handles chunking, embedding, indexing, retrieval, query planning, model orchestration, and reranking automatically. The Assistant then uses a language model to generate answers grounded in the uploaded documents.

Assistant entered public preview on September 18, 2024, and reached general availability in early 2025. The public-preview release added expanded LLM support, an Evaluation API for measuring correctness and completeness, the ability to associate and filter files by metadata, and a redesigned console UI. Pinecone has positioned Assistant as the simplest path from prototype to production for knowledge-retrieval applications, particularly in financial analysis, legal discovery, customer support, and e-commerce ^[12]^[28].

What is Pinecone Nexus?

In May 2026, Pinecone announced Pinecone Nexus, which the company describes as "a knowledge engine for the agentic AI era, moving reasoning from retrieval to compilation, with KnowQL as the standard query language for agents" rather than a traditional retrieval service ^[14]. Nexus moves reasoning upstream from query-time retrieval to a compilation step performed in advance. The system has two main components: a context compiler that converts raw enterprise data into persistent task-specific knowledge artifacts, and a composable retriever that serves those artifacts to agents with field-level citations and deterministic conflict resolution.

The design rationale is captured in the company's framing that "most agent failures are data failures, not model failures," because agents spend the bulk of their token and latency budget on retrieval and assembly before they ever begin reasoning ^[14]. To quantify the approach, Pinecone built KRAFTBench, a benchmark of 150 questions across 9 sectors and 10 financial topics, evaluated over a corpus of 493 free-form 10-K filings (roughly 245 MB total) drawn from S&P 500 companies' 2022 SEC filings, with Claude as the judge model. Against agentic-RAG and coding-agent baselines, Pinecone reported the following results ^[14]:

System	Completion rate	Average latency	Accuracy	Average tokens	Average steps
Pinecone Nexus	100%	22.7 s	0.680	6,733	1.69
Agentic RAG	98.7%	37.9 s	0.413	49,103	7.77
Coding Agent	62.7%	84.1 s	0.585	528,301	14.77
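
The relative differences implied by the table can be computed directly. The sketch below only restates the reported figures as ratios against Nexus; the dictionary layout and helper name are illustrative, and the numbers remain Pinecone's own self-reported results.

```python
# Figures copied from the KRAFTBench table above (as reported by Pinecone).
results = {
    "Pinecone Nexus": {"latency_s": 22.7, "accuracy": 0.680, "tokens": 6733, "steps": 1.69},
    "Agentic RAG": {"latency_s": 37.9, "accuracy": 0.413, "tokens": 49103, "steps": 7.77},
    "Coding Agent": {"latency_s": 84.1, "accuracy": 0.585, "tokens": 528301, "steps": 14.77},
}

def ratio_vs_nexus(system, metric):
    """How many times larger a baseline's metric is than the Nexus value."""
    return results[system][metric] / results["Pinecone Nexus"][metric]

for system in ("Agentic RAG", "Coding Agent"):
    print(
        f"{system}: "
        f"{ratio_vs_nexus(system, 'tokens'):.1f}x tokens, "
        f"{ratio_vs_nexus(system, 'latency_s'):.1f}x latency, "
        f"{ratio_vs_nexus(system, 'steps'):.1f}x steps"
    )
```

On these figures the agentic-RAG baseline consumed roughly seven times as many tokens as Nexus and the coding-agent baseline nearly eighty times as many, which is the efficiency gap the company's framing emphasizes.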

The Nexus release introduced four primitives: Artifacts (typed pieces of information), Context (curated artifact sets for specific roles), Knowledge (the collective context store across an organization), and the Knowledge Engine itself, which includes an autonomous Context Compiler. KnowQL, a declarative query language released alongside Nexus, lets agents specify the answer shape, access controls, provenance requirements, and budget envelopes for a query rather than expressing it as a free-form vector search.

The May 2026 launch week also included a Pinecone Marketplace with more than 90 production-ready knowledge applications across sales, finance, insurance, real estate, legal, HR, and customer support; a new $20-per-month Builder pricing tier aimed at solo developers and small teams; native full-text search inside the core Pinecone database; and the Frankfurt (eu-central-1) AWS region for European data-residency workloads ^[14]^[29].

Real-world use cases and customers

Pinecone's customer base spans more than 9,000 organizations as of 2026, including a substantial number of Fortune 500 enterprises. Notable named customers include:

Company	Use case	Details
Notion	Knowledge Q&A	Notion AI's Q&A feature ran on Pinecone serverless across billions of documents with namespace-per-user isolation, and Notion reported a 60 percent cost reduction after migrating from pod-based to serverless. By late 2025, reporting indicated Notion had since moved its vector search to Turbopuffer, primarily for cost reasons ^[17]^[30]^[45].
Gong	Revenue intelligence	Stores billions of vectors from customer conversations to power Gong's Smart Trackers feature and real-time concept classification ^[31].
Shopify	E-commerce search	Uses Pinecone for semantic product search and recommendations across the merchant network ^[3].
Microsoft	Internal AI tooling	Microsoft is listed by Pinecone among its enterprise customers, and is also a partner via the Azure Native Integration program ^[9].
Vanguard	Customer support	Achieved a 12 percent accuracy improvement with hybrid retrieval, along with shorter call times and stronger compliance ^[9].
Adobe	Creative tools	Semantic search across creative assets ^[9].
Cisco	Security analytics	Threat detection using vector similarity ^[9].
HubSpot	CRM intelligence	Powers semantic search in its CRM platform ^[9].
Workday	HR and finance	Semantic search across HR and finance records ^[9].
L'Oreal	Marketing intelligence	Knowledge-base assistant for marketing content ^[9].
Zapier	Workflow automation	Integrates vector search into automated workflows ^[3].
OpenAI	Developer tooling	Listed as a customer; OpenAI documentation also references Pinecone in its cookbook examples ^[9].
Cohere	Embedding partner	Joint distribution for hosted Cohere embeddings and rerankers ^[9]^[27].
Duolingo	Education	Semantic search and personalization ^[9].
Asana	Work management	Knowledge retrieval inside work-management workflows ^[9].
ZoomInfo	B2B data	Recommendation engine on Dedicated Read Nodes at scale ^[9]^[13].
DISCO	Legal technology	E-discovery and document review ^[9].
Help Scout	Customer support	Serverless RAG for support agents ^[9].
New Relic	Observability	Conversational interface for engineering data ^[9].
Chipper Cash	Fintech	Real-time fraud detection ^[9].

Common application patterns across these customers include retrieval-augmented generation for customer-support chatbots, semantic search over product catalogs, recommendation engines, anomaly and fraud detection, agent-grounded knowledge retrieval, and document deduplication ^[9]^[32].

How much does Pinecone cost?

Pinecone uses a consumption-based pricing model for its serverless indexes, charging on three primary metrics: read units (RUs), write units (WUs), and storage. As of mid-2026, the published tier structure is:

Plan	Monthly minimum	Included resources	Read units (per million)	Write units (per million)	Storage (per GB/month)	Support
Starter (free)	$0	2 GB storage, 2M WUs/month, 1M RUs/month, up to 5 indexes (AWS us-east-1 only)	0 (within limits)	0 (within limits)	0 (within limits)	Community
Builder	$20	$20/month flat, up to 10 GB storage, up to 5M WUs/month	Included up to limit	Included up to limit	Included up to limit	Community
Standard	$50	$15 in monthly usage credits	$16 to $18 (varies by cloud and region)	$4 to $4.50	$0.33	Email
Enterprise	$500	$150 in monthly usage credits, 99.95 percent uptime SLA, 100 projects, 200 indexes per project, 100,000 namespaces per index	$24 to $27	$6 to $6.75	$0.33	Priority
Dedicated (BYOC)	Custom	Customer-managed cloud account, all enterprise features, private connectivity	Custom	Custom	Custom	Dedicated

The Starter plan allows experimentation in AWS us-east-1 with one project and up to two users. The Builder tier was introduced during Pinecone's May 2026 launch week as a low-cost option for solo developers. Standard and Enterprise plans combine minimum monthly commitments with pay-as-you-go rates beyond the included credits. Enterprise customers can also purchase Dedicated Read Nodes on hourly per-node pricing for sustained, high-throughput workloads ^[29]^[33].
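
The consumption model above can be turned into a rough monthly estimate. The sketch below uses the lower-bound Standard and Enterprise rates from the table; the `Plan` class and function names are illustrative, and the billing rule is an assumption: the monthly minimum is treated as a floor on the bill, and the interaction between included credits and the minimum is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    monthly_minimum: float  # USD
    ru_per_million: float   # USD per million read units
    wu_per_million: float   # USD per million write units
    storage_per_gb: float   # USD per GB per month


# Lower-bound rates from the table above; actual rates vary by cloud and region.
STANDARD = Plan(monthly_minimum=50.0, ru_per_million=16.0, wu_per_million=4.0, storage_per_gb=0.33)
ENTERPRISE = Plan(monthly_minimum=500.0, ru_per_million=24.0, wu_per_million=6.0, storage_per_gb=0.33)


def metered_usage(plan: Plan, read_units: int, write_units: int, storage_gb: float) -> float:
    """Sum the three metered charges: reads, writes, and storage."""
    return (
        read_units / 1_000_000 * plan.ru_per_million
        + write_units / 1_000_000 * plan.wu_per_million
        + storage_gb * plan.storage_per_gb
    )


def estimate_monthly_bill(plan: Plan, read_units: int, write_units: int, storage_gb: float) -> float:
    """ASSUMPTION: the bill is the larger of the monthly minimum and metered usage."""
    return max(plan.monthly_minimum, metered_usage(plan, read_units, write_units, storage_gb))


if __name__ == "__main__":
    # 10M reads, 5M writes, 50 GB stored in one month.
    print(round(estimate_monthly_bill(STANDARD, 10_000_000, 5_000_000, 50), 2))
    print(round(estimate_monthly_bill(ENTERPRISE, 10_000_000, 5_000_000, 50), 2))
```

Under these assumptions the same workload is billed on metered usage on Standard, while on Enterprise it stays below the $500 minimum, which is why small workloads rarely justify the higher tier on price alone.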

Pinecone is available through the AWS, Azure, and Google Cloud marketplaces with pay-as-you-go billing. Buying through a marketplace can simplify procurement and apply existing cloud-commitment credits against Pinecone usage ^[34].

cost considerations at scale

For datasets under 50 million vectors, managed services like Pinecone tend to be cheaper than self-hosting once the hidden DevOps cost of running a cluster is counted. At larger scales, self-hosted open-source options can become more cost-effective. Some organizations have reported significant savings after migrating from Pinecone to self-hosted solutions such as pgvector; for example, Confident AI publicly documented its migration from Pinecone to pgvector, citing cost reduction as a primary factor ^[35]^[36]. Cost was also the reported driver behind Notion's late-2025 move from Pinecone to the object-storage-backed vector database Turbopuffer ^[45].

The introduction of Dedicated Read Nodes in late 2025 added a new pricing dimension. For applications with sustained high query throughput, hourly per-node pricing can come out below per-request serverless pricing. Pinecone's own benchmark numbers cited cost reductions of 5 to 10 times for high-traffic recommendation workloads relative to per-request pricing, although those figures have not been independently verified ^[13].
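
The trade-off between hourly per-node pricing and per-request pricing reduces to a break-even calculation. The sketch below is illustrative only: the article does not give an hourly node rate, so the `hourly_rate_usd` value in the example is a hypothetical placeholder, and write and storage charges are ignored.

```python
HOURS_PER_MONTH = 730  # average month: 8,760 hours / 12


def dedicated_monthly_cost(nodes: int, hourly_rate_usd: float) -> float:
    """Monthly cost of running `nodes` read nodes continuously."""
    return nodes * hourly_rate_usd * HOURS_PER_MONTH


def break_even_read_units(nodes: int, hourly_rate_usd: float, ru_price_per_million: float) -> float:
    """Read units per month above which dedicated nodes cost less than per-request reads."""
    return dedicated_monthly_cost(nodes, hourly_rate_usd) / ru_price_per_million * 1_000_000


if __name__ == "__main__":
    # HYPOTHETICAL: 2 nodes at $1.00 per node-hour, compared with the
    # lower-bound Enterprise read-unit price of $24 per million.
    threshold = break_even_read_units(nodes=2, hourly_rate_usd=1.00, ru_price_per_million=24.0)
    print(f"Break-even at about {threshold / 1e6:.1f} million read units per month")
```

The comparison only holds if the chosen node count can actually serve the workload's peak throughput, which has to be established by load testing rather than arithmetic.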

cloud and region availability

Pinecone serverless is available across AWS, Azure, and Google Cloud. The published list of supported regions as of mid-2026 includes:

Cloud	Region code	Location	Available on
AWS	us-east-1	Northern Virginia, USA	All plans (only region for Starter)
AWS	us-west-2	Oregon, USA	Standard, Enterprise
AWS	eu-west-1	Ireland	Standard, Enterprise
AWS	eu-central-1	Frankfurt, Germany	Standard, Enterprise (added May 2026)
AWS	ap-southeast-1	Singapore	Standard, Enterprise
GCP	us-central1	Iowa, USA	Standard, Enterprise
GCP	europe-west4	Netherlands	Standard, Enterprise
Azure	eastus2	Virginia, USA	Standard, Enterprise

Additional regions are added on a rolling basis. For data residency requirements outside this list, customers can use the BYOC Dedicated tier, which deploys Pinecone's data plane into the customer's own AWS, GCP, or Azure account ^[29]^[37].
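
A deployment script can encode the table above as a lookup and reject unsupported combinations before any API call is made. This is a minimal sketch: the `REGIONS` mapping is transcribed from the table, "All plans" is read as the four serverless tiers in the pricing table (an assumption), and the helper names are illustrative.

```python
# ASSUMPTION: "All plans" means the four serverless tiers from the pricing table.
ALL_PLANS = frozenset({"starter", "builder", "standard", "enterprise"})
PAID = frozenset({"standard", "enterprise"})

# (cloud, region code) -> plans on which the region is available, per the table above.
REGIONS = {
    ("aws", "us-east-1"): ALL_PLANS,
    ("aws", "us-west-2"): PAID,
    ("aws", "eu-west-1"): PAID,
    ("aws", "eu-central-1"): PAID,
    ("aws", "ap-southeast-1"): PAID,
    ("gcp", "us-central1"): PAID,
    ("gcp", "europe-west4"): PAID,
    ("azure", "eastus2"): PAID,
}


def is_available(cloud: str, region: str, plan: str) -> bool:
    """True if the table lists this cloud/region for the given plan."""
    return plan.lower() in REGIONS.get((cloud.lower(), region.lower()), frozenset())


def regions_for(plan: str) -> list[tuple[str, str]]:
    """All (cloud, region) pairs listed for a plan, in sorted order."""
    return sorted(key for key, plans in REGIONS.items() if plan.lower() in plans)


if __name__ == "__main__":
    print(is_available("aws", "eu-central-1", "starter"))
    print(regions_for("starter"))
```

Because regions are added on a rolling basis, a hard-coded table like this goes stale; the published region list remains the source of truth.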

integrations

Pinecone has built integrations with most of the popular AI development frameworks and tools:

Integration	Description
LangChain	PineconeVectorStore class for retrieval-augmented generation pipelines
LlamaIndex	Native vector store connector for indexing and querying
OpenAI	Compatible with OpenAI embedding models; used in OpenAI's cookbook examples
Pinecone Inference	Hosted embedding and reranking models on Pinecone's own infrastructure
Anthropic	Reference integration in the GenAI stack since the Pinecone Serverless launch
Mastra	Reference vector store provider
Airbyte	Data ingestion connector for loading documents into Pinecone
Fivetran	Managed ELT pipeline integration with Pinecone Assistant
Semantic Kernel	Microsoft's orchestration SDK for AI agents
Haystack	Document store integration for search pipelines
Vercel	Templates and server functions in the Vercel AI SDK
Pulumi	Infrastructure-as-code provider for Pinecone resources, including BYOC
Confluent	Streaming ingestion via Confluent Cloud connectors
Box	Source-of-truth integration for Nexus knowledge artifacts
Unstructured	Document parsing and extraction for Nexus
Microsoft Entra ID	Single sign-on via Azure Native Integration
Commvault	Backup, recovery, and threat-detection integration for Pinecone data

The LangChain integration, through the langchain-pinecone Python package, allows developers to build RAG pipelines that embed user queries, search Pinecone for relevant context, and pass that context to a large language model for answer generation. Microsoft and Pinecone formalized a deeper Azure Native Integration in late 2024, providing unified billing, single sign-on through Microsoft Entra ID, deployment templates, and integration with Azure OpenAI Service ^[38].
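
The embed, search, and prompt-assembly flow described above can be illustrated without any external service. The sketch below is not the langchain-pinecone API: `InMemoryStore` is a hypothetical stand-in whose method names are chosen to resemble a typical vector-store interface, and `embed` is a toy word-count embedding used in place of a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a managed vector store."""

    def __init__(self) -> None:
        self._records: list[tuple[str, Counter]] = []

    def add_texts(self, texts: list[str]) -> None:
        self._records.extend((t, embed(t)) for t in texts)

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self._records, key=lambda r: cosine(q, r[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, store: InMemoryStore, k: int = 2) -> str:
    """Retrieve the k most similar documents and place them ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in store.similarity_search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = InMemoryStore()
    store.add_texts([
        "Serverless indexes bill by read units and write units.",
        "The Frankfurt region supports European data residency.",
        "Each vector can carry up to 40 KB of metadata.",
    ])
    print(build_prompt("How much metadata can a vector carry?", store, k=1))
```

In a production pipeline the resulting prompt would be passed to a large language model, and the store would be replaced by a Pinecone-backed vector store with a real embedding model.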

limitations

While Pinecone excels at operational simplicity, it has several notable limitations:

Closed source: Pinecone's codebase is proprietary. Users cannot inspect the implementation, contribute fixes, or self-host the software (except indirectly through the BYOC Dedicated plan, which still relies on Pinecone's container images and control plane). This limits transparency and creates vendor lock-in ^[39].
Metadata size cap: Each vector can carry a maximum of 40 KB of metadata. Applications that need richer metadata per record must store supplementary data in an external database and join results at the application layer ^[21].
Limited index type control: Users cannot select or tune the underlying index algorithm. While this simplifies operations, it prevents optimization for specialized workloads where fine-grained control over HNSW parameters or alternative algorithms (such as DiskANN or FAISS-based IVF) would be beneficial.
Region availability: Serverless indexes are available only on specific cloud providers and regions, listed above. Organizations with data residency requirements in unsupported regions must use the BYOC Dedicated plan.
Dimension limits: Serverless indexes support up to 20,000 dimensions per vector, which covers most embedding models but may not accommodate very high-dimensional custom embeddings.
Backup limits: Serverless backups are available only on Standard and Enterprise plans, only for indexes with up to 2,000 namespaces, and have a quota of 50 backups per project. Backup or restore of a 100 million vector index can take up to 5 hours ^[40].
Migration constraints: The pod-to-serverless migration tool supports only indexes with fewer than 25 million records and 20,000 namespaces, and migration cannot cross clouds ^[15].
Cost at very large scale: For consistently high-utilization or very large multi-tenant workloads, per-unit serverless pricing can exceed object-storage-backed or self-hosted alternatives, a factor cited in customer migrations away from Pinecone such as Notion's move to Turbopuffer and Confident AI's move to pgvector ^[36]^[45].
No native full-text search until 2026: Unlike Weaviate or Milvus, the Pinecone core database lacked a built-in BM25 or full-text engine until the May 2026 release; before then, hybrid search required users to generate and manage sparse vectors externally ^[22].
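
Two of the limits above, the metadata cap and the dimension ceiling, can be checked on the client before an upsert. This is a minimal sketch under stated assumptions: "40 KB" is read as 40 * 1024 bytes, metadata size is approximated as compact UTF-8 JSON (the service's own accounting may differ), and `check_record` is an illustrative helper, not an SDK function.

```python
import json

MAX_METADATA_BYTES = 40 * 1024  # ASSUMPTION: "40 KB" interpreted as 40 * 1024 bytes
MAX_DIMENSIONS = 20_000         # serverless dimension limit stated above


def metadata_size_bytes(metadata: dict) -> int:
    """Approximate metadata size as compact UTF-8 JSON."""
    encoded = json.dumps(metadata, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def check_record(values: list[float], metadata: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record passes both checks."""
    problems = []
    if len(values) > MAX_DIMENSIONS:
        problems.append(f"vector has {len(values)} dimensions (limit {MAX_DIMENSIONS})")
    size = metadata_size_bytes(metadata)
    if size > MAX_METADATA_BYTES:
        problems.append(f"metadata is about {size} bytes (limit {MAX_METADATA_BYTES})")
    return problems


if __name__ == "__main__":
    oversized = {"body": "a" * 50_000}
    print(check_record([0.1] * 1536, {"title": "ok"}))
    print(check_record([0.1] * 1536, oversized))
```

Records that fail the metadata check are the ones that need the external-database pattern described above: keep a short key in the vector's metadata and store the full payload elsewhere.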

How does Pinecone compare to other vector databases?

The vector database market has grown rapidly since 2022, driven by the adoption of large language models and retrieval-augmented generation patterns. Independent estimates put the global vector database market at roughly $3.2 billion in 2025 with annual growth around 24 percent. Pinecone competes with several categories of alternatives ^[41].

Competitor	Type	Key differentiator
Weaviate	Open-source (Go)	Native hybrid search, GraphQL API, self-hostable, modules for vectorization and reranking
Chroma	Open-source (Python)	Lightweight, pip-installable, ideal for prototyping RAG locally
Milvus / Zilliz	Open-source (Go/C++)	GPU acceleration, billions-scale, multiple index types (HNSW, IVF, Annoy), Zilliz managed offering
Qdrant	Open-source (Rust)	High performance, Rust-native, advanced payload filtering, quantization, single-digit ms latency
pgvector	PostgreSQL extension	Adds vector search to existing Postgres databases, transactional consistency
FAISS	Library (C++/Python)	Meta AI research library rather than a database; used as a building block by other systems
Vespa	Search engine (Java)	Hybrid retrieval, ranking, used at Yahoo and other large search workloads
OpenSearch	Search engine (Java)	Lucene-based with k-NN plugin, AWS-managed
Redis	In-memory database	Vector similarity in Redis Enterprise alongside existing key-value workloads

Pinecone's main advantage over open-source alternatives is operational simplicity: there is no infrastructure to manage, no index tuning required, and the service scales automatically. Its main disadvantages are vendor lock-in and cost at scale, since users cannot self-host the software outside the BYOC tier. Because Pinecone is closed-source, there is also less transparency into how the system works internally, and independent benchmarking is harder than for open-source alternatives ^[39]^[41].

In comparative benchmarks published in 2025 and 2026, Pinecone has typically returned p50 query latencies of roughly 10 to 50 milliseconds depending on workload size, broadly competitive with Qdrant and Milvus on managed clouds, and has compared favorably for multi-tenant workloads with very large numbers of namespaces. Self-hosted Qdrant and Milvus deployments often outperform Pinecone on raw single-query latency at the cost of higher operational burden ^[41]^[42].

security and compliance

Pinecone has obtained multiple security certifications relevant to enterprise and regulated workloads:

Certification or framework	Status
SOC 2 Type II	Annual audit completed; 2025 audit completed with zero deviations
ISO 27001:2022	Annual surveillance audits completed; certification active
GDPR	Compliant; supported by EU regions including eu-west-1 and eu-central-1
HIPAA	External attestation across AWS, Azure, and GCP; BAA available on request
PCI DSS	Compliant
FedRAMP	Compliant
CSA STAR Level 1	Compliant

Pinecone's Trust and Security Center, hosted on SafeBase, provides downloadable copies of these reports under NDA. The combination of certifications has made Pinecone usable in regulated industries including pharmaceuticals, banking, healthcare, and public-sector applications ^[43]^[44].

current state (2025 to 2026)

As of mid-2026, Pinecone remains one of the most widely used managed vector databases, with more than 9,000 organizations on the platform. The leadership transition to Ash Ashutosh as CEO in September 2025 signaled a shift toward expanding the company's enterprise sales and go-to-market capabilities, while Edo Liberty's move to Chief Scientist focused the company's technical direction on its AI research ambitions, particularly the Nexus knowledge-engine vision ^[6].

The company has expanded its product surface beyond pure vector storage. Pinecone Inference provides hosted embedding and reranking models, reducing the need for separate model-serving infrastructure. Pinecone Assistant abstracts away RAG pipeline construction entirely. Dedicated Read Nodes, launched in late 2025 and generally available in early 2026, address the needs of high-throughput production workloads that require predictable latency. The Nexus knowledge engine, KnowQL query language, Pinecone Marketplace, native full-text search, the Builder tier, and the Frankfurt cloud region all shipped during the May 2026 launch week, signaling a deeper move into agent-first knowledge infrastructure ^[13]^[14]^[29].

Reports of a potential acquisition during 2025, alongside competitive pressure and the departure of marquee customers such as Notion, suggest the company is weighing its options between remaining independent and joining a larger platform ^[6]^[45]. Regardless of the outcome, Pinecone's serverless architecture, integrated inference layer, and focus on developer experience have made it a reference point in the vector database category, and its publicly described slab and partitioning techniques have influenced product designs at competing managed and open-source projects ^[6]^[41].

references

1. Edo Liberty's homepage - Edo Liberty
2. Founder Story: Edo Liberty of Pinecone - Frederick AI
3. Pinecone drops $100M investment on $750M valuation - TechCrunch, April 27, 2023
4. Announcing the Pinecone Vector Database and $10M in Seed Funding - Pinecone Blog, January 27, 2021
5. Pinecone raises $28M to advance vector database - TechTarget, March 29, 2022
6. Exclusive: Pinecone founder Edo Liberty moves from CEO to Chief Scientist, names Googler Ash Ashutosh as leader - VentureBeat, September 2025
7. Moving Pinecone forward with Ash Ashutosh as CEO and Edo spearheading our growing AI ambitions as Chief Scientist - Pinecone Blog, September 8, 2025
8. Pinecone - Crunchbase Company Profile & Funding - Crunchbase
9. Pinecone Customers - Pinecone
10. Pinecone's vector database gets a new serverless architecture - TechCrunch, January 16, 2024
11. Introducing integrated inference: Embed, rerank, and retrieve your data with a single API - Pinecone Blog, December 2, 2024
12. Simplify, enhance, and evaluate RAG development with Pinecone Assistant, now in public preview - Pinecone Blog, September 18, 2024
13. Pinecone rolls out dedicated read nodes - Blocks and Files, December 2025
14. Pinecone Nexus: The Knowledge Engine for Agents - Pinecone Blog, May 4, 2026
15. Migrate a pod-based index to serverless - Pinecone Documentation
16. Why we replaced Pinecone with PGVector - Confident AI
17. Reimagining the vector database to enable knowledgeable AI - Pinecone Blog
18. Inside Pinecone: Slab Architecture - Pinecone Learn
19. Pinecone Dedicated Read Nodes are now in Public Preview - Pinecone Blog, December 1, 2025
20. Pinecone BYOC: Pinecone in your AWS, GCP, or Azure account, no vendor access - Pinecone Blog
21. Filter by metadata - Pinecone Docs - Pinecone Documentation
22. Launch Week: Pinecone for agents, search, recommendations, and more - Pinecone Blog, March 2025
23. Hybrid search - Pinecone Docs - Pinecone Documentation
24. SPLADE for Sparse Vector Search Explained - Pinecone Learn
25. New Bulk Data Operations: Update, Delete, and Fetch by Metadata - Pinecone Blog
26. Pinecone Python SDK - Pinecone Documentation
27. Introducing reranking to Pinecone Inference to simplify building accurate AI - Pinecone Blog
28. Easily build knowledgeable chat and agent-based applications in minutes with Pinecone Assistant, now generally available - Pinecone Blog
29. Pinecone Expands in Europe with New Frankfurt Cloud Region - PR Newswire, May 8, 2026
30. Pinecone reinvests hours saved from using Notion AI into building revolutionary AI products - Notion
31. Revolutionizing Revenue Intelligence: Gong's Strategic Partnership with Pinecone - Pinecone
32. Pinecone Vector Database: A Complete Guide - Airbyte
33. Pinecone Pricing - Pinecone
34. AWS Marketplace: Pinecone Vector Database - AWS Marketplace
35. Pinecone vs. Weaviate Cost Comparison 2025 - Cloudatler
36. The True Cost of Pinecone - MetaCTO
37. Available cloud regions - Pinecone Documentation
38. Building AI apps on Azure with Pinecone just got a lot easier - Pinecone Blog
39. Top 5 Vector Databases for Enterprise RAG - Rahul Kolekar, 2026
40. Understanding backups - Pinecone Documentation
41. Best Vector Databases 2026: Pinecone, Chroma, Qdrant & More - DataCamp, 2026
42. Vector Database Comparison 2026: Pinecone vs pgvector vs Chroma vs Weaviate - GroovyWeb, 2026
43. Trust and Security - Pinecone
44. Pinecone is now HIPAA compliant - Pinecone Blog
45. From shiny object to sober reality: The vector database story, two years later - VentureBeat, 2025
