Vespa (search engine)

Updated Jul 23, 2026

Vespa is an open-source big-data serving engine that combines vector search, lexical search, and structured search inside a single query, with real-time indexing and machine-learned ranking executed on the data nodes.^[1]^[2] The technology originated inside Norwegian search company Fast Search & Transfer in the late 1990s, became a Yahoo internal platform after Yahoo acquired Overture (and through it Fast's web search assets) in 2003, and was released under the Apache 2.0 license in September 2017.^[3]^[4]^[5] In October 2023 Yahoo spun the project out as Vespa.ai, an independent company headquartered in Trondheim, Norway, with Jon Bratseth as chief executive officer.^[6]^[7] In November 2023 Vespa.ai raised a $31 million Series A round led by Blossom Capital to grow the engineering team and accelerate the managed Vespa Cloud product.^[8]^[9] As of 2026, Vespa powers retrieval at companies including Yahoo, Perplexity, Spotify, Vinted, and Wix, and in April 2025 Perplexity brought its search function in-house on Vespa while serving more than 15 million monthly users and over 100 million queries per week.^[7]^[10]^[11]^[32]

History

Origins at Fast Search & Transfer

The engine that became Vespa traces its lineage to Fast Search & Transfer (FAST), a Trondheim company founded in 1997 out of research at the Norwegian University of Science and Technology.^[12] FAST built the AlltheWeb crawler beginning in 1998 and launched the AlltheWeb.com web search engine in May 1999, which at its peak indexed billions of pages and competed directly with Google.^[12] FAST's distributed search architecture was the engineering ancestor of the indexing and serving system later renamed Vespa.^[4]^[12]

In February 2003 Overture Services acquired the AlltheWeb web search division from FAST for approximately $70 million, and in July 2003 Yahoo acquired Overture for $1.63 billion, bringing the AlltheWeb engineering group in Trondheim and Sunnyvale into Yahoo.^[12]^[3] The codename "Vespa" (an internal Yahoo project name for the new big-data serving engine) was applied as the team rewrote major portions of the search core for Yahoo's recommendation, advertising, and content-discovery use cases.^[3]^[4]

Yahoo as internal platform

By the mid-2010s Vespa was the production serving layer behind a large fraction of Yahoo's consumer-facing surfaces. The official open-sourcing announcement in September 2017 listed Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Gemini, and Flickr as products powered by the engine, processing billions of daily requests over billions of documents and serving content and ads roughly 90,000 times per second across all Yahoo properties.^[4] Yahoo Gemini, the native-ads platform, alone served over three billion native ad requests per day on Vespa.^[4]

Distinguished architect Jon Bratseth led the team during this period. In a 2017 SlideShare deck he described his role as "Distinguished Architect, Oath" (Verizon's renamed Yahoo holding company at the time) and characterized Vespa as Yahoo's "big data serving" platform, distinguishing it from the offline-batch focus of Hadoop.^[13]

Open-source release, September 2017

Yahoo released Vespa under the Apache 2.0 license on September 27, 2017, publishing the source to the github.com/vespa-engine/vespa repository.^[4]^[14] The blog post framed the release as the largest open-source contribution from Yahoo since the company had open-sourced Hadoop in 2006: "By releasing Vespa, we are making it easy for anyone to build applications that can compute responses to user requests, over large datasets, at real time and at internet scale."^[4] Press coverage at the time, including CNBC and SiliconANGLE, described Vespa as a substantial rewrite of the Yahoo serving stack and as Yahoo's most significant open-source software release in over a decade.^[3]^[15] The Trondheim team remained the primary development group after the release.^[3]^[15]

Vespa began serving external customers (i.e., non-Yahoo workloads) in 2021 with the launch of the managed Vespa Cloud product, a multi-tenant hosted version of the engine running on AWS and GCP.^[7]^[16]

Independent company, October 2023

On October 4, 2023, Yahoo announced that Vespa would be spun out as a standalone company called Vespa.ai, with Jon Bratseth appointed chief executive officer.^[6]^[7] Yahoo retained an equity stake, kept a seat on the new company's board of directors, and committed to remaining Vespa's largest customer.^[7]^[17] At spin-out Yahoo was running roughly 150 applications on Vespa, collectively serving close to a billion users and processing about 800,000 queries per second across the company's properties.^[6]^[17] The official press release cited Spotify and Wix as existing external customers alongside Yahoo.^[6]

Yahoo chief executive Jim Lanzone said the spin-out would let Yahoo "create a new business opportunity that allows other companies to harness its technology as an independent entity" while Yahoo continued to use and invest in the platform.^[6] Bratseth, who had been a vice-president architect in Yahoo's big-data and AI group, stated that the time was right "to spin out Vespa and allow other companies to take advantage of Vespa Cloud in a meaningful way."^[6]

Series A funding, November 2023

On November 1, 2023, Vespa.ai announced a $31 million Series A financing led by London-based Blossom Capital.^[8]^[9] At the time of the announcement Bratseth said the proceeds would fund growth of the standalone business, strengthening of the engineering function, and faster delivery of features for users combining AI models with proprietary data sets.^[8]^[9] The TechCrunch coverage noted the company had 29 employees post-spin-out and planned to use the funding to convert open-source users into paying Vespa Cloud customers.^[9] DLA Piper, which advised Blossom on the deal, characterized it as a Norwegian Series A round.^[18]

Growth as an independent company, 2024 to 2026

After the Series A, Vespa.ai expanded the managed Vespa Cloud business and shipped features aimed at retrieval-augmented generation (RAG) and AI agents.^[35]^[36] On April 15, 2025, Perplexity announced it had brought its search function in-house on Vespa, citing Vespa's integration of retrieval, ranking, and machine-learning inference at scale; at the time Perplexity reported more than 15 million monthly users and over 100 million queries per week.^[32] Perplexity founder and chief executive Aravind Srinivas summarized the strategy as "The recipe: 1. Solve Search. 2. Use it to solve everything else."^[32]

Vespa.ai organized its 2025 development around three themes: performance, retrieval quality, and operational simplicity. Releases that year added integrated chunking, chunk-level scoring, and layered ranking for RAG pipelines, support for newer BERT embedding models, and core-speed gains including lexical queries up to three times faster.^[35]^[36] The company also extended Vespa Cloud to Microsoft Azure zones alongside its existing AWS and Google Cloud footprint.^[35]

In early 2026 Vespa.ai listed Vespa Cloud on the Google Cloud Marketplace and released Pyvespa 1.0, whose new HTTP client Vespa.ai measured as about 4.9 times faster than the prior version when returning 400 hits with a 768-dimension vector each.^[37] In February 2026 the company announced a strategic partnership with data-integration company Nexla, whose Vespa Connector pipes data from sources such as Amazon S3, PostgreSQL, and Snowflake into Vespa and whose plugin CLI auto-generates draft Vespa application packages.^[37]^[39] Nexla chief executive Saket Saurabh said "data integration and intelligent retrieval are two sides of the same coin in modern AI architectures," while Bratseth said "Vespa is built for teams that need precision, performance, and real-time execution at scale."^[39] In May 2026 Vespa.ai announced its first in-person community event, Vespa.ai Live, scheduled for September 2026 in London with speakers from Walmart, Etsy, and RavenPack.^[38]

Architecture

Vespa applications run on three logical cluster types, defined together in an application package that is deployed to a Vespa zone.^[2]^[19]

Container layer

Container clusters are stateless. They terminate HTTP requests, run the query and document-feeding APIs, host user-supplied Java components, and merge results returned by the content layer.^[2]^[19] Each container node runs the jDisc framework, a Java service-container that hosts request handlers, processors, and searchers.^[2] Application owners can plug in custom Java components in the container, including query rewriters, document processors, document enrichers, and federated searchers.^[2]^[19]

Content layer

Content clusters hold the data and execute the per-document work of a query: matching, ranking, grouping, and aggregation.^[2]^[20] A content node is responsible for some subset of the documents in the cluster according to a distribution algorithm, and the cluster as a whole automatically rebalances data when nodes are added, removed, or fail.^[2] Each document is replicated across multiple content nodes to provide redundancy and to keep serving available during node failures.^[2]^[20]

The content nodes also hold the indexes used to evaluate queries: inverted indexes for text and tagged attribute fields for structured filtering, plus HNSW graphs over tensor fields used for approximate nearest-neighbor search.^[21]^[22] Queries are scatter-gathered: each content node performs matching, ranking, and selection of its own top-k candidates, and the container then merges the per-node results into a final response.^[2]^[20]
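The scatter-gather step described above can be sketched in a few lines of Python. This is a conceptual illustration only, not Vespa's implementation: the `Hit` tuple and the function names `node_top_k` and `merge_top_k` are hypothetical, and scores are assumed to be already computed.

```python
import heapq
import itertools
from typing import Iterable

Hit = tuple[float, str]  # (relevance score, document id); hypothetical shape


def node_top_k(scored_docs: Iterable[Hit], k: int) -> list[Hit]:
    """What one content node returns: its k best hits, best first."""
    return heapq.nlargest(k, scored_docs)


def merge_top_k(per_node_hits: list[list[Hit]], k: int) -> list[Hit]:
    """What the container does: merge the sorted per-node lists into a global top k."""
    # heapq.merge with reverse=True expects each input sorted from largest to smallest,
    # which is exactly what node_top_k produces.
    merged = heapq.merge(*per_node_hits, reverse=True)
    return list(itertools.islice(merged, k))
```

Because each node returns only its own top k, the container never sees more than k hits per node, which keeps the merge cheap regardless of corpus size.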

Configuration layer

A small admin cluster runs the Vespa configuration servers, which validate application packages, derive per-node configuration from the deployed application, and coordinate schema and topology changes without service interruption.^[2]^[19] The control plane resources are not billed in Vespa Cloud and are managed automatically by the platform.^[16]

Application package

The unit of deployment in Vespa is the application package: a directory tree that includes a services.xml file declaring container and content clusters, one or more .sd schema definition files per document type, optional Java components, ranking expressions, ONNX model files, and other configuration.^[19]^[23] Deploying the same application package against a single-node test cluster or a hundred-node production cluster uses the same workflow; the configuration servers translate the package into the per-node configuration each process needs.^[19]^[23]

Document and query APIs

Vespa exposes two main HTTP APIs.^[2]^[19] The document API supports put, update, and remove operations on a per-document basis; writes are durable and become visible to queries within milliseconds.^[2]^[19] The query API accepts requests in YQL (the Vespa Query Language) or in a more compact JSON form, supports complex Boolean and ranking compositions of operators, and returns scored hits assembled from all content clusters.^[2]^[19]
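As a rough sketch of what requests to the two APIs look like, the helpers below build the path and JSON bodies without sending anything. The `/document/v1/` path layout, the `fields` and `assign` keys, and the `yql` and `hits` query parameters follow Vespa's documented HTTP conventions as best understood here; the namespace, document type, and field names in the usage note are made up for illustration.

```python
import json
from urllib.parse import quote


def document_path(namespace: str, doc_type: str, doc_id: str) -> str:
    """Path for a single document; POST writes it, PUT updates it, DELETE removes it."""
    parts = (quote(p, safe="") for p in (namespace, doc_type))
    ns, dt = parts
    return f"/document/v1/{ns}/{dt}/docid/{quote(doc_id, safe='')}"


def put_body(fields: dict) -> str:
    """Body for writing a full document."""
    return json.dumps({"fields": fields})


def partial_update_body(assignments: dict) -> str:
    """Body for a partial update that assigns new values to selected fields."""
    return json.dumps(
        {"fields": {name: {"assign": value} for name, value in assignments.items()}}
    )


def query_body(yql: str, hits: int = 10) -> str:
    """Body for a POST to the query endpoint (/search/)."""
    return json.dumps({"yql": yql, "hits": hits})
```

For example, `document_path("shop", "product", "sku 1")` yields `/document/v1/shop/product/docid/sku%201`, and `partial_update_body({"price": 9})` produces an update that touches only the hypothetical `price` field.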

Features

Real-time indexing

Vespa supports full-CRUD indexing: documents written via the document API are searchable within milliseconds, and partial updates to attribute fields are applied in place without a re-index.^[2]^[22] The HNSW graphs used for vector search are constructed online as documents are fed rather than pre-built offline, allowing the index to stay current while accepting reads.^[22] Multithreaded distance calculations during graph updates reduce indexing latency for high-write workloads.^[22]

Approximate nearest-neighbor search

Vespa implements a modified variant of the Hierarchical Navigable Small World (HNSW) algorithm of Malkov and Yashunin (2016) for approximate nearest-neighbor search over tensor fields.^[22] The implementation maintains a single graph per tensor field per content node, supports multiple indexed tensor fields concurrently, and supports HNSW indexing of multi-vector documents.^[22] Approximate nearest-neighbor queries can be combined with arbitrary filters using either pre-filtering (apply filters before traversing the graph) or post-filtering (apply filters after retrieval), with the strategy configurable per query.^[22] The greedy graph traversal is sublinear in document count: doubling the per-node corpus size adds roughly fifty percent latency in practice.^[22] Memory footprint is tunable via cell type: switching from float to bfloat16 approximately halves memory consumption with minimal accuracy loss.^[22]
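The difference between the two filtering strategies, and the memory effect of the cell type, can be illustrated with a small sketch. It deliberately uses exact brute-force search as a stand-in for HNSW traversal, so it shows the semantics rather than Vespa's algorithm; all function names and the sample documents are hypothetical.

```python
import math


def nearest(query, docs, k):
    """Exact k-nearest neighbours by Euclidean distance (stand-in for an HNSW search)."""
    return sorted(docs, key=lambda d: math.dist(query, d["vec"]))[:k]


def pre_filter_search(query, docs, k, keep):
    """Filter first, then search only the documents that pass the filter."""
    return nearest(query, [d for d in docs if keep(d)], k)


def post_filter_search(query, docs, k, keep):
    """Search first, then drop hits that fail the filter (may return fewer than k)."""
    return [d for d in nearest(query, docs, k) if keep(d)]


def vector_bytes(dimensions, bits_per_cell):
    """Raw storage for one vector: 32 bits per cell for float, 16 for bfloat16."""
    return dimensions * bits_per_cell // 8


DOCS = [
    {"id": 1, "vec": (0.0, 0.0), "color": "red"},
    {"id": 2, "vec": (1.0, 0.0), "color": "red"},
    {"id": 3, "vec": (5.0, 5.0), "color": "blue"},
    {"id": 4, "vec": (6.0, 5.0), "color": "blue"},
]
```

With the query `(0.0, 0.0)`, `k=2`, and a filter on `color == "blue"`, pre-filtering returns documents 3 and 4, while post-filtering returns nothing because both of the two nearest neighbours are red. That is the trade-off a per-query strategy setting has to balance.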

Lexical search and BM25

Vespa supports full-text search via inverted indexes with the BM25 scoring function as a built-in rank feature, plus the weakAnd operator for accelerated linguistic-text queries and the wand operator for learned sparse representations such as SPLADE.^[21]^[24] BM25 features integrate into the same rank profile as vector and tensor features, allowing developers to compose hybrid scoring expressions directly inside the engine.^[21]^[24]
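For readers unfamiliar with BM25, the following is a minimal sketch of the textbook Okapi BM25 formula. It is not taken from Vespa's source: the IDF variant and the defaults `k1=1.2` and `b=0.75` are common choices assumed here, and the function signature is hypothetical.

```python
import math
from collections import Counter


def bm25(query_terms, doc_terms, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """Score one document against a query using textbook Okapi BM25.

    doc_freq maps a term to the number of documents that contain it.
    """
    term_counts = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = term_counts[term]
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        n = doc_freq.get(term, 0)
        idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
        # Length normalisation: documents longer than average are penalised.
        norm = tf + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score
```

Rare terms receive a larger IDF weight than common ones, so a match on a term found in one document out of a hundred contributes more than a match on a term found in half of them.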

Hybrid search

Vespa's signature capability is hybrid retrieval inside a single query and a single system rather than across federated services.^[21]^[24] Common patterns include disjunctive retrieval (issue a nearestNeighbor operator and a userQuery operator in OR and expose all candidates to ranking), and the RANK operator (retrieve by one method while computing additional features from another for use only at ranking time).^[24] The official "Redefining Hybrid Search" engineering blog notes that adding HNSW indexes to dense vector fields that are only used in ranking phases is wasteful, and recommends selective indexing strategies where retrieval uses sparse signals while denser embeddings contribute only to scoring.^[24]
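The two patterns above can be sketched as query bodies. The `nearestNeighbor`, `userQuery`, and `rank` operators and the `targetHits` annotation are Vespa YQL constructs; the tensor field name `embedding`, the query tensor name `q`, and the rank profile name `hybrid` are assumptions chosen for illustration, as is the helper itself.

```python
def hybrid_query(user_text, query_vector, target_hits=100, use_rank_operator=False):
    """Build a query body combining vector and lexical retrieval.

    With use_rank_operator=False both operators retrieve candidates (OR).
    With use_rank_operator=True only the lexical operator retrieves, and the
    vector operator merely contributes features for ranking.
    """
    ann = f"({{targetHits:{target_hits}}}nearestNeighbor(embedding, q))"
    if use_rank_operator:
        where = f"rank(userQuery(), {ann})"
    else:
        where = f"{ann} or userQuery()"
    return {
        "yql": f"select * from sources * where {where}",
        "query": user_text,               # consumed by userQuery()
        "input.query(q)": query_vector,   # the query tensor named q
        "ranking": "hybrid",              # hypothetical rank profile name
    }
```

Both variants are a single request to a single system; which one fits depends on whether the vector side should widen the candidate set or only inform the score.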

Tensor framework

Vespa stores and computes on tensors of arbitrary order, not just one-dimensional vectors, with mixed mapped and indexed dimensions.^[25]^[26] Tensor operations including reduce, join, map, and dot product can be composed in ranking expressions, and the same tensor algebra is used for both vector similarity and richer multi-dimensional scoring.^[25]^[26] This is the substrate for advanced retrieval paradigms such as ColPali, where each document is represented as a tensor of patch embeddings.^[27]

Multi-phase ranking

Vespa ranking is organized as a pipeline of phases declared in a rank profile.^[28]^[29] The first-phase function runs on every matching document on the content node and selects per-node top candidates; the second-phase function runs locally on the content node on the smaller set returned by the first phase; and the global-phase function runs on the container after the per-node candidates are merged, enabling expensive cross-encoder models to run on the final shortlist.^[28]^[29] Each phase can call out to ONNX-format neural network inferences as native rank features.^[28]^[29]
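The phase structure described above can be modelled as a small pipeline. This is a simplified sketch, not Vespa's engine: the function names are hypothetical, scorers are plain Python callables, and documents that miss the second-phase shortlist are simply dropped, which may differ from how the real system treats them.

```python
from typing import Callable

Doc = dict
Scorer = Callable[[Doc], float]


def rank_on_content_node(matches: list[Doc], first_phase: Scorer,
                         second_phase: Scorer, rerank_count: int):
    """First phase scores every match; second phase rescores only the best few."""
    shortlist = sorted(matches, key=first_phase, reverse=True)[:rerank_count]
    rescored = [(second_phase(doc), doc) for doc in shortlist]
    return sorted(rescored, key=lambda pair: pair[0], reverse=True)


def rank_in_container(per_node_hits, global_phase: Scorer, rerank_count: int):
    """Merge per-node hits, then apply the most expensive model to the final shortlist."""
    merged = sorted(
        (hit for hits in per_node_hits for hit in hits),
        key=lambda pair: pair[0],
        reverse=True,
    )
    shortlist = [doc for _, doc in merged[:rerank_count]]
    return sorted(shortlist, key=global_phase, reverse=True)
```

The design point is cost control: the cheapest function touches every match, and each later phase sees a smaller set, so an expensive cross-encoder only ever runs on a handful of documents.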

Layered ranking and chunk-level retrieval

Vespa 8.520 added integrated chunking during indexing, letting an application split documents into sentence, fixed-length, or custom chunks inside the indexing pipeline rather than in a separate preprocessing step.^[35] Building on this, the layered ranking framework introduced in 2025 lets a single ranking function both score and select the top N documents and then score the top M chunks within each of those documents, so a RAG pipeline can pass a language model only the most relevant spans of each hit.^[35]^[36] The elementwise BM25 rank feature (elementwise(bm25(field), dimension, cell_type)) computes per-element BM25 scores for multi-valued indexed string fields, exposing chunk-level lexical signals to ranking as a tensor.^[35] Vespa.ai frames chunk-level scoring, integrated chunking, and full transparency in the global phase as its 2025 push to make RAG pipelines more accurate and easier to reason about.^[36]
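The idea of layered ranking, selecting the top N documents and then the top M chunks inside each, can be sketched as follows. The scoring here is a toy term-overlap count standing in for whatever rank features an application would really use, and all names are hypothetical rather than part of Vespa's API.

```python
def chunk_overlap(chunk: str, query_terms: set) -> int:
    """Toy chunk score: number of distinct query terms present in the chunk."""
    return len(set(chunk.split()) & query_terms)


def layered_select(docs, query_terms, top_docs, top_chunks):
    """Keep the best documents, and within each only its best chunks."""
    def doc_score(doc):
        # A document is scored by its best chunk in this sketch.
        return max(chunk_overlap(c, query_terms) for c in doc["chunks"])

    best_docs = sorted(docs, key=doc_score, reverse=True)[:top_docs]
    return [
        {
            "id": doc["id"],
            "chunks": sorted(
                doc["chunks"],
                key=lambda c: chunk_overlap(c, query_terms),
                reverse=True,
            )[:top_chunks],
        }
        for doc in best_docs
    ]
```

The output carries only the selected chunks, which is what lets a RAG pipeline spend its context window on the most relevant spans instead of whole documents.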

ONNX model serving

Vespa supports ranking models in ONNX, XGBoost, and LightGBM formats embedded directly in the application package and evaluated inside the serving processes on the content or container nodes.^[28]^[29] Because ONNX export is supported by TensorFlow, PyTorch, and scikit-learn, a wide range of trained models can be deployed as Vespa rank features without an external inference service.^[28]^[29]

Multi-vector retrieval

Vespa supports storing multiple embeddings per document and indexing them in a single HNSW graph per field, which is the structure required by late-interaction models such as ColBERT and the vision-language ColPali model.^[22]^[27] The MaxSim operator native to Vespa's tensor framework computes the late-interaction score by taking the dot product between each query token embedding and each document patch embedding, then reducing across patches with a max operation and summing across query tokens.^[27] A Vespa engineering blog post describes scaling ColPali-style retrieval to billions of documents using a two-stage pipeline (HNSW retrieval on a single representative vector per document followed by MaxSim ranking on the multi-vector tensor), with binary quantization shrinking 128-dimensional float patches to 128-bit codes for roughly thirty-two-times storage reduction at about ninety-nine percent of the float-precision nDCG@5.^[27]
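The MaxSim computation and the storage arithmetic can be written out directly. The sketch below is plain Python rather than a Vespa tensor expression; thresholding at zero in `binarize` is an assumed quantization rule for illustration, and the function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_tokens, doc_patches):
    """Late-interaction score: for each query token take its best-matching
    document patch, then sum those maxima over all query tokens."""
    return sum(max(dot(q, p) for p in doc_patches) for q in query_tokens)


def binarize(vector):
    """One bit per dimension (assumed rule: 1 if the value is positive, else 0)."""
    return [1 if x > 0 else 0 for x in vector]


def storage_reduction(dimensions, float_bits=32):
    """Ratio between float storage and one-bit-per-dimension storage."""
    return (dimensions * float_bits) / dimensions
```

For a 128-dimensional patch, float storage is 128 × 32 = 4,096 bits against 128 bits after binarization, which is the thirty-two-times reduction quoted above.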

How does Vespa support RAG and AI agents?

Vespa is widely used as the retrieval layer for retrieval-augmented generation and, increasingly, for AI agents that issue many retrieval calls per task.^[33]^[40] Vespa.ai publishes a reference architecture called the RAG Blueprint that it describes as "Vespa's reference architecture for building production-ready AI retrieval applications," integrating hybrid retrieval, multi-phase ranking, real-time indexing, and machine-learning inference into one modular implementation.^[40] Because retrieval, ranking, filtering, and model inference all run inside a single distributed engine, a RAG or agent workflow can complete "dozens or even hundreds of retrieval operations" without shuttling data between separate vector, keyword, and reranking services.^[33]

For agentic workloads Vespa.ai positions the engine as a unified retrieval backend that delivers the "<100 ms latency required for large-scale AI retrieval," keeps information fresh through continuous indexing while serving live traffic, and applies expensive machine-learning models only in later ranking phases.^[33] Metal AI, an institutional-intelligence platform for private-equity firms, built its retrieval layer on Vespa Cloud to model documents, people, activities, and financial data as related entities; co-founder and chief technology officer Sergio Prada reported that "95% of our retrieval is done by AI agents" and that "our competitors focus on documents. With Vespa, we can focus on the full picture."^[34]

On chunking strategy Vespa.ai notes a tradeoff: "Chunking is one approach to fitting documents within an LLM's context window, but it introduces trade-offs," and the platform also supports multi-vector retrieval and parent-child document retrieval as alternatives.^[40] Integrated chunking and layered ranking (see Features) keep both the chunk boundaries and the ranking that selects among chunks inside the same system.^[35]^[36]

How does Vespa compare to other vector databases?

Vespa predates the modern "vector database" category by roughly two decades; the engineering core matured inside Yahoo from 2003 onward, well before the wave of dedicated vector-database startups in 2017 to 2023.^[3]^[4]^[30] The closest engineering peer is the Faiss library combined with a serving layer, but Faiss is a similarity-search library, not a serving system. Among full-stack serving products, the most-cited alternatives are listed below.

System	Origin	License	Vector	Lexical	Hybrid	Real-time CRUD	Tensors
Vespa	Yahoo / Vespa.ai, 2003 / 2017 OSS	Apache 2.0	Yes (HNSW)	Yes (BM25)	Yes (single query)	Yes	Yes
Pinecone	Pinecone Systems, 2019	Proprietary SaaS	Yes	Limited	Limited	Yes	No
Weaviate	SeMI / Weaviate B.V., 2019	BSD-3	Yes (HNSW)	Yes (BM25)	Yes	Yes	No
Qdrant	Qdrant, 2021	Apache 2.0	Yes (HNSW)	Yes (BM25-like)	Yes	Yes	No
Milvus	Zilliz, 2019	Apache 2.0	Yes (multiple)	Limited	Limited	Yes	No
Chroma	Chroma, 2022	Apache 2.0	Yes	Limited	Limited	Yes	No
LanceDB	LanceDB, 2022	Apache 2.0	Yes (IVF-PQ)	Limited	Partial	Yes	No
pgvector	Crunchy Data / Andrew Kane, 2021	PostgreSQL	Yes (HNSW, IVFFlat)	Via PostgreSQL FTS	Partial	Yes	No

Public benchmarks published by Vespa.ai in January 2025 compared Vespa 8.427.7 to Elasticsearch 8.15.2 on a one-million-product e-commerce dataset and reported per-CPU-core throughput advantages of up to 12.9 times for vector queries, 8.5 times for hybrid queries, and 6.5 times for lexical queries, plus approximately four times higher efficiency for in-place updates.^[30] Vespa.ai stated that the throughput differences translated into infrastructure-cost reductions of up to five times in equivalent workloads, although such vendor benchmarks should be read alongside independent replications.^[30] The fashion-resale marketplace Vinted reported halving its server count, improving search latency by 2.5 times, and reducing indexing latency by three times after migrating from a prior search stack to Vespa.^[30]

Architecturally, Vespa differs from Elasticsearch and HNSW-only services in that retrieval and ranking are unified in one process: ranking models and inverted-index lookups run together on the content node, so an entire multi-phase scoring pipeline can complete without leaving the cluster.^[21]^[28]

Who uses Vespa?

Yahoo

Yahoo operates roughly 150 internal Vespa applications collectively serving close to a billion users at around 800,000 queries per second across properties including Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, and the Yahoo (formerly Gemini) advertising network.^[4]^[6]^[17] Yahoo Gemini alone was serving more than three billion native ad requests per day as of the 2017 open-source announcement.^[4]

Perplexity

Perplexity Inc., the company behind the Perplexity answer engine, brought its search function in-house on Vespa in a partnership announced on April 15, 2025, choosing Vespa for its integration of hybrid sparse-and-dense retrieval, chunk-level passage selection, and machine-learning ranking inside one serving layer; at announcement it reported more than 15 million monthly users and over 100 million queries per week.^[10]^[32] Vespa.ai's customer page reports that by May 2025 Perplexity served 22 million active users and 780 million monthly queries on Vespa-based infrastructure, with sub-second latency at thousands of concurrent queries per second.^[10]

Spotify

Spotify uses Vespa across several product surfaces, including semantic search over podcasts using dense embeddings alongside lexical retrieval, as described on Vespa.ai's case-studies page.^[11] Spotify appears in the official Yahoo press release as an existing external customer at the time of the October 2023 spin-out.^[6]

Wix

The website-builder company Wix uses Vespa for search across its hosted-sites and apps product surface, and was listed in both the spin-out press release and the Series A coverage as an external customer.^[6]^[9]

Vinted

The European secondhand-clothing marketplace Vinted migrated to Vespa for product search and personalization. Vespa.ai's case study reports that Vinted's migration delivered a 1.1 percent uplift in transactions and more than 3.5 million euros in additional gross merchandise value, alongside operational gains including a halving of server count.^[11]^[30]

Metal AI

Metal AI operates an institutional-intelligence platform for private-equity firms, using Vespa Cloud as its retrieval layer to unify deal documents, expert-call transcripts, financial statements, and CRM records into a single entity-aware index.^[34] The company automates due-diligence questionnaires by ranking pre-approved answers on both semantic similarity and recency so that only legally reviewed content surfaces, and reports that 95 percent of its retrieval is performed by AI agents.^[34]

Other users

Vespa.ai also lists Elicit, Onyx, RavenPack, Qwant, Clarm, and Mimeta-Civsy as production users across research search, enterprise search, financial research, privacy-focused web search, and specialized analytical retrieval.^[11]

What is Vespa Cloud?

Vespa Cloud is the managed service operated by Vespa.ai, launched in 2021 and made widely available to external customers at the same time.^[7]^[16] It runs Vespa application packages on AWS and Google Cloud, and from 2025 on Microsoft Azure zones, across multiple regions and provides automatic data-plane upgrades, autoscaling of container and content clusters, and an enclave deployment option in which the customer's own AWS or GCP account hosts the data while Vespa.ai operates the software.^[16]^[35]

Pricing is consumption based: customers pay for the container and content clusters they provision, with the control-plane services included at no extra charge.^[16] Published prices start at approximately ten cents per gigabyte per month for storage and compute, and unit prices decrease linearly with the total resources allocated to an application up to a fifty percent discount (or up to eighty-three percent in the enclave configuration).^[16] Autoscaling is applied to all clusters and Vespa.ai documents typical cost reductions of around fifty percent on bursty workloads from running scaled-to-need rather than provisioned to peak.^[16] Quota is expressed in dollars per hour and is computed against the maximum possible cost of a configured application.^[16]

In 2026 Vespa.ai listed Vespa Cloud on the Google Cloud Marketplace and added automated snapshot backups of content-node indexes, enabling disaster recovery without a full re-index; it also released a Vespa Kubernetes Operator for deploying Vespa on Kubernetes with dynamic provisioning, autoscaling, and automated upgrades, and added managed embedding integrations for Voyage AI, OpenAI, and Mistral AI.^[37]^[38]

Funding

Vespa.ai's $31 million Series A round was announced on November 1, 2023, and was led by Blossom Capital, the London-based Series A specialist.^[8]^[9] The round was the first significant external capitalization of Vespa.ai after the Yahoo spin-out, which had occurred about a month earlier.^[7]^[8] Yahoo retained an equity stake and a board seat from the spin-out and was not listed as a Series A investor in Vespa.ai's announcement, but remained the largest customer of the platform.^[6]^[9] DLA Piper's Norwegian office advised Blossom on the transaction and characterized it as a Series A investment into the Norwegian company Vespa.ai AS.^[18]

The company stated that the proceeds would be used to grow the standalone business, expand the engineering function, accelerate feature delivery, and develop the Vespa Cloud commercial offering.^[8]^[9] As of the TechCrunch coverage in November 2023 the company employed 29 people.^[9]

Leadership

The founders and executives listed on the Vespa.ai company page are:^[31]

Jon Bratseth, founder and chief executive officer, previously a distinguished architect and vice-president in Yahoo's big-data and AI group, with more than twenty years' experience on Vespa and its predecessors.^[31]^[13]
Kim O. Johansen, founder and chief operating officer, with more than twenty years managing large distributed-system development.^[31]
Frode Lundgren, founder and chief technology officer, with more than twenty years managing teams operating large-scale Vespa applications.^[31]
Kristian Aune, founder and head of customer success, with more than twenty years' experience working with Vespa stakeholders.^[31]
Tim Young, chief marketing officer, with more than twenty-five years marketing AI and analytics technology.^[31]

Vespa.ai's headquarters and engineering hub is in Trondheim, Norway, with employees distributed globally.^[31]

References

^[1] Vespa.ai, "AI Search Platform", vespa.ai, 2026. vespa.ai. Accessed 2026-05-26.
^[2] Vespa, "Vespa Overview", Vespa Documentation, 2026. docs.vespa.ai/...overview.html. Accessed 2026-05-26.
^[3] Jordan Novet, "Yahoo open-sources Vespa, used for content recommendations, ad serving", CNBC, 2017-09-26. cnbc.com/...espa-for-content-recommendations.html. Accessed 2026-05-26.
^[4] Jon Bratseth, "Open Sourcing Vespa, Yahoo's Big Data Processing and Serving Engine", Vespa Blog, 2017-09-27. blog.vespa.ai/...-vespa-yahoos-big-data-processing. Accessed 2026-05-26.
^[5] Vespa, "LICENSE", vespa-engine/vespa GitHub repository, 2017. github.com/...LICENSE. Accessed 2026-05-26.
^[6] Yahoo Inc., "Yahoo Spins Out Vespa, Its Enterprise AI-Scaling Engine, as an Independent Company", Yahoo press release, 2023-10-04. yahooinc.com/...-engine-as-an-independent-company. Accessed 2026-05-26.
^[7] Jon Bratseth, "Vespa is becoming its own company", Vespa Blog, 2023-10-03. blog.vespa.ai/vespa-is-becoming-its-own-company. Accessed 2026-05-26.
^[8] Jon Bratseth, "Announcing our series A funding", Vespa Blog, 2023-11-01. blog.vespa.ai/announcing-our-series-a-funding. Accessed 2026-05-26.
^[9] Kyle Wiggers, "Yahoo spin-out Vespa lands $31M investment from Blossom", TechCrunch, 2023-11-01. techcrunch.com/...ands-31m-investment-from-blossom. Accessed 2026-05-26.
^[10] Vespa.ai, "How Perplexity uses Vespa.ai to power fast, accurate, and trusted answers for millions of users", vespa.ai, 2025. vespa.ai/perplexity. Accessed 2026-05-26.
^[11] Vespa.ai, "Case studies", vespa.ai, 2026. vespa.ai/case-studies. Accessed 2026-05-26.
^[12] Web Search Workshop, "A brief history of FAST/AlltheWeb", websearchworkshop.com.au, 2003. websearchworkshop.com.au/fast-history.php. Accessed 2026-05-26.
^[13] Jon Bratseth, "Big Data Serving with Vespa", SlideShare presentation, Yahoo Developer Network, 2017. slideshare.net/...th-distinguished-architect-oath. Accessed 2026-05-26.
^[14] Vespa, "vespa-engine/vespa", GitHub repository, 2026. github.com/...vespa. Accessed 2026-05-26.
^[15] Maria Deutscher, "Yahoo open-sources Vespa, its most important software release since Hadoop", SiliconANGLE, 2017-09-26. siliconangle.com/...-software-release-since-hadoop. Accessed 2026-05-26.
^[16] Vespa, "Vespa Cloud Pricing", Vespa Cloud Documentation, 2026. cloud.vespa.ai/pricing. Accessed 2026-05-26.
^[17] Kyle Wiggers, "Yahoo spins out Vespa, its search tech, into an independent company", TechCrunch, 2023-10-04. techcrunch.com/...tech-into-an-independent-company. Accessed 2026-05-26.
^[18] DLA Piper, "DLA Piper advised Blossom Capital on its $31 million Series A investment in Vespa.ai", DLA Piper Norway, 2023-11. norway.dlapiper.com/...-series-investment-vespaai. Accessed 2026-05-26.
^[19] Vespa, "Application Packages", Vespa Documentation, 2026. docs.vespa.ai/...application-packages.html. Accessed 2026-05-26.
^[20] Vespa, "Vespa Serving Scaling Guide", Vespa Documentation, 2026. docs.vespa.ai/...sizing-search.html. Accessed 2026-05-26.
^[21] Vespa, "Nearest Neighbor Search", Vespa Documentation, 2026. docs.vespa.ai/...nearest-neighbor-search.html. Accessed 2026-05-26.
^[22] Vespa, "Approximate nearest neighbor search using HNSW index", Vespa Documentation, 2026. docs.vespa.ai/...approximate-nn-hnsw.html. Accessed 2026-05-26.
^[23] Vespa, "services.xml reference", Vespa Documentation, 2026. docs.vespa.ai/...services.html. Accessed 2026-05-26.
^[24] Jo Kristian Bergum, "Redefining Hybrid Search Possibilities with Vespa", Vespa Blog, 2024. blog.vespa.ai/...d-search-possibilities-with-vespa. Accessed 2026-05-26.
^[25] Vespa, "Tensor Guide", Vespa Documentation, 2026. docs.vespa.ai/...tensor-user-guide.html. Accessed 2026-05-26.
^[26] Vespa, "Tensor Reference", Vespa Documentation, 2026. docs.vespa.ai/...tensor.html. Accessed 2026-05-26.
^[27] Jo Kristian Bergum, "Scaling ColPali to billions of PDFs with Vespa", Vespa Blog, 2024. blog.vespa.ai/scaling-colpali-to-billions. Accessed 2026-05-26.
^[28] Vespa, "Ranking", Vespa Documentation, 2026. docs.vespa.ai/...ranking.html. Accessed 2026-05-26.
^[29] Vespa, "Phased Ranking", Vespa Documentation, 2026. docs.vespa.ai/...phased-ranking.html. Accessed 2026-05-26.
^[30] Vespa.ai, "A Benchmark for Modernizing Elasticsearch with Vespa", Vespa Blog, 2025-01-16. blog.vespa.ai/modernizing-elasticsearch-with-vespa. Accessed 2026-05-26.
^[31] Vespa.ai, "Company", vespa.ai, 2026. vespa.ai/company. Accessed 2026-05-26.
^[32] Vespa.ai, "Perplexity Partners With Vespa.ai to Bring its Search Function In-House", vespa.ai (also distributed via Business Wire), 2025-04-15. vespa.ai/...-to-bring-its-search-function-in-house. Accessed 2026-07-08.
^[33] Vespa.ai, "Retrieval for AI Agents", vespa.ai, 2026. vespa.ai/...ai-agents. Accessed 2026-07-08.
^[34] Vespa.ai, "How Metal AI Built an Agent-Driven Intelligence Platform on Vespa Cloud", Vespa Blog, 2026-03-10. blog.vespa.ai/...riven-intelligence-on-vespa-cloud. Accessed 2026-07-08.
^[35] Vespa.ai, "Vespa Newsletter, June 2025", Vespa Blog, 2025-06. blog.vespa.ai/vespa-newsletter-june-2025. Accessed 2026-07-08.
^[36] Vespa.ai, "Vespa Now: Year-in-Review", vespa.ai, 2026. vespa.ai/...vespa-now-year-in-review. Accessed 2026-07-08.
^[37] Bonnie Chase, "Vespa Newsletter, February 2026", Vespa Blog, 2026-02-16. blog.vespa.ai/vespa-newsletter-february-2026. Accessed 2026-07-08.
^[38] Bonnie Chase, "Vespa Newsletter, May 2026", Vespa Blog, 2026-05-27. blog.vespa.ai/vespa-newsletter-may-2026. Accessed 2026-07-08.
^[39] Nexla, "Nexla and Vespa.ai Partner to Simplify Real-Time AI Search Across Hundreds of Enterprise Data Sources", vespa.ai (distributed via GlobeNewswire), 2026-02-18. vespa.ai/nexla-and-vespa-ai-partner. Accessed 2026-07-08.
^[40] Vespa.ai, "Retrieval-Augmented Generation", vespa.ai, 2026. vespa.ai/...retrieval-augmented-generation. Accessed 2026-07-08.
