Vespa (search engine)
Last reviewed
May 25, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,635 words
Vespa is an open-source big-data serving engine that combines vector search, lexical search, and structured search inside a single query, with real-time indexing and machine-learned ranking executed on the data nodes.[1][2] The technology originated inside Norwegian search company Fast Search & Transfer in the late 1990s, became a Yahoo internal platform after Yahoo acquired Overture (and through it Fast's web search assets) in 2003, and was released under the Apache 2.0 license in September 2017.[3][4][5] In October 2023 Yahoo spun the project out as Vespa.ai, an independent company headquartered in Trondheim, Norway, with Jon Bratseth as chief executive officer.[6][7] In November 2023 Vespa.ai raised a $31 million Series A round led by Blossom Capital to grow the engineering team and accelerate the managed Vespa Cloud product.[8][9] As of 2026, Vespa powers retrieval at companies including Yahoo, Perplexity, Spotify, Vinted, and Wix.[7][10][11]
The engine that became Vespa traces its lineage to Fast Search & Transfer (FAST), a Trondheim company founded in 1997 out of research at the Norwegian University of Science and Technology.[12] FAST built the AlltheWeb crawler beginning in 1998 and launched the AlltheWeb.com web search engine in May 1999, which at its peak indexed billions of pages and competed directly with Google.[12] FAST's distributed search architecture was the engineering ancestor of the indexing and serving system later renamed Vespa.[4][12]
In February 2003 Overture Services acquired the AlltheWeb web search division from FAST for approximately $70 million, and in July 2003 Yahoo acquired Overture for $1.63 billion, bringing the AlltheWeb engineering group in Trondheim and Sunnyvale into Yahoo.[12][3] The codename "Vespa" (an internal Yahoo project name for the new big-data serving engine) was applied as the team rewrote major portions of the search core for Yahoo's recommendation, advertising, and content-discovery use cases.[3][4]
By the mid-2010s Vespa was the production serving layer behind a large fraction of Yahoo's consumer-facing surfaces. The official open-sourcing announcement in September 2017 listed Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Gemini, and Flickr as products powered by the engine, processing billions of daily requests over billions of documents and serving content and ads roughly 90,000 times per second across all Yahoo properties.[4] Yahoo Gemini, the native-ads platform, alone served over three billion native ad requests per day on Vespa.[4]
Distinguished architect Jon Bratseth led the team during this period. In a 2017 SlideShare deck he described his role as "Distinguished Architect, Oath" (Verizon's renamed Yahoo holding company at the time) and characterized Vespa as Yahoo's "big data serving" platform, distinguishing it from the offline-batch focus of Hadoop.[13]
Yahoo released Vespa under the Apache 2.0 license on September 27, 2017, publishing the source to the github.com/vespa-engine/vespa repository.[4][14] The blog post framed the release as the largest open-source contribution from Yahoo since the company had open-sourced Hadoop in 2006: "By releasing Vespa, we are making it easy for anyone to build applications that can compute responses to user requests, over large datasets, at real time and at internet scale."[4] Press coverage at the time, including CNBC and SiliconANGLE, described Vespa as a substantial rewrite of the Yahoo serving stack and as Yahoo's most significant open-source software release in over a decade.[3][15] The Trondheim team remained the primary development group after the release.[3][15]
Vespa began serving external customers (i.e., non-Yahoo workloads) in 2021 with the launch of the managed Vespa Cloud product, a multi-tenant hosted version of the engine running on AWS and GCP.[7][16]
On October 4, 2023, Yahoo announced that Vespa would be spun out as a standalone company called Vespa.ai, with Jon Bratseth appointed chief executive officer.[6][7] Yahoo retained an equity stake, kept a seat on the new company's board of directors, and committed to remaining Vespa's largest customer.[7][17] At spin-out Yahoo was running roughly 150 applications on Vespa, collectively serving close to a billion users and processing about 800,000 queries per second across the company's properties.[6][17] The official press release cited Spotify and Wix as existing external customers alongside Yahoo.[6]
Yahoo chief executive Jim Lanzone said the spin-out would let Yahoo "create a new business opportunity that allows other companies to harness its technology as an independent entity" while Yahoo continued to use and invest in the platform.[6] Bratseth, who had been a vice-president architect in Yahoo's big-data and AI group, stated that the time was right "to spin out Vespa and allow other companies to take advantage of Vespa Cloud in a meaningful way."[6]
On November 1, 2023, Vespa.ai announced a $31 million Series A financing led by London-based Blossom Capital.[8][9] At the time of the announcement Bratseth said the proceeds would fund growth of the standalone business, strengthening of the engineering function, and faster delivery of features for users combining AI models with proprietary data sets.[8][9] The TechCrunch coverage noted the company had 29 employees post-spin-out and planned to use the funding to convert open-source users into paying Vespa Cloud customers.[9] DLA Piper, which advised Blossom on the deal, characterized it as a Norwegian Series A round.[18]
Vespa applications run on three logical cluster types, defined together in an application package that is deployed to a Vespa zone.[2][19]
Container clusters are stateless. They terminate HTTP requests, run the query and document-feeding APIs, host user-supplied Java components, and merge results returned by the content layer.[2][19] Each container node runs the jDisc framework, a Java service-container that hosts request handlers, processors, and searchers.[2] Application owners can plug in custom Java components in the container, including query rewriters, document processors, document enrichers, and federated searchers.[2][19]
Content clusters hold the data and execute the per-document work of a query: matching, ranking, grouping, and aggregation.[2][20] A content node is responsible for some subset of the documents in the cluster according to a distribution algorithm, and the cluster as a whole automatically rebalances data when nodes are added, removed, or fail.[2] Each document is replicated across multiple content nodes to provide redundancy and to keep serving available during node failures.[2][20]
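The placement and rebalancing behavior described above can be illustrated with a small sketch. It uses rendezvous (highest-random-weight) hashing as a stand-in technique, not Vespa's own distribution algorithm, which the text does not specify. The node names, document ids, and the `place` helper are all hypothetical.

```python
import hashlib


def replica_score(doc_id: str, node: str) -> int:
    """Deterministic pseudo-random weight for a (document, node) pair."""
    digest = hashlib.sha256(f"{doc_id}|{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(doc_id: str, nodes: list[str], redundancy: int) -> list[str]:
    """Return the `redundancy` nodes with the highest weight for this document."""
    if redundancy > len(nodes):
        raise ValueError("redundancy cannot exceed the number of nodes")
    ranked = sorted(nodes, key=lambda node: replica_score(doc_id, node), reverse=True)
    return ranked[:redundancy]


nodes = [f"content-{i}" for i in range(5)]   # hypothetical node names
docs = [f"doc-{i}" for i in range(1000)]     # hypothetical document ids

before = {doc: place(doc, nodes, redundancy=2) for doc in docs}

# Simulate losing one node and recomputing placement over the survivors.
survivors = [node for node in nodes if node != "content-3"]
after = {doc: place(doc, survivors, redundancy=2) for doc in docs}

# Only documents that had a replica on the lost node get a new placement.
moved = [doc for doc in docs if before[doc] != after[doc]]
print(f"{len(moved)} of {len(docs)} documents needed a new replica")
```

With this scheme, every document keeps two distinct replicas, and a node failure disturbs only the documents that were stored on the failed node. That is the general property a rebalancing distribution algorithm aims for.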
The content nodes also hold the indexes used to evaluate queries: inverted indexes for text, attribute fields for structured filtering, and HNSW graphs over tensor fields for approximate nearest-neighbor search.[21][22] Queries are scatter-gathered: each content node performs matching, ranking, and selection of its own top-k candidates, and the container then merges the per-node results into a final response.[2][20]
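The scatter-gather step can be sketched in a few lines. This is a minimal illustration that assumes each hit is a `(doc_id, score)` pair and that scores are comparable across nodes. The function names and sample data are hypothetical and do not reflect Vespa's internal interfaces.

```python
import heapq

Hit = tuple[str, float]  # (document id, relevance score)


def node_top_k(scored_docs: list[Hit], k: int) -> list[Hit]:
    """Per-node step: keep only the k best-scoring local hits."""
    return heapq.nlargest(k, scored_docs, key=lambda hit: hit[1])


def gather(per_node_hits: list[list[Hit]], k: int) -> list[Hit]:
    """Container step: merge the per-node shortlists into a global top k."""
    merged = [hit for hits in per_node_hits for hit in hits]
    return heapq.nlargest(k, merged, key=lambda hit: hit[1])


# Hypothetical scored matches on three content nodes.
node_a = [("a1", 0.91), ("a2", 0.40), ("a3", 0.75)]
node_b = [("b1", 0.88), ("b2", 0.95), ("b3", 0.10)]
node_c = [("c1", 0.30), ("c2", 0.20)]

k = 3
shortlists = [node_top_k(node, k) for node in (node_a, node_b, node_c)]
final = gather(shortlists, k)
print(final)
```

Because each node returns its own top k, the merged result equals the top k over the whole corpus, while the container handles at most k hits per node.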
A small admin cluster runs the Vespa configuration servers, which validate application packages, derive per-node configuration from the deployed application, and coordinate schema and topology changes without service interruption.[2][19] The control plane resources are not billed in Vespa Cloud and are managed automatically by the platform.[16]
The unit of deployment in Vespa is the application package: a directory tree that includes a services.xml file declaring container and content clusters, one or more .sd schema definition files per document type, optional Java components, ranking expressions, ONNX model files, and other configuration.[19][23] Deploying the same application package against a single-node test cluster or a hundred-node production cluster uses the same workflow; the configuration servers translate the package into the per-node configuration each process needs.[19][23]
Vespa exposes two main HTTP APIs.[2][19] The document API supports put, update, and remove operations on a per-document basis; writes are durable and become visible to queries within milliseconds.[2][19] The query API accepts requests in YQL (the Vespa Query Language) or in a more compact JSON form, supports complex Boolean and ranking compositions of operators, and returns scored hits assembled from all content clusters.[2][19]
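As a rough sketch of what requests to the two APIs look like, the helpers below build the method, URL, and JSON body without sending anything. The `/document/v1/` and `/search/` paths and the `fields` / `assign` body shapes follow Vespa's HTTP conventions as commonly documented, but should be checked against the current reference. The endpoint, namespace, document type, and field names are assumptions made for illustration.

```python
import json
from urllib.parse import quote

ENDPOINT = "http://localhost:8080"  # assumption: a local container node


def doc_url(namespace: str, doc_type: str, doc_id: str) -> str:
    """Build the per-document URL, escaping each path segment."""
    parts = (quote(p, safe="") for p in (namespace, doc_type))
    ns, dt = parts
    return f"{ENDPOINT}/document/v1/{ns}/{dt}/docid/{quote(doc_id, safe='')}"


def put_request(namespace, doc_type, doc_id, fields):
    """Full write of a document."""
    return ("POST", doc_url(namespace, doc_type, doc_id), json.dumps({"fields": fields}))


def update_request(namespace, doc_type, doc_id, assignments):
    """Partial update: assign new values to selected fields only."""
    body = {"fields": {name: {"assign": value} for name, value in assignments.items()}}
    return ("PUT", doc_url(namespace, doc_type, doc_id), json.dumps(body))


def remove_request(namespace, doc_type, doc_id):
    """Delete a document by id."""
    return ("DELETE", doc_url(namespace, doc_type, doc_id), None)


def query_request(yql: str, hits: int = 10):
    """Query API request carrying a YQL statement in a JSON body."""
    return ("POST", f"{ENDPOINT}/search/", json.dumps({"yql": yql, "hits": hits}))


# Hypothetical namespace "shop", document type "product", and fields.
put = put_request("shop", "product", "sku 1", {"title": "Red shoes", "price": 50})
update = update_request("shop", "product", "sku 1", {"price": 45})
remove = remove_request("shop", "product", "sku 1")
query = query_request("select * from product where price < 100", hits=5)

for method, url, body in (put, update, remove, query):
    print(method, url, body)
```

Any HTTP client can send these tuples. Keeping construction separate from transport makes the request shapes easy to inspect and test.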
Vespa supports full-CRUD indexing: documents written via the document API are searchable within milliseconds, and partial updates to attribute fields are applied in place without a re-index.[2][22] The HNSW graphs used for vector search are constructed online as documents are fed rather than pre-built offline, allowing the index to stay current while accepting reads.[22] Multithreaded distance calculations during graph updates reduce indexing latency for high-write workloads.[22]
Vespa implements a modified variant of the Hierarchical Navigable Small World algorithm of Malkov and Yashunin (2016) for approximate nearest-neighbor search over tensor fields.[22] The implementation maintains a single graph per tensor field per content node, supports multiple indexed tensor fields concurrently, and supports the HNSW index over multi-vector documents.[22] Approximate nearest-neighbor queries can be combined with arbitrary filters using either pre-filtering (apply filters before traversing the graph) or post-filtering (apply filters after retrieval), with the strategy configurable per query.[22] The greedy graph traversal is sublinear in document count: doubling the per-node corpus size adds roughly fifty percent latency in practice.[22] Memory footprint is tunable via cell type: switching from float to bfloat16 approximately halves memory consumption with minimal accuracy loss.[22]
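The two scaling claims above reduce to simple arithmetic, sketched here under stated assumptions. The memory estimate counts raw vector cells only and ignores HNSW graph links and other overhead. The latency model extrapolates the "fifty percent per doubling" rule of thumb as a power law. The corpus size, dimensionality, and baseline latency are hypothetical.

```python
import math

# Bytes per tensor cell for each cell type.
CELL_BYTES = {"double": 8, "float": 4, "bfloat16": 2, "int8": 1}


def vector_bytes(num_docs: int, dims: int, cell_type: str) -> int:
    """Raw storage for one vector per document (graph overhead excluded)."""
    return num_docs * dims * CELL_BYTES[cell_type]


def scaled_latency(base_ms: float, base_docs: float, docs: float,
                   growth_per_doubling: float = 1.5) -> float:
    """Extrapolate latency assuming each corpus doubling multiplies it by 1.5."""
    doublings = math.log2(docs / base_docs)
    return base_ms * growth_per_doubling ** doublings


# Hypothetical node: one million 384-dimensional vectors.
as_float = vector_bytes(1_000_000, 384, "float")
as_bfloat16 = vector_bytes(1_000_000, 384, "bfloat16")
print(as_float, as_bfloat16, as_float / as_bfloat16)

# Hypothetical 10 ms baseline at one million documents.
for docs in (1e6, 2e6, 4e6):
    print(int(docs), round(scaled_latency(10.0, 1e6, docs), 2))

# Equivalent power-law exponent: latency grows roughly as N ** 0.585.
print(round(math.log2(1.5), 3))
```

An exponent below 1 is what "sublinear" means here: a fourfold increase in corpus size gives about 2.25 times the latency under this model, not four times.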
Vespa supports full-text search via inverted indexes with the bm25 scoring function as a built-in rank feature, plus the weakAnd operator for accelerated linguistic-text queries and the wand operator for learned sparse representations such as SPLADE.[21][24] BM25 features integrate into the same rank profile as vector and tensor features, allowing developers to compose hybrid scoring expressions directly inside the engine.[21][24]
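For readers unfamiliar with BM25, the sketch below implements the textbook Okapi form with the conventional parameters k1 = 1.2 and b = 0.75. It is not a reproduction of Vespa's built-in feature, which may differ in detail. The `hybrid_score` helper and its weight are hypothetical, shown only to illustrate composing a lexical score with a vector-similarity score in one expression.

```python
import math


def bm25_term(tf: float, doc_len: float, avg_doc_len: float,
              num_docs: int, docs_with_term: int,
              k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 contribution of one query term to one document."""
    # Rare terms get a higher inverse document frequency.
    idf = math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Term frequency saturates via k1; b controls length normalization.
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm


def bm25(query_terms, doc_tf, doc_len, avg_doc_len, num_docs, doc_freq):
    """Sum the per-term contributions over all query terms."""
    return sum(
        bm25_term(doc_tf.get(term, 0), doc_len, avg_doc_len, num_docs, doc_freq[term])
        for term in query_terms
    )


def hybrid_score(bm25_score: float, closeness: float, vector_weight: float = 1.0) -> float:
    """Hypothetical linear blend of lexical and vector signals."""
    return bm25_score + vector_weight * closeness


# Hypothetical corpus statistics and a single document.
doc_freq = {"red": 50, "shoes": 10}
lexical = bm25(["red", "shoes"], {"red": 1, "shoes": 2},
               doc_len=80, avg_doc_len=100, num_docs=1000, doc_freq=doc_freq)
print(round(lexical, 3), round(hybrid_score(lexical, closeness=0.8), 3))
```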
Vespa's signature capability is hybrid retrieval inside a single query and a single system rather than across federated services.[21][24] Common patterns include disjunctive retrieval (issue a nearestNeighbor operator and a userQuery operator in OR and expose all candidates to ranking), and the RANK operator (retrieve by one method while computing additional features from another for use only at ranking time).[24] The official "Redefining Hybrid Search" engineering blog notes that adding HNSW indexes to dense vector fields that are only used in ranking phases is wasteful, and recommends selective indexing strategies where retrieval uses sparse signals while denser embeddings contribute only to scoring.[24]
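The two retrieval patterns can be written as YQL strings, sketched here with small Python helpers. The `nearestNeighbor`, `userQuery`, `rank`, and `targetHits` constructs follow Vespa's query language as commonly documented; verify the exact syntax against the current reference. The `product` source, the `embedding` field, the query tensor `q`, and the `hybrid` rank-profile name are hypothetical.

```python
import json


def nearest_neighbor(field: str, query_tensor: str, target_hits: int) -> str:
    """YQL fragment for an approximate nearest-neighbor operator."""
    return f"{{targetHits:{target_hits}}}nearestNeighbor({field}, {query_tensor})"


def disjunctive_yql(source: str, nn: str) -> str:
    """Retrieve by vector OR text and expose all candidates to ranking."""
    return f"select * from {source} where ({nn}) or userQuery()"


def rank_yql(source: str, retrieve: str, rank_only: str) -> str:
    """Retrieve with the first operand; the second only contributes rank features."""
    return f"select * from {source} where rank({retrieve}, {rank_only})"


def query_body(yql: str, text: str, vector: list[float], profile: str, hits: int = 10) -> str:
    """JSON body carrying the YQL, the user text, and the query tensor."""
    return json.dumps({
        "yql": yql,
        "query": text,               # consumed by userQuery()
        "input.query(q)": vector,    # consumed by nearestNeighbor(..., q)
        "ranking": profile,          # hypothetical rank-profile name
        "hits": hits,
    })


nn = nearest_neighbor("embedding", "q", target_hits=100)
either = disjunctive_yql("product", nn)
vector_then_text = rank_yql("product", nn, "userQuery()")

print(either)
print(vector_then_text)
print(query_body(either, "red shoes", [0.1, 0.2, 0.3], "hybrid"))
```

In the disjunctive form both operators contribute candidates. In the `rank` form only the first operand decides which documents are retrieved, which keeps the candidate set small while still making the second signal available to scoring.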
Vespa stores and computes on tensors of arbitrary order, not just one-dimensional vectors, with mixed mapped and indexed dimensions.[25][26] Tensor operations including reduce, join, map, and dot product can be composed in ranking expressions, and the same tensor algebra is used for both vector similarity and richer multi-dimensional scoring.[25][26] This is the substrate for advanced retrieval paradigms such as ColPali, where each document is represented as a tensor of patch embeddings.[27]
Vespa ranking is organized as a pipeline of phases declared in a rank profile.[28][29] The first-phase function runs on every matching document on the content node and selects per-node top candidates; the second-phase function runs locally on the content node on the smaller set returned by the first phase; and the global-phase function runs on the container after the per-node candidates are merged, enabling expensive cross-encoder models to run on the final shortlist.[28][29] Each phase can call out to ONNX-format neural network inferences as native rank features.[28][29]
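The phase structure can be simulated in plain Python to show where each function runs and how many documents it touches. This is a simplified sketch: documents outside the re-rank window are dropped rather than kept with their earlier scores. The scoring functions, field names, and counts are hypothetical stand-ins for real rank expressions and models.

```python
from typing import Callable

Doc = dict
Scorer = Callable[[Doc], float]


def rank_on_node(docs: list[Doc], first_phase: Scorer, second_phase: Scorer,
                 rerank_count: int, k: int) -> list[Doc]:
    """Content node: cheap score on every match, costlier score on the best few."""
    by_first = sorted(docs, key=first_phase, reverse=True)
    reranked = sorted(by_first[:rerank_count], key=second_phase, reverse=True)
    return reranked[:k]


def rank_in_container(per_node: list[list[Doc]], second_phase: Scorer,
                      global_phase: Scorer, global_rerank: int, k: int) -> list[Doc]:
    """Container: merge node results, then run the expensive model on a shortlist."""
    merged = sorted((d for hits in per_node for d in hits), key=second_phase, reverse=True)
    head = sorted(merged[:global_rerank], key=global_phase, reverse=True)
    return (head + merged[global_rerank:])[:k]


calls = {"global": 0}


def cross_encoder(doc: Doc) -> float:
    """Stand-in for an expensive model; counts how often it is invoked."""
    calls["global"] += 1
    return doc["cross"]


first = lambda d: d["bm25"]
second = lambda d: d["bm25"] + d["closeness"]

node_a = [
    {"id": "a1", "bm25": 3.0, "closeness": 0.2, "cross": 0.10},
    {"id": "a2", "bm25": 2.0, "closeness": 0.9, "cross": 0.80},
    {"id": "a3", "bm25": 0.5, "closeness": 0.9, "cross": 0.90},
]
node_b = [
    {"id": "b1", "bm25": 2.5, "closeness": 0.5, "cross": 0.95},
    {"id": "b2", "bm25": 1.0, "closeness": 0.1, "cross": 0.30},
]

per_node = [rank_on_node(n, first, second, rerank_count=2, k=2) for n in (node_a, node_b)]
final = rank_in_container(per_node, second, cross_encoder, global_rerank=3, k=3)
print([d["id"] for d in final], calls["global"])
```

The expensive scorer runs on three documents in total, regardless of how many documents matched on the content nodes. That is the point of deferring it to the global phase.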
Vespa supports ranking models in ONNX, XGBoost, and LightGBM formats embedded directly in the application package and evaluated inside the serving processes on the content or container nodes.[28][29] Because ONNX export is supported by TensorFlow, PyTorch, and scikit-learn, a wide range of trained models can be deployed as Vespa rank features without an external inference service.[28][29]
Vespa supports storing multiple embeddings per document and indexing them in a single HNSW graph per field, which is the structure required by late-interaction models such as ColBERT and the vision-language ColPali model.[22][27] The MaxSim operator native to Vespa's tensor framework computes the late-interaction score by taking the dot product between each query token embedding and each document patch embedding, then reducing across patches with a max operation and summing across query tokens.[27] A Vespa engineering blog post describes scaling ColPali-style retrieval to billions of documents using a two-stage pipeline (HNSW retrieval on a single representative vector per document followed by MaxSim ranking on the multi-vector tensor), with binary quantization shrinking 128-dimensional float patches to 128-bit codes for roughly thirty-two-times storage reduction at about ninety-nine percent of the float-precision nDCG@5.[27]
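The MaxSim computation and the storage arithmetic behind binary quantization can be sketched in pure Python. This is an illustration, not Vespa's tensor-expression syntax. The sign-based `binarize` rule is one common quantization choice and is an assumption here, as are the toy two-dimensional embeddings.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_sim(query_tokens: list[list[float]], doc_patches: list[list[float]]) -> float:
    """Late-interaction score: for each query token take the best-matching
    patch (max over patches), then sum those maxima over query tokens."""
    return sum(max(dot(q, p) for p in doc_patches) for q in query_tokens)


def binarize(vector: list[float]) -> int:
    """Pack one bit per dimension: 1 if the component is positive, else 0."""
    bits = 0
    for x in vector:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def storage_ratio(dims: int, float_bits: int = 32) -> float:
    """Float storage divided by one-bit-per-dimension storage."""
    return (dims * float_bits) / dims


# Toy example: two query tokens, two document patches.
query = [[1.0, 0.0], [0.0, 1.0]]
patches = [[0.5, 0.5], [1.0, 0.0]]
print(max_sim(query, patches))

print(bin(binarize([0.3, -0.1, 0.0, 2.0])))

# 128 floats occupy 4096 bits; 128 one-bit codes occupy 128 bits.
print(storage_ratio(128))
```

The thirty-two-times figure follows from replacing each 32-bit float with a single bit, independent of the number of dimensions.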
Vespa predates the modern "vector database" category by roughly two decades; the engineering core matured inside Yahoo from 2003 onward, well before the wave of dedicated vector-database startups between 2017 and 2023.[3][4][30] The closest engineering peer is the Faiss library combined with a serving layer, but Faiss is a search library, not a serving system. Among full-stack serving products the most-cited alternatives are listed below.
| System | Origin | License | Vector | Lexical | Hybrid | Real-time CRUD | Tensors |
|---|---|---|---|---|---|---|---|
| Vespa | Yahoo / Vespa.ai, 2003 / 2017 OSS | Apache 2.0 | Yes (HNSW) | Yes (BM25) | Yes (single query) | Yes | Yes |
| Pinecone | Pinecone Systems, 2019 | Proprietary SaaS | Yes | Limited | Limited | Yes | No |
| Weaviate | SeMI / Weaviate B.V., 2019 | BSD-3 | Yes (HNSW) | Yes (BM25) | Yes | Yes | No |
| Qdrant | Qdrant, 2021 | Apache 2.0 | Yes (HNSW) | Yes (BM25-like) | Yes | Yes | No |
| Milvus | Zilliz, 2019 | Apache 2.0 | Yes (multiple) | Limited | Limited | Yes | No |
| Chroma | Chroma, 2022 | Apache 2.0 | Yes | Limited | Limited | Yes | No |
| LanceDB | LanceDB, 2022 | Apache 2.0 | Yes (IVF-PQ) | Limited | Partial | Yes | No |
| pgvector | Crunchy Data / Andrew Kane, 2021 | PostgreSQL | Yes (HNSW, IVFFlat) | Via PostgreSQL FTS | Partial | Yes | No |
Public benchmarks published by Vespa.ai in January 2025 compared Vespa 8.427.7 to Elasticsearch 8.15.2 on a one-million-product e-commerce dataset and reported per-CPU-core throughput advantages of up to 12.9 times for vector queries, 8.5 times for hybrid queries, and 6.5 times for lexical queries, plus approximately four times higher efficiency for in-place updates.[30] Vespa.ai stated that the throughput differences translated into infrastructure-cost reductions of up to five times in equivalent workloads, although such vendor benchmarks should be read alongside independent replications.[30] The fashion-resale marketplace Vinted reported halving its server count, improving search latency by 2.5 times, and reducing indexing latency by three times after migrating from a prior search stack to Vespa.[30]
Architecturally, Vespa differs from Elasticsearch and HNSW-only services in that retrieval and ranking are unified in one process: ranking models and inverted-index lookups run together on the content node, so an entire multi-phase scoring pipeline can complete without leaving the cluster.[21][28]
Yahoo operates roughly 150 internal Vespa applications collectively serving close to a billion users at around 800,000 queries per second across properties including Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, and the Yahoo (formerly Gemini) advertising network.[4][6][17] Yahoo Gemini alone was serving more than three billion native ad requests per day as of the 2017 open-source announcement.[4]
Perplexity Inc. (the company behind the Perplexity answer engine) uses Vespa as its retrieval backbone, blending hybrid sparse-and-dense retrieval, chunk-level passage selection, and integrated machine-learning inference inside the serving layer.[10] Vespa.ai's customer page reports that by May 2025 Perplexity served 22 million active users and 780 million monthly queries on Vespa-based infrastructure, with sub-second latency at thousands of concurrent queries per second.[10]
Spotify uses Vespa across several product surfaces, including semantic search over podcasts using dense embeddings alongside lexical retrieval, as described on Vespa.ai's case-studies page.[11] Spotify appears in the official Yahoo press release as an existing external customer at the time of the October 2023 spin-out.[6]
The website-builder company Wix uses Vespa for search across its hosted-sites and apps product surface, and was listed in both the spin-out press release and the Series A coverage as an external customer.[6][9]
The European secondhand-clothing marketplace Vinted migrated to Vespa for product search and personalization. Vespa.ai's case study reports that Vinted's migration delivered a 1.1 percent uplift in transactions and more than 3.5 million euros in additional gross merchandise value, alongside operational gains including a halving of server count.[11][30]
Vespa.ai also lists Elicit, Onyx, RavenPack, Qwant, Metal AI, Clarm, and Mimeta-Civsy as production users across research search, enterprise search, financial research, privacy-focused web search, and specialized analytical retrieval.[11]
Vespa Cloud is the managed service operated by Vespa.ai, launched in 2021 and made widely available to external customers at the same time.[7][16] It runs Vespa application packages on AWS and Google Cloud across multiple regions and provides automatic data-plane upgrades, autoscaling of container and content clusters, and an enclave deployment option in which the customer's own AWS or GCP account hosts the data while Vespa.ai operates the software.[16]
Pricing is consumption based: customers pay for the container and content clusters they provision, with the control-plane services included at no extra charge.[16] Published prices start at approximately ten cents per gigabyte per month for storage and compute, and unit prices decrease linearly with the total resources allocated to an application up to a fifty percent discount (or up to eighty-three percent in the enclave configuration).[16] Autoscaling is applied to all clusters and Vespa.ai documents typical cost reductions of around fifty percent on bursty workloads from running scaled-to-need rather than provisioned to peak.[16] Quota is expressed in dollars per hour and is computed against the maximum possible cost of a configured application.[16]
Vespa.ai's $31 million Series A round closed on November 1, 2023, led by Blossom Capital, the London-based Series A specialist.[8][9] The round was the first significant external capitalization of Vespa.ai after the Yahoo spin-out, which had occurred about a month earlier.[7][8] Yahoo retained an equity stake and a board seat from the spin-out and was not listed as a Series A investor by Vespa.ai's announcement, but remained the largest customer of the platform.[6][9] DLA Piper's Norwegian office advised Blossom on the transaction and characterized it as a Series A investment into the Norwegian company Vespa.ai AS.[18]
The company stated that the proceeds would be used to grow the standalone business, expand the engineering function, accelerate feature delivery, and develop the Vespa Cloud commercial offering.[8][9] As of the TechCrunch coverage in November 2023 the company employed 29 people.[9]
The founders listed on the Vespa.ai company page are:[31]
Vespa.ai's headquarters and engineering hub is in Trondheim, Norway, with employees distributed globally.[31]