Weaviate
Last reviewed
May 9, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v6 · 7,244 words
Weaviate is an open-source vector database that stores both data objects and their vector embeddings, so a single query can combine vector similarity search with structured filtering, keyword retrieval, and integrated generative search. Written in Go and licensed under BSD-3-Clause, the system is designed for speed and reliability in artificial intelligence applications. The project began as an open-source initiative by Bob van Luijt in March 2016 and was later commercialized by SeMI Technologies, the Amsterdam-based company that he co-founded with Etienne Dilocker at the end of 2018. SeMI Technologies rebranded to Weaviate B.V. in January 2023 [1][2][3]. By early 2026, the open-source project had crossed 16,000 GitHub stars, and the company reported more than 5 million container downloads, an active community of over 50,000 builders, and adoption by more than 2,000 production users [4][5].
The Weaviate project began in March 2016, when Dutch technology entrepreneur Bob van Luijt (born 15 November 1985 in Bergen op Zoom) started experimenting with the idea of a database where semantic relationships were a first-class citizen. He was working through a strategic design consultancy called Kubrickology, and his early thinking was shaped by two influences: the GloVe paper on word embeddings, which he encountered in 2015, and Google's Weave and Brillo "things" framework, which he first saw at the Ubiquity Conference in San Francisco in January 2016 [3][6].
In 2017, van Luijt published a blog post connecting Internet of Things concepts with the semantic web, sketching the framework that would later become Weaviate. By the end of 2018, he had entered a startup accelerator in the Netherlands and assembled the founding team that would become SeMI Technologies. The company name stood for Semantic Machine Insights. The original Weaviate codebase had been more of a traditional knowledge graph, but during the accelerator phase the team pivoted toward semantic search powered by vector embeddings, what they called the "Weaviate Search Graph" [3][6].
Etienne Dilocker joined as co-founder and Chief Technology Officer. Dilocker is a German engineer based in Mannheim with more than 15 years of experience in cloud-native systems, Go, databases, and site reliability. He had previously contributed to ANN-Benchmarks and ran his own consultancy, Dilocker Software Engineering. His research interests include distributed systems, auto-scaling databases, vector index design, and high-cardinality metadata filtering, the last of which he has explored in collaboration with the University of Pisa [7][8].
A third name often listed alongside the founders is Micha Verhagen, who joined the early team. SeMI Technologies originally spun out of work at ING Labs, the innovation arm of Dutch bank ING Group, which became one of the early investors in the project [1].
Version 1.0.0 of Weaviate was released on 14 January 2021. This release introduced the modular API that allows the database to plug into different embedding providers and other extensions. By that point the project had already moved away from being a niche knowledge graph and was positioning itself as a dedicated vector database for the emerging generative-AI era [9].
In January 2023, the company changed its legal name from SeMI Technologies to Weaviate B.V. According to a press release at the time, the open-source product had become significantly better known than the corporate entity, so the team decided to consolidate under the Weaviate brand [2].
Weaviate has raised approximately $67.7 million across three publicly disclosed funding rounds. The company has remained at the Series B stage as of early 2026 and has not announced any subsequent priced rounds [10][11].
| Round | Date | Amount | Lead investor(s) | Other participants |
|---|---|---|---|---|
| Seed | August 2020 | $1.2M | Zetta Venture Partners | ING Ventures |
| Series A | February 2022 | $16.5M | Cortical Ventures, NEA | Zetta Venture Partners, ING Ventures |
| Series B | April 2023 | $50M | Index Ventures | Battery Ventures, NEA, Cortical Ventures, Zetta Venture Partners, ING Ventures |
The seed round in August 2020 came from Zetta Venture Partners and ING Ventures and provided the runway to take Weaviate from prototype to production [10]. The $16.5 million Series A in February 2022 was co-led by Cortical Ventures and New Enterprise Associates (NEA), with Zetta and ING Ventures continuing as participants. The Series A press release described Weaviate as "a new wave of AI-first database tech" [12][13].
The Series B closed on 21 April 2023 at $50 million, led by Index Ventures and joined by Battery Ventures alongside the prior investors. The round was driven by demand from teams building retrieval-augmented generation systems on top of large language models. Capital from the Series B has been used to expand the engineering team, build out Weaviate Cloud, and grow commercial operations across North America and Europe [11][14].
Reporting in late 2024 indicated that Weaviate had reached around $12.3 million in annual revenue with a team of approximately 104 employees, suggesting strong commercial traction without an immediate need for additional priced rounds [15].
Weaviate is headquartered at Prinsengracht 769A in Amsterdam, Netherlands, the same canal-side district where the founding team began. The company is structured as remote-first, with engineers, researchers, and commercial staff distributed across Europe, North America, and other regions. Public estimates put headcount at roughly 90 to 105 people through 2024 and 2025 [15][16].
Key leadership disclosed publicly includes:
| Role | Person |
|---|---|
| Chief Executive Officer | Bob van Luijt (co-founder) |
| Chief Technology Officer | Etienne Dilocker (co-founder) |
| Vice President of Engineering | Paul de Grijp |
| Director of Applied Research | John Trengrove |
| Head of People and Culture | Jessie de Groot |
The company describes its values as Be Kind, Work Together as One, Strive for Excellence, Encourage Transparency, and Inspire Trust [16].
Weaviate's architecture is designed to handle multiple search paradigms (vector, keyword, and hybrid) within a single system. The database is written in Go, which the team chose for its concurrency model, low memory overhead, and strong tooling around binary builds. Roughly 97 percent of the codebase is Go, with a small amount of Python tooling for tests and developer scripts [4].
A Weaviate node combines several subsystems: a storage engine based on a custom log-structured merge-tree (LSM-tree), an inverted index for keyword search, one or more vector indexes per collection, a schema and metadata service, and a multi-protocol API surface (REST, gRPC, and GraphQL). Cluster coordination uses the Raft consensus algorithm, while object data is replicated using a leaderless eventually consistent model [17][18].
Weaviate uses HNSW (Hierarchical Navigable Small World) as its primary vector index algorithm. HNSW is a graph-based approach that searches vectors by navigating through multiple layers, moving from coarse approximations to fine ones. Its complexity grows logarithmically rather than linearly with the dataset size, which makes it effective even at billions of vectors [19].
Each named vector in Weaviate maintains its own vector index, which can be either an HNSW graph or a flat index. The HNSW graph is held in memory for fast access, with commit logs and snapshots used to restore the structure after a restart. From version 1.31, Weaviate added HNSW snapshotting, which produces periodic on-disk snapshots that can be loaded directly at startup. According to the release notes, this can reduce startup time for large indexes by approximately 10 to 15 times compared with replaying the write-ahead log [9][20].
In single-instance mode, Weaviate's in-memory HNSW index handles millions of vectors with sub-second response times. For larger datasets, cluster mode distributes data across nodes with sharding and replication, enabling horizontal scaling to billions of data points [19].
Weaviate exposes several HNSW tuning parameters that allow developers to optimize the tradeoff between recall, speed, and memory:
| Parameter | Default | Description |
|---|---|---|
| efConstruction | 128 | Size of dynamic candidate list during index construction; higher values improve graph quality at the cost of slower builds |
| maxConnections | 64 | Maximum number of connections per node in the HNSW graph; higher values improve recall but increase memory usage |
| ef | -1 (dynamic) | Size of the candidate list during search; controls the recall-latency tradeoff at query time |
| vectorCacheMaxObjects | 1 trillion | Maximum number of vectors to cache in memory; reducing this value helps when the dataset exceeds available RAM |
| flatSearchCutoff | 40,000 | When a filtered search matches fewer objects than this threshold, Weaviate skips the HNSW graph and runs a brute-force (flat) search over the matches |
The flat index option is useful for small collections or multi-tenant environments where each tenant has a relatively small dataset. Instead of building and maintaining an HNSW graph, flat indexing stores vectors in a simple array and performs brute-force search, which is faster for small collections due to lower overhead [19].
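The brute-force approach described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Weaviate's Go implementation: the `flat_search` helper, the `tenant_vectors` data, and the choice of cosine distance are all assumptions made for the example.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def flat_search(vectors, query, k):
    """Brute-force k-nearest-neighbour search: score every stored vector."""
    scored = [(cosine_distance(vec, query), obj_id) for obj_id, vec in vectors.items()]
    scored.sort()  # smallest distance first
    return [(obj_id, dist) for dist, obj_id in scored[:k]]

# Hypothetical tenant shard holding three 3-dimensional vectors.
tenant_vectors = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.9, 0.1, 0.0],
}
top = flat_search(tenant_vectors, query=[1.0, 0.0, 0.0], k=2)
print(top)
```

Every query touches every vector, so the cost grows linearly with collection size; the trade-off is that there is no graph to build, keep in memory, or restore at startup.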
Weaviate supports several quantization techniques for compressing vectors and reducing memory consumption. Quantization is configured per HNSW (or flat) index and can be combined with rescoring, where the database re-ranks the top candidates using their full-precision vectors to recover most of the lost recall [21].
| Technique | Bit-width | Memory saving | Notes |
|---|---|---|---|
| Product Quantization (PQ) | Variable | About 85% | Splits vectors into segments and quantizes each segment using k-means codebooks |
| Scalar Quantization (SQ) | 8 bits per dimension | About 75% | Maps each 32-bit float dimension into one of 256 buckets, trained on the data distribution |
| Binary Quantization (BQ) | 1 bit per dimension | About 97% | Stores each dimension as a single bit; fastest but least precise |
| Rotational Quantization (RQ) | 8 or 1 bits | 75% to ~94% | Applies a rotation to decorrelate dimensions before quantization, improving recall over SQ and BQ |
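
The SQ and BQ savings in the table follow directly from the bit-widths. The arithmetic below is a sketch that counts raw vector storage only and ignores graph, codebook, and per-vector overhead; the 1-million-vector, 768-dimension workload is a hypothetical example.

```python
def memory_saving(bits_per_dimension, full_precision_bits=32):
    """Fraction of vector memory saved relative to float32 storage."""
    return 1.0 - bits_per_dimension / full_precision_bits

def vector_bytes(num_vectors, dimensions, bits_per_dimension):
    """Raw bytes needed for the vectors alone (no index overhead)."""
    return num_vectors * dimensions * bits_per_dimension // 8

for name, bits in [("SQ", 8), ("BQ", 1)]:
    print(f"{name}: {memory_saving(bits):.1%} saved")

# Hypothetical workload: 1 million 768-dimensional embeddings.
full = vector_bytes(1_000_000, 768, 32)
sq = vector_bytes(1_000_000, 768, 8)
bq = vector_bytes(1_000_000, 768, 1)
print(full, sq, bq)
```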
Rotational quantization (RQ) was introduced in version 1.32 (July 2025) as the recommended default for new collections. A 1-bit RQ variant, providing extreme compression rates while preserving most of the search quality, became a preview feature in version 1.33 (October 2025). Each quantization mode preserves a copy of the uncompressed vectors so that rescoring can refine the final top-k results [22][23].
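The interplay between compressed search and rescoring can be illustrated with a toy binary-quantization example. This is a sketch under stated assumptions, not Weaviate's implementation: the sign-based `binarize` function, the `oversample` parameter, and the sample vectors are hypothetical.

```python
def binarize(vector):
    """Binary quantization: keep only the sign of each dimension, packed into an int."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_with_rescoring(vectors, query, k, oversample=3):
    """Shortlist by Hamming distance on 1-bit codes, then re-rank by exact distance."""
    codes = {obj_id: binarize(vec) for obj_id, vec in vectors.items()}
    query_code = binarize(query)
    shortlist = sorted(codes, key=lambda obj_id: hamming(codes[obj_id], query_code))
    shortlist = shortlist[: k * oversample]
    # Rescoring step: use the uncompressed vectors for the final ordering.
    return sorted(shortlist, key=lambda obj_id: squared_l2(vectors[obj_id], query))[:k]

vectors = {
    "a": [0.9, 0.8, -0.1, -0.7],
    "b": [0.1, 0.2, -0.9, -0.3],
    "c": [-0.5, 0.4, 0.6, 0.2],
    "d": [0.2, 0.1, -0.2, -0.1],
}
result = search_with_rescoring(vectors, query=[0.2, 0.2, -0.3, -0.2], k=1)
print(result)
```

Here "a", "b", and "d" share the same 1-bit code as the query, so the compressed search alone cannot tell them apart; the full-precision pass is what picks the closest of the three.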
Through a partnership with NVIDIA, Weaviate supports GPU-accelerated index building using the cuVS library. The CAGRA algorithm builds indexes on the GPU, then converts them to HNSW format for cost-effective CPU-based query serving. This approach speeds up index construction while keeping query costs low [24].
In addition to the HNSW vector index, Weaviate maintains an inverted index based on BM25F scoring for keyword search. The BM25F algorithm extends the standard BM25 scoring function with field-level weighting, so different properties of an object (such as title versus body text) can contribute differently to keyword relevance scores. This inverted index powers Weaviate's keyword search and is also used as one component of hybrid search [25][26].
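Field-level weighting can be sketched with a textbook BM25F variant, in which per-field term frequencies are weighted and combined before the saturation step. The formula details, the `k1` and `b` values, and the sample documents are assumptions for illustration and may differ from Weaviate's internal scoring.

```python
import math

def bm25f_scores(docs, query_terms, field_weights, k1=1.2, b=0.75):
    """Score documents with a simple BM25F variant.

    docs: {doc_id: {field_name: [tokens]}}
    field_weights: {field_name: boost}
    """
    n_docs = len(docs)
    # Average length of each field across the corpus.
    avg_len = {
        field: sum(len(doc.get(field, [])) for doc in docs.values()) / n_docs
        for field in field_weights
    }
    scores = {}
    for term in query_terms:
        # Document frequency: docs containing the term in any weighted field.
        df = sum(
            any(term in doc.get(field, []) for field in field_weights)
            for doc in docs.values()
        )
        if df == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, doc in docs.items():
            # Combine per-field term frequencies before applying saturation.
            weighted_tf = 0.0
            for field, weight in field_weights.items():
                tokens = doc.get(field, [])
                tf = tokens.count(term)
                if tf and avg_len[field] > 0:
                    norm = 1 - b + b * len(tokens) / avg_len[field]
                    weighted_tf += weight * tf / norm
            if weighted_tf:
                scores[doc_id] = scores.get(doc_id, 0.0) + idf * weighted_tf / (k1 + weighted_tf)
    return scores

docs = {
    "doc1": {"title": ["vector", "database"], "body": ["a", "database", "for", "embeddings"]},
    "doc2": {"title": ["cooking", "tips"], "body": ["store", "vector", "data", "carefully"]},
}
scores = bm25f_scores(docs, ["vector"], {"title": 2.0, "body": 1.0})
print(scores)
```

Both documents contain the query term once, but the title boost ranks the title match ahead of the body match.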
From version 1.30, Weaviate replaced the previous BM25 implementation with BlockMax WAND, an algorithm that allows the database to skip large blocks of postings during scoring. In Weaviate's internal benchmarks, BlockMax WAND reduced BM25 query latency by an order of magnitude on large corpora compared with the previous traversal strategy [27].
Weaviate's hybrid search combines dense vector similarity with BM25 keyword matching in a single native API call. According to Weaviate's 2025 benchmarks, hybrid search improved NDCG@10 by 42 percent over pure vector search across a panel of information-retrieval datasets, which is particularly relevant for retrieval-augmented generation workloads where recall directly affects the quality of generated answers [25].
The system fuses dense vector embeddings with sparse BM25 lexical matching, and developers can control the weighting between the two methods. Independent evaluations report 35 to 50 percent relevance improvements from hybrid search compared to either method alone [28].
Weaviate offers two fusion algorithms for combining vector and keyword results:
| Fusion algorithm | Description | Best for |
|---|---|---|
| Ranked Fusion | Combines results by summing the inverse of each result's rank in the individual search lists | General-purpose hybrid queries |
| Relative Score Fusion | Normalizes scores from each search method to [0, 1] range and combines them | When the raw score distributions of the two methods differ significantly |
The alpha parameter controls the balance between vector search (alpha = 1.0) and keyword search (alpha = 0.0). Setting alpha to 0.5 weights both methods equally. In practice, values between 0.7 and 0.8 (favoring vector search slightly) tend to perform well for RAG workloads [25][26].
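The two fusion strategies and the role of alpha can be sketched as follows. This is an illustrative approximation rather than Weaviate's code: the rank offset `k=60` is an assumption borrowed from common reciprocal-rank-fusion practice, and the sample result lists are hypothetical.

```python
def ranked_fusion(vector_hits, keyword_hits, alpha=0.75, k=60):
    """Sum weighted reciprocal ranks; inputs are id lists ordered best-first."""
    fused = {}
    for weight, hits in ((alpha, vector_hits), (1 - alpha, keyword_hits)):
        for rank, obj_id in enumerate(hits, start=1):
            fused[obj_id] = fused.get(obj_id, 0.0) + weight / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

def normalize(scores):
    """Min-max scale a {id: score} mapping into the [0, 1] range."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {obj_id: 1.0 for obj_id in scores}
    return {obj_id: (s - low) / (high - low) for obj_id, s in scores.items()}

def relative_score_fusion(vector_scores, keyword_scores, alpha=0.75):
    """Blend normalized scores; higher input scores must mean better matches."""
    vec, kw = normalize(vector_scores), normalize(keyword_scores)
    fused = {
        obj_id: alpha * vec.get(obj_id, 0.0) + (1 - alpha) * kw.get(obj_id, 0.0)
        for obj_id in set(vec) | set(kw)
    }
    return sorted(fused, key=fused.get, reverse=True)

ranked = ranked_fusion(["a", "b", "c"], ["c", "a", "d"], alpha=0.5)
relative = relative_score_fusion(
    {"a": 0.9, "b": 0.8, "c": 0.5},      # vector similarities
    {"c": 12.0, "a": 6.0, "d": 3.0},     # BM25 scores on a different scale
    alpha=0.5,
)
print(ranked, relative)
```

Ranked fusion only looks at positions, so it is insensitive to score scale; relative score fusion keeps information about how far apart results were, which is why normalization is needed before the two lists are blended.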
From version 1.31 (June 2025), Weaviate added support for multi-vector embeddings using MUVERA encoding. Multi-vector models such as ColBERT represent each document as a set of token-level vectors rather than one pooled vector, which can improve retrieval accuracy at the cost of storage. MUVERA provides a fixed-dimension single-vector representation that approximates the late-interaction scoring used by ColBERT, allowing Weaviate to plug multi-vector models into its standard HNSW pipeline without exotic data structures [20].
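The late-interaction score that MUVERA approximates can be written out directly. The sketch below shows the ColBERT-style MaxSim computation only, not the MUVERA encoding itself; the two-dimensional token vectors are made-up values chosen to keep the arithmetic easy to follow.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token vector, take the best
    matching document token vector, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

query = [[1.0, 0.0], [0.0, 1.0]]                  # two query token vectors
doc_x = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]      # three document token vectors
doc_y = [[0.7, 0.7]]                              # one pooled-looking vector

print(maxsim(query, doc_x), maxsim(query, doc_y))
```

Because every query token is compared against every document token, the exact score is expensive to index; a fixed-dimension approximation lets the same comparison run through a standard single-vector index.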
From version 1.25 onward, Weaviate uses the Raft consensus algorithm for cluster metadata, including schema operations, role-based access control changes, and tenant lifecycle events. Raft is implemented using the HashiCorp Raft library; one node is elected leader, and metadata changes are committed once a quorum of nodes acknowledges them. This was a significant change from earlier versions, which had used a simpler gossip-style protocol that could not safely support concurrent schema operations [17].
Object data replication uses a leaderless, eventually consistent model with tunable consistency levels (ONE, QUORUM, ALL) at read and write time. Async replication, asynchronous shard replica movement between nodes, and rebalancing capabilities have all been added across the 1.25 through 1.32 release series, gradually maturing Weaviate into a distributed system suitable for production workloads at scale [17][22].
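The effect of the tunable consistency levels can be illustrated with standard quorum arithmetic. This is a general sketch of how ONE, QUORUM, and ALL translate into acknowledgement counts, not a description of Weaviate's internal code; the function names are hypothetical.

```python
def required_acks(level, replication_factor):
    """Replica acknowledgements needed for a tunable consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")

def read_overlaps_write(write_level, read_level, replication_factor):
    """True when every read set must share a replica with every write set (R + W > N)."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return r + w > replication_factor

for write_level, read_level in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ALL", "ONE")]:
    print(write_level, read_level, read_overlaps_write(write_level, read_level, 3))
```

With three replicas, QUORUM writes paired with QUORUM reads always overlap on at least one replica, whereas ONE/ONE trades that guarantee for lower latency.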
Weaviate's module system is one of its defining architectural features. Modules extend the database with additional capabilities at various stages of the data pipeline: vectorization (generating embeddings from raw data), generative AI (producing text from search results), reranking (refining result order), and reference resolution (loading data from external sources).
Modules are loaded at server startup via configuration. In Docker deployments, modules are specified as environment variables. In Kubernetes, they are configured in the Helm chart values. Weaviate Cloud instances come with a default set of modules pre-configured [29].
Vectorizer modules automatically generate embeddings when objects are inserted into Weaviate. Instead of computing embeddings externally and uploading them, users configure a vectorizer module on a collection and Weaviate handles embedding generation transparently.
| Module | Provider | Model type | Key notes |
|---|---|---|---|
| text2vec-openai | OpenAI | Text | Uses OpenAI's embedding API (text-embedding-ada-002, text-embedding-3-small, etc.) |
| text2vec-cohere | Cohere | Text | Uses Cohere's multilingual embedding models |
| text2vec-huggingface | Hugging Face | Text | Uses Hugging Face Inference API for hosted models |
| text2vec-transformers | Self-hosted | Text | Runs transformer models locally in a sidecar container |
| text2vec-ollama | Ollama | Text | Runs local embedding models through an Ollama instance |
| text2vec-google | Google | Text | Uses Google's embedding models (Vertex AI, PaLM) |
| text2vec-aws | AWS | Text | Uses Amazon Bedrock embedding models |
| text2vec-jinaai | Jina AI | Text | Uses Jina AI's embedding models |
| text2vec-voyageai | Voyage AI | Text | Uses Voyage AI's domain-specific embeddings |
| text2vec-snowflake | Snowflake Arctic | Text | Uses Snowflake's open-source Arctic embedding family |
| multi2vec-clip | OpenAI CLIP | Multi-modal | Embeds both text and images into a shared vector space |
| multi2vec-bind | ImageBind | Multi-modal | Meta's ImageBind for text, image, audio, and video embeddings |
| multi2vec-cohere | Cohere | Multi-modal | Cohere Embed v4 multimodal text and image embeddings |
Users can also import pre-computed vector embeddings directly, bypassing the module system entirely. This is the recommended path for users who already maintain a centralized embedding pipeline outside the database [29].
Generative modules connect Weaviate to large language models for retrieval-augmented generation (RAG) directly within the database.
| Module | Provider | Description |
|---|---|---|
| generative-openai | OpenAI | Uses GPT-4, GPT-3.5-turbo, and newer OpenAI chat models |
| generative-anthropic | Anthropic | Uses Claude family models |
| generative-cohere | Cohere | Uses Cohere's Command models |
| generative-aws | AWS | Uses Amazon Bedrock models (Claude, Titan, Llama) |
| generative-google | Google | Uses Google's Gemini and PaLM models |
| generative-anyscale | Anyscale | Uses open-source models hosted on Anyscale |
| generative-mistral | Mistral | Uses Mistral chat and instruct models |
| generative-friendliai | FriendliAI | Hosted open-source LLM endpoints |
| generative-contextual | Contextual AI | Generation tuned for grounded enterprise RAG |
A generative search query in Weaviate consists of two parts: a search query (vector, keyword, or hybrid) and a prompt for the language model. Weaviate first retrieves relevant objects, then passes both the search results and the prompt to the configured generative model, returning the generated response alongside the original search results [30].
Reranker modules apply a second-pass ranking model after the initial retrieval step. Cross-encoder reranking models score each retrieved result against the original query, which typically produces more accurate relevance judgments than the initial bi-encoder similarity search. Available reranker modules include reranker-cohere, reranker-voyageai, reranker-jinaai, reranker-contextual, and reranker-transformers [29].
Weaviate organizes data into collections (previously called "classes"). A collection definition is a schema that specifies the properties of the collection's objects, the vectorizer to use, the generative module (if any), and the index configuration.
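As a rough sketch, a collection definition of this kind could be expressed as a JSON payload for the REST schema endpoint. The field names below reflect the REST schema format as commonly documented and should be checked against the reference for the version in use; the `Article` collection, its properties, and the chosen modules are hypothetical, and the index values simply repeat the defaults listed earlier.

```python
import json

collection_definition = {
    "class": "Article",                      # collection name
    "vectorizer": "text2vec-openai",         # vectorizer module for new objects
    "moduleConfig": {"generative-openai": {}},
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "content", "dataType": ["text"]},
    ],
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {
        "efConstruction": 128,
        "maxConnections": 64,
        "ef": -1,                            # -1 selects dynamic ef
        "flatSearchCutoff": 40000,
    },
}

payload = json.dumps(collection_definition, indent=2)
print(payload)
```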
Weaviate also supports auto-schema, which is enabled by default. When auto-schema is active, Weaviate infers collection definitions from the data being inserted, automatically detecting property names and types. This is convenient for prototyping but should be disabled in production for predictable behavior [31].
Cross-references allow linking objects across collections. For example, an Article collection might reference objects in an Author collection, enabling graph-like traversals in queries.
From version 1.32, Weaviate added collection aliases, which let operators point an alias name at different underlying collections. This makes zero-downtime migrations practical: a new collection is built behind a fresh name, the alias is swapped to it once the data is ready, and the old collection can be dropped without changing client code [22].
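The migration pattern can be modelled with a small in-memory registry. This is a toy illustration of the alias-swap idea only; `AliasRegistry` and the collection names are hypothetical and do not correspond to the Weaviate client API.

```python
class AliasRegistry:
    """Toy in-memory model of collection aliases (not the Weaviate client API)."""

    def __init__(self):
        self.collections = {}   # collection name -> list of objects
        self.aliases = {}       # alias name -> collection name

    def resolve(self, name):
        """Return the collection an alias (or plain collection name) points to."""
        return self.collections[self.aliases.get(name, name)]

    def swap(self, alias, new_collection):
        """Repoint the alias and return the previously targeted collection name."""
        if new_collection not in self.collections:
            raise KeyError(new_collection)
        previous = self.aliases.get(alias)
        self.aliases[alias] = new_collection
        return previous

registry = AliasRegistry()
registry.collections["Articles_v1"] = ["old object"]
registry.aliases["Articles"] = "Articles_v1"

# 1. Build the replacement behind a fresh name while clients keep reading v1.
registry.collections["Articles_v2"] = ["reindexed object"]
# 2. Swap the alias once the new data is ready.
old = registry.swap("Articles", "Articles_v2")
# 3. Drop the old collection; clients still query "Articles".
del registry.collections[old]
print(registry.resolve("Articles"))
```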
Weaviate implements native multi-tenancy with one shard per tenant, dynamic resource management, and true data isolation. This design means each tenant's data is physically separated, not just logically filtered. Recent updates introduced tenant offloading to S3 cloud storage and renamed the tenant activity statuses from HOT/COLD to ACTIVE/INACTIVE for clarity [32].
Multi-tenancy is useful for SaaS applications where each customer needs their own isolated dataset but the application operator does not want to manage separate database instances for each customer.
Weaviate's multi-tenancy implementation supports three tenant states:
| State | Data location | Cost | Access speed |
|---|---|---|---|
| ACTIVE (formerly HOT) | In memory and on local disk | Highest | Fastest |
| INACTIVE (formerly COLD) | On local disk only | Medium | Moderate (requires activation) |
| OFFLOADED | In S3 cloud storage | Lowest | Slowest (requires reactivation to local disk) |
This tiered approach allows operators to manage costs by offloading inactive tenants to cheaper storage while keeping frequently accessed tenants in memory for low-latency queries. Reference customers such as Instabase operate clusters with more than 50,000 tenants, illustrating that the design holds up at significant scale [32][33].
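A tiering policy along these lines might look like the sketch below. The idle thresholds, the `target_state` helper, and the tenant names are hypothetical; an operator would apply the resulting plan through the tenant management API of their client.

```python
from datetime import datetime, timedelta

def target_state(last_access, now, inactive_after=timedelta(days=1),
                 offload_after=timedelta(days=30)):
    """Pick a tenant state from how long ago the tenant was last queried."""
    idle = now - last_access
    if idle >= offload_after:
        return "OFFLOADED"   # cheapest tier: cloud storage
    if idle >= inactive_after:
        return "INACTIVE"    # local disk only
    return "ACTIVE"          # memory and local disk

now = datetime(2026, 1, 31)
last_seen = {
    "tenant-a": datetime(2026, 1, 31) - timedelta(hours=2),
    "tenant-b": datetime(2026, 1, 20),
    "tenant-c": datetime(2025, 11, 1),
}
plan = {tenant: target_state(seen, now) for tenant, seen in last_seen.items()}
print(plan)
```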
Weaviate exposes three APIs: REST, GraphQL, and gRPC. The GraphQL API is particularly distinctive among vector databases, since most competitors offer only REST or gRPC interfaces. GraphQL allows clients to specify exactly which fields they need, reducing over-fetching and making it easier to build flexible query interfaces [34].
A typical Weaviate GraphQL query can combine hybrid (vector plus keyword) search with property selection in a single request:
```graphql
{
  Get {
    Article(
      hybrid: {
        query: "machine learning applications"
        alpha: 0.75
      }
      limit: 5
    ) {
      title
      content
      _additional {
        score
        distance
      }
    }
  }
}
```
The alpha parameter controls the balance between vector search (1.0) and keyword search (0.0).
The gRPC interface, introduced incrementally between version 1.19 and version 1.23.7, is the preferred protocol for high-throughput workloads. Internal benchmarks comparing the gRPC and REST/GraphQL paths on identical hardware showed query latency reductions of 40 to 70 percent and import-time reductions of close to 50 percent on the DBPedia benchmark dataset [35]. The gRPC API uses HTTP/2 and Protocol Buffers, which produce smaller, faster-to-parse payloads than JSON over HTTP/1.1.
A short example using the v4 Python client illustrates a hybrid query:
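```python
import weaviate
from weaviate.classes.query import HybridFusion, MetadataQuery

# Connect to a Weaviate instance running on localhost with default ports.
client = weaviate.connect_to_local()
try:
    articles = client.collections.get("Article")
    # Hybrid query: alpha=0.75 favors the vector side over BM25.
    results = articles.query.hybrid(
        query="vector databases for RAG",
        alpha=0.75,
        fusion_type=HybridFusion.RELATIVE_SCORE,
        limit=5,
        # Request the fused score explicitly; metadata is not returned by default.
        return_metadata=MetadataQuery(score=True),
    )
    for obj in results.objects:
        print(obj.properties["title"], obj.metadata.score)
finally:
    # Release the connection even if the query fails.
    client.close()
```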
Weaviate includes built-in generative search capabilities, allowing users to perform retrieval-augmented generation directly within the database. After retrieving relevant objects through vector or hybrid search, Weaviate can pass those results to a configured language model (such as OpenAI's GPT-4, Anthropic's Claude, or Cohere's models) to generate answers, summaries, or transformations. This eliminates the need for a separate orchestration layer for basic RAG use cases [30].
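Generative search supports two modes: single prompt, which runs the prompt once for each retrieved object, and grouped task, which runs the prompt once over the whole result set.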
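The difference between a per-object prompt and a prompt over the whole result set can be sketched with a stand-in model. Everything here is hypothetical: `fake_llm` replaces a real generative module call, and the `{title}` placeholder style is an assumption about how object properties are substituted into the prompt.

```python
def fake_llm(prompt):
    """Stand-in for a generative module call; reports the prompt size only."""
    return f"<answer to {len(prompt)} chars of prompt>"

def single_prompt(objects, template, llm=fake_llm):
    """Run the prompt once per retrieved object, filling {property} placeholders."""
    return [llm(template.format(**obj)) for obj in objects]

def grouped_task(objects, task, llm=fake_llm):
    """Run the prompt once over all retrieved objects together."""
    context = "\n".join(str(obj) for obj in objects)
    return llm(f"{task}\n\n{context}")

retrieved = [{"title": "HNSW explained"}, {"title": "Hybrid search"}]
per_object = single_prompt(retrieved, "Summarize {title} in one line.")
combined = grouped_task(retrieved, "What do these articles have in common?")
print(per_object, combined)
```

The per-object mode returns one generation per search result, while the grouped mode returns a single generation for the whole set, which also means a single, larger prompt.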
Integrated reranking modules let users apply a second-pass ranking model after the initial retrieval step. Reranking models score each retrieved result against the original query using a cross-encoder, which typically produces more accurate relevance judgments than the initial bi-encoder similarity search.
Weaviate supports multiple named vectors per object, allowing a single object to have separate embeddings for different properties or different embedding models. For example, a product listing might have one vector for its title (embedded with a lightweight model) and another for its description (embedded with a more powerful model). Queries can target specific named vectors, enabling flexible multi-modal and multi-representation search strategies.
Weaviate supports API-key, OIDC, and anonymous authentication modes, configured via environment variables on each node. From the 1.29/1.30 release series, the database includes a built-in role-based access control (RBAC) system, with predefined roles for root and viewer plus user-defined roles that pin down permissions at the level of individual collections, tenants, and operations [36].
From version 1.34, observability capabilities expanded with more than 30 new monitoring metrics, covering LSM bucket reads and writes, write-ahead log recovery, memtable flushing, and asynchronous replication. These metrics are exposed in Prometheus format and feed into Weaviate Cloud's standard monitoring dashboards [37].
Weaviate offers several ways to run the database, from fully self-managed to fully hosted. The core engine is identical across deployment modes [38][39].
| Deployment | Operator | Notes |
|---|---|---|
| Self-hosted (Docker) | User | Single-node or small clusters; ideal for development |
| Self-hosted (Kubernetes) | User | Helm chart; the recommended path for production self-hosting |
| Embedded Weaviate | User | Library mode that starts a local Weaviate process from inside Python or Node.js scripts |
| Weaviate Cloud (Serverless) | Weaviate | Multi-tenant shared cloud, billed by usage |
| Weaviate Cloud (Enterprise) | Weaviate | Dedicated single-tenant clusters with SLA, HIPAA on AWS |
| BYOC (Bring Your Own Cloud) | Joint | Database runs in customer's VPC; Weaviate manages the control plane |
| AWS Marketplace | User or Joint | Container deployment inside customer AWS account |
Weaviate restructured its pricing in October 2025, replacing earlier per-dimension billing with three usage-based components (data, performance, and AI features) plus a $45/month minimum to cover baseline cluster costs. The Cloud product is offered in tiered plans called Flex, Plus, and Premium [40].
| Plan | Starting price | Description |
|---|---|---|
| Sandbox (Free) | $0 | 14-day trial clusters for experimentation |
| Flex | $45/month minimum | Shared infrastructure, billed by usage |
| Plus | $280/month | Higher resource ceiling on shared infrastructure |
| Premium | $400+/month | Dedicated resources and priority support |
| Enterprise Cloud | Custom | Single-tenant clusters with SLA, HIPAA, BYOC options |
| BYOC | From ~$1,390/month | Database runs in customer's cloud, Weaviate operates the control plane |
The BYOC offering is targeted at enterprises with data-residency, regulatory, or networking constraints. The cluster runs inside the customer's AWS, GCP, or Azure VPC on managed Kubernetes, while Weaviate handles application-level security, configuration, upgrades, patches, and 24/7 monitoring. This design separates the control plane (Weaviate's responsibility) from the data plane (customer's environment) [38].
Weaviate Cloud also runs HIPAA-eligible Enterprise Cloud workloads on AWS, with parallel work in progress for Azure and GCP. The same engine powers integrations with the AWS Marketplace, where users can deploy Weaviate as a containerized cluster inside their AWS tenant.
Weaviate Embeddings, launched in December 2024, is a managed embedding service that runs inside Weaviate Cloud. The service offers GPU-backed inference for both open-source and proprietary models, removing the need for users to call external APIs or operate their own embedding infrastructure [41][42].
The service launched with two Snowflake Arctic models, snowflake-arctic-embed-m-v1.5 (English-only, 512 tokens) and snowflake-arctic-embed-l-v2.0 (multilingual, 8192 tokens). Pricing is consumption-based, and the service has no hard cap on embeddings per second, making it suitable for production-scale ingestion. Additional models and modalities have been added on a rolling basis since 2025 [41][42].
In March 2025, Weaviate introduced a new product line called Weaviate Agents: pre-built agentic services that ship with Weaviate Cloud. Three agents have been released [43][44].
| Agent | First preview | Status | Purpose |
|---|---|---|---|
| Query Agent | March 2025 | GA September 2025 | Natural-language access to Weaviate collections, including multi-stage queries and aggregations |
| Transformation Agent | March 2025 | Preview | Uses an LLM to rewrite, enrich, or augment objects already in a collection based on natural-language instructions |
| Personalization Agent | April 2025 | Preview | Re-ranks results for a user based on a stored persona and prior interactions |
The Query Agent reads a Weaviate cluster's schema, decides which collections to query, generates the appropriate vector or aggregation query, and returns a natural-language answer with citations to the underlying objects. It became generally available in September 2025 for Serverless customers [43][45].
In February 2026, Weaviate also announced Agent Skills, a developer toolkit that helps coding agents (such as those built on the Model Context Protocol) understand Weaviate APIs and write correct queries against them [46].
Verba is Weaviate's open-source RAG reference application. The repository, hosted on GitHub at weaviate/Verba, provides a modular chatbot interface that ingests documents from sources such as PDFs (via UnstructuredIO), GitHub repositories, and Markdown files, then answers questions using Weaviate as the retrieval back-end and a configurable LLM as the generation step. As of early 2026, the project had accumulated over 7,000 stars and was widely used as a starting point for self-hosted RAG demos and internal tools [47][48].
Verba supports four deployment modes: a fully embedded mode using Weaviate Embedded, a Docker mode, a connection to an existing Weaviate Cloud cluster, and a custom URL mode pointing at any reachable Weaviate instance. The default model stack is intentionally swappable: SentenceTransformers for embeddings, OpenAI or Ollama for generation, and either Weaviate Cloud or local Weaviate for storage.
Weaviate ships approximately one minor release every one to two months. The table below highlights selected releases that introduced major capabilities. Weaviate's stated support policy is to maintain bug fixes and security patches for the latest three minor versions [9][20][22][23][37].
| Version | Release date | Highlights |
|---|---|---|
| 1.0.0 | January 14, 2021 | Initial GA, modular API, support for importing pre-computed vectors |
| 1.2.0 | 2021 | text2vec-transformers module |
| 1.5.0 | 2021 | LSM-tree storage, auto-schema |
| 1.8.0 | 2021 | Horizontal scalability, sharding, pagination |
| 1.18.0 | 2023 | Hybrid search GA, Multi-tenancy preview |
| 1.19.0 | 2023 | gRPC interface (initial), generative-cohere |
| 1.22.0 | 2023 | Multi-tenancy GA, named vectors preview |
| 1.23.0 | 2024 | gRPC GA in v1.23.7, dynamic ef tuning |
| 1.24.0 | 2024 | Named vectors GA, partial flat-index support |
| 1.25.0 | 2024 | Raft consensus for cluster metadata |
| 1.26.0 | 2024 | Async replication, RBAC preview |
| 1.27.0 | 2024 | Multi-vector preview, ColBERT support |
| 1.28.0 | 2024 | Snowflake Arctic embeddings, query speed-ups |
| 1.29.0 | February 2025 | RBAC GA, Weaviate Embeddings service integration |
| 1.30.0 | April 2025 | Runtime configuration, dynamic user management, BlockMax WAND BM25 |
| 1.31.0 | June 2025 | MUVERA multi-vector encoding, HNSW snapshots |
| 1.32.0 | July 2025 | Collection aliases, Rotational Quantization (RQ), shard replica movement GA |
| 1.33.0 | October 2025 | 1-bit RQ preview, ContainsNone and Not filter operators |
| 1.34.0 | November 2025 | Flat-index plus RQ, expanded observability metrics, Contextual AI integration |
| 1.35.0 | December 2025 | Stability and storage improvements |
| 1.36.0 | February 2026 | Latest stable release for new clusters |
| 1.37.0 | April 2026 | Secure MCP server, extensible tokenizers, incremental backups |
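The support policy stated before the table (bug fixes and security patches for the latest three minor versions) can be expressed as a small check. The following is a minimal sketch, not an official tool: the function names `supported_minors` and `is_supported` are hypothetical, and it assumes that all supported releases share the same major version.

```python
def supported_minors(latest: str, window: int = 3) -> list[str]:
    """Return the minor release lines covered by a 'latest N minors' policy.

    Assumption: the window never crosses a major-version boundary.
    """
    major, minor = (int(part) for part in latest.split(".")[:2])
    lowest = max(minor - window + 1, 0)
    return [f"{major}.{m}" for m in range(minor, lowest - 1, -1)]


def is_supported(version: str, latest: str, window: int = 3) -> bool:
    """Check whether a version belongs to one of the supported minor lines."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return f"{major}.{minor}" in supported_minors(latest, window)


print(supported_minors("1.37.0"))        # ['1.37', '1.36', '1.35']
print(is_supported("1.34.0", "1.37.0"))  # False
```

With 1.37.0 as the latest release in the table, this window would cover the 1.37, 1.36, and 1.35 lines.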
Weaviate publishes benchmark results for its search modes. In 2025, the team benchmarked hybrid search against pure vector search and pure keyword search across 12 information retrieval datasets from BEIR, LoTTE, and BRIGHT, as well as WixQA and EnronQA.
Key findings from these benchmarks include:
| Search mode | Relative performance | Notes |
|---|---|---|
| Pure vector search | Baseline | Uses dense embeddings only |
| Pure BM25 keyword search | Varies by dataset | Better for exact keyword matching; weaker for semantic queries |
| Hybrid search (BM25 + vector) | +42% NDCG@10 vs pure vector | Combining both methods yields the strongest results across most datasets |
| Search Mode (auto-optimized) | Best overall | Weaviate's automatic search mode selection further improves results [49] |
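The table reports results as NDCG@10, a ranking metric that rewards placing relevant documents near the top of the first ten results. The sketch below shows how the metric is commonly computed. It is illustrative only: the relevance lists are made-up toy data, not benchmark results, and the ideal ranking is derived by re-sorting the same list, which is a simplification of evaluations that use every judged document for a query.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance grades."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the best possible ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Toy rankings (1 = relevant, 0 = not relevant), invented for illustration.
vector_ranking = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
hybrid_ranking = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print(round(ndcg_at_k(vector_ranking), 3))  # 0.651
print(round(ndcg_at_k(hybrid_ranking), 3))  # 1.0
```

A relative improvement such as the one in the table is the difference between two such scores divided by the baseline score, averaged over queries and datasets.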
In terms of raw throughput, optimized Weaviate deployments have been benchmarked at approximately 10,000 to 15,000 queries per second for pure vector search workloads. Latency numbers depend heavily on dataset size, dimensionality, and hardware configuration [50]. For ingestion, the move from REST to gRPC roughly halved the time to insert the 1 million records of the DBpedia benchmark dataset, from over 42 minutes to about 23 minutes on the same hardware [35].
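The ingestion figures can be converted into approximate sustained throughput. The arithmetic below uses only the rounded numbers quoted above (1 million records, 42 minutes, 23 minutes), so the results are rough estimates rather than measured rates.

```python
def objects_per_second(n_objects: int, minutes: float) -> float:
    """Average insert rate for a batch that took the given number of minutes."""
    return n_objects / (minutes * 60)


rest_rate = objects_per_second(1_000_000, 42)  # REST import, "over 42 minutes"
grpc_rate = objects_per_second(1_000_000, 23)  # gRPC import, "about 23 minutes"

print(f"REST: ~{rest_rate:.0f} objects/s")         # REST: ~397 objects/s
print(f"gRPC: ~{grpc_rate:.0f} objects/s")         # gRPC: ~725 objects/s
print(f"Speed-up: ~{grpc_rate / rest_rate:.2f}x")  # Speed-up: ~1.83x
```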
Weaviate is most often compared with Pinecone, Qdrant, Chroma, and Milvus. The competitive picture is shaped by license model (open source versus proprietary), deployment options, hybrid search capabilities, and the level of integration with embedding and generative providers [50][51].
| Feature | Weaviate | Pinecone |
|---|---|---|
| Source model | Open-source (BSD-3-Clause) | Closed-source, managed SaaS |
| Self-hosting | Yes (Docker, Kubernetes, BYOC) | No (cloud-only, or BYOC on Dedicated plan) |
| Hybrid search | Native BM25 + vector in one API call | Requires separate sparse vector index |
| API style | REST, gRPC, GraphQL | REST, gRPC |
| Auto-vectorization | Yes, via modules and Weaviate Embeddings | Yes, via Pinecone Inference |
| Multi-tenancy | Native, one shard per tenant | Via namespaces |
| Generative search | Built-in RAG modules | Via Pinecone Assistant |
| Primary language | Go | Not disclosed (managed service) |
Pinecone tends to win on setup simplicity and consistent low-latency performance for pure vector search workloads. Weaviate tends to win on flexibility, hybrid search quality, and cost control through self-hosting options. For workloads where hybrid search is central to the application, Weaviate's native BM25 + vector fusion in a single API call is more efficient than Pinecone's approach of maintaining separate sparse and dense indexes [50].
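The idea of fusing keyword and vector results can be illustrated with a simplified relative-score fusion. This is a sketch of the general technique, not Weaviate's actual implementation: the function names and document IDs are hypothetical, both score sets are assumed to be "higher is better", and `alpha` follows the convention that 1.0 means pure vector search and 0.0 means pure keyword search.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to the [0, 1] range so the two result sets are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def hybrid_fuse(bm25: dict[str, float], vector: dict[str, float], alpha: float = 0.5):
    """Combine normalized scores; documents missing from one set contribute 0 there."""
    bm25_n = min_max_normalize(bm25)
    vector_n = min_max_normalize(vector)
    fused = {
        doc: alpha * vector_n.get(doc, 0.0) + (1 - alpha) * bm25_n.get(doc, 0.0)
        for doc in set(bm25_n) | set(vector_n)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy scores, invented for illustration.
bm25_scores = {"a": 12.0, "b": 7.0, "c": 2.0}
vector_scores = {"b": 0.91, "c": 0.88, "d": 0.60}

print([doc for doc, _ in hybrid_fuse(bm25_scores, vector_scores, alpha=0.5)])
# ['b', 'a', 'c', 'd']
```

Document `b` ranks first because it scores well in both result sets, which is the behavior hybrid search is meant to reward.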
Performance benchmarks from 2025 show Weaviate achieving 10,000 to 15,000 queries per second with an optimized configuration, while Pinecone's Dedicated Read Nodes achieved 5,700 queries per second at a P99 latency of 60 ms on 1.4 billion vectors. These figures are not directly comparable, because the hardware, dataset sizes, and query patterns differ across benchmarks [50].
| Feature | Weaviate | Qdrant |
|---|---|---|
| Source model | Open-source (BSD-3-Clause) | Open-source (Apache 2.0) |
| Primary language | Go | Rust |
| Hybrid search | Native (BM25 + vector) | Native (sparse + dense) |
| GraphQL | Yes | No |
| Modular embedding/generation | Extensive module catalog | Smaller, focused integration list |
| Sharding model | One shard per tenant or hash-based | Hash-based |
| Filtering optimization | Inverted index plus dynamic filter strategy | Custom filterable HNSW |
Qdrant emphasizes raw performance, especially for payload-heavy filtered search, and is often praised for the small memory footprint of its Rust implementation. Weaviate emphasizes a richer feature surface (modules, agents, Weaviate Embeddings, generative search). Both are viable open-source options at production scale, and many teams choose between them based on whether their infrastructure engineers prefer Go or Rust [50][51].
Milvus, maintained by Zilliz under the Apache 2.0 license, is another open-source vector database focused on raw performance and a broad menu of index types (HNSW, IVF, DiskANN, GPU indexes). Compared with Milvus, Weaviate ships with a thinner index menu (HNSW and flat) but a richer module catalog and integrated generative search. Milvus tends to perform better on extremely large pure-vector workloads with custom index choices, while Weaviate tends to be easier to integrate with embedding and LLM providers out of the box [51].
Chroma is a developer-friendly embedding database aimed mostly at prototyping; it does not target the same scale or feature set as Weaviate. pgvector is a PostgreSQL extension that adds vector indexing to an existing relational database; it is a strong choice when an application already uses PostgreSQL and needs only modest vector-search functionality. Weaviate occupies a middle ground: more capable than Chroma, more specialized than pgvector, and open-source unlike Pinecone [51].
Weaviate publishes case studies on its website covering production deployments. Notable users include [33][37][52][53]:
| Company | Use case | Notes |
|---|---|---|
| Stack Overflow | Hybrid search for OverflowAI features on Stack Overflow's question and answer corpus | Required an open-source vector database that could run on existing Azure infrastructure |
| Instabase | Document intelligence for mortgage, insurance, and other regulated industries | Processes more than 500,000 documents per day across more than 50,000 tenants in a single Weaviate cluster, ingesting more than 450 distinct data types |
| Stack AI | Enterprise no-code AI platform with retrieval-augmented assistants | Highlighted in Weaviate's case study library |
| Morningstar | Financial research and analyst tooling | Cited as a customer in Weaviate's commercial materials |
| Cisco | Internal knowledge and search applications | Cited as a customer |
| Bunq | European mobile-first bank using Weaviate for AI-driven search | Cited as a customer |
| Red Hat | Internal search and document retrieval | Cited as a customer in joint AWS materials |
| NetApp | Data infrastructure for enterprise AI workloads | Cited as a customer in joint AWS materials |
| MetaBuddy | AI coaching platform | Reported a 60 percent reduction in trainer analysis time and a tripling of user engagement after migrating to Weaviate |
In April 2025, Weaviate was named an AWS Partner of the Year recipient in the EMEA region for its work in generative AI infrastructure on AWS Marketplace [54].
The Weaviate open-source project has accumulated over 16,000 GitHub stars and more than 5 million container downloads. The company reports a community of more than 50,000 AI builders and over 2,000 production users [4][5][16].
Weaviate provides official client libraries for Python, JavaScript/TypeScript, Go, Java, and C#/.NET. The Python client is the most popular, with active development tracked in its changelog [55].
| Client library | Package | Status |
|---|---|---|
| Python | weaviate-client (PyPI) | Most popular; supports sync and async operations on top of gRPC |
| JavaScript/TypeScript | weaviate-client (npm) | Used in Node.js and browser applications |
| Go | weaviate-go-client | Native Go client for Go applications |
| Java | io.weaviate:client (Maven) | Java client for JVM-based applications |
| C#/.NET | Weaviate.Client (NuGet) | Official client added in 2024 |
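As an illustration of the Python client, the sketch below runs a hybrid query against a local instance. It assumes version 4 of `weaviate-client`, a Weaviate server reachable on the default local ports, and an existing collection with a vectorizer configured. The helper name `hybrid_search` and any collection name passed to it are hypothetical.

```python
def hybrid_search(collection_name: str, query: str, alpha: float = 0.5, limit: int = 5):
    """Run a hybrid (keyword + vector) query and return (properties, score) pairs.

    Assumptions: weaviate-client v4 is installed, a local Weaviate instance is
    running, and the named collection already exists with a vectorizer configured.
    """
    # Imported lazily so the module can be loaded without the client installed.
    import weaviate
    from weaviate.classes.query import MetadataQuery

    client = weaviate.connect_to_local()
    try:
        collection = client.collections.get(collection_name)
        response = collection.query.hybrid(
            query=query,
            alpha=alpha,  # 1.0 = pure vector search, 0.0 = pure keyword search
            limit=limit,
            return_metadata=MetadataQuery(score=True),
        )
        return [(obj.properties, obj.metadata.score) for obj in response.objects]
    finally:
        # Always release the connection, even if the query fails.
        client.close()
```

A call such as `hybrid_search("Article", "vector database licensing")` would return up to five matching objects with their fused scores, provided a collection named `Article` exists on the instance.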
Integrations exist with LangChain, LlamaIndex, Semantic Kernel, Haystack, Dify, Embedchain, and other AI orchestration frameworks. These integrations allow developers to use Weaviate as the vector store component in larger machine learning pipelines.
The Weaviate community includes an active forum, a Slack workspace with thousands of members, and regular community calls. The company reports more than 200 messages per week across its open Slack workspace and Discourse forum. The Weaviate Hero program recognizes members who contribute through speaking, blogging, code, and community moderation [56][57].
The company also runs Weaviate Academy, a free educational platform with structured courses on using the database, from zero-to-MVP guides to advanced performance optimization tutorials, and the Weaviate World Tour series of in-person workshops, conferences, and hackathons across Europe and North America [29][57].
As of May 2026, Weaviate continues to develop as both an open-source project and a commercial cloud service. The company has focused on several areas: improving multi-tenancy efficiency with features like tenant offloading to S3, GPU-accelerated index building through the NVIDIA partnership, native multi-vector embeddings with MUVERA, more efficient quantization through RQ, and the rollout of agentic services such as the Query Agent, Transformation Agent, and Personalization Agent [22][24][32][43].
The vector database market has become increasingly competitive, with Weaviate, Qdrant, Milvus, Chroma, and Pinecone all vying for developer adoption, and incumbents such as Oracle, Microsoft Azure (with Azure AI Search), MongoDB, and PostgreSQL (via pgvector) adding vector capabilities to their existing databases. Weaviate differentiates itself through its combination of open-source availability, native hybrid search, automatic vectorization modules, and the GraphQL API. The open-core business model (free self-hosted, paid cloud) gives it flexibility that purely managed competitors like Pinecone cannot match, while the managed cloud option reduces friction for teams that do not want to operate their own infrastructure [50][51].
Weaviate's stated direction for 2026 emphasizes deeper agentic capabilities, including reasoning workflows, broader multimodal support, model evaluation tools, and shared memory across agents. CEO Bob van Luijt has described "agentic architectures" and "generative feedback loops" as the next inflection point in data management, comparable in his view to the arrival of the public cloud or the modern web [58][59].
Weaviate's focus on developer experience, with features like built-in generative search, automatic vectorization, integrated agents, and BYOC deployments, reflects a broader trend in the vector database space toward reducing the amount of glue code required to build AI applications. Rather than forcing developers to stitch together separate embedding services, vector stores, and LLM orchestration layers, Weaviate aims to handle much of this within the database itself.