Haystack is an open-source AI orchestration framework developed by deepset, a Berlin-based company, for building production-ready natural language processing (NLP), retrieval-augmented generation (RAG), and AI applications in Python. Licensed under Apache 2.0, Haystack lets developers compose modular pipelines from interchangeable components such as retrievers, generators, embedders, and document stores, connecting to model providers like OpenAI, Anthropic, Cohere, Hugging Face, and local inference engines. The framework has accumulated over 24,000 stars on GitHub and powers applications at organizations including Airbus, Siemens, and The Economist.
Milos Rusic, Malte Pietsch, and Timo Möller co-founded deepset in Berlin, Germany in June 2018. Rusic and Pietsch had met while studying at the Technical University of Munich, and Pietsch and Möller both came from Plista, an adtech startup where they worked on AI-powered ad creation. The three founders were inspired by the Transformer architecture that Google had introduced in 2017, and they bootstrapped the company by training custom NLP models for enterprise clients, tailoring BERT language models to domain-specific tasks.
In July 2019, deepset released FARM (Fast & easy transfer learning for NLP), an open-source library that simplified transfer learning with Transformer models. FARM provided tools for domain adaptation and fine-tuning with features like gradient accumulation, cross-validation, and mixed-precision training. Four months later, in November 2019, deepset released the first version of Haystack, designed as a higher-level framework that combined document retrieval with question answering capabilities. FARM's core modeling features were eventually integrated into Haystack, and FARM's standalone development was discontinued in November 2021.
deepset has raised approximately $45.6 million across three funding rounds:
| Round | Date | Amount | Lead Investor | Other Investors |
|---|---|---|---|---|
| Pre-Seed | March 2021 | $1.6 million | System.One | Lunar Ventures |
| Series A | April 2022 | $14 million | GV (Google Ventures) | System.One, Lunar Ventures, Harpoon Ventures |
| Series B | August 2023 | $30 million | Balderton Capital | GV, System.One, Lunar Ventures, Harpoon Ventures |
The Series B round was announced on August 9, 2023, and came amid growing enterprise demand for large language model (LLM) tooling.
The original Haystack architecture (versions 1.x) was built around a pipeline-based NLP system where data flowed through a sequence of nodes. Each node performed a specific task: preprocessing documents, retrieving relevant passages, reading text to extract answers, or summarizing content. Pipelines in version 1.x were implemented as directed acyclic graphs (DAGs), meaning data moved in one direction from start to finish without looping back.
The three main building blocks in Haystack 1.x were:
Document Stores stored documents and their vector representations. Supported backends included Elasticsearch, OpenSearch, FAISS, and SQL databases. Document Stores could be used as the last node in an indexing pipeline or passed directly to a Retriever.
Retrievers filtered large document collections down to a manageable set of candidates relevant to a given query. Different Retriever implementations handled sparse retrieval (BM25), dense retrieval (using embeddings), and hybrid approaches.
Readers performed extractive question answering by scanning the documents returned by a Retriever and identifying the exact span of text that answered the question. Readers typically used fine-tuned Transformer models.
This Retriever-Reader pattern became Haystack's signature approach to question answering and remained the dominant paradigm throughout the 1.x release line. Versions 1.x were distributed through the farm-haystack Python package. Haystack 1.x reached end-of-life on March 11, 2025, and no longer receives updates.
As the AI field shifted from extractive QA toward generative approaches powered by LLMs, the Haystack team concluded that incremental changes to the 1.x architecture would not be enough. The original node-based system made assumptions about data flow that did not accommodate new patterns like prompt engineering, multi-step generation, or agentic loops. deepset announced the Haystack 2.0 redesign in mid-2023, released a beta in December 2023, and shipped the stable 2.0.0 release on March 11, 2024, distributed through the new haystack-ai package on PyPI.
Haystack 2.0 is built on two foundational abstractions: the Component protocol and the Pipeline object.
The Component protocol defines a standard API that any Python class must follow to participate in a pipeline. Every component implements a run() method that accepts typed inputs and returns typed outputs. Components explicitly declare the names and types of their inputs and outputs, which allows the pipeline to validate connections before execution and generate clear error messages when something is misconfigured.
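The contract described above can be illustrated with a stdlib-only sketch. The class name, the `output_types` attribute, and `validate_connection` are all made up for illustration; Haystack's real protocol uses the `@component` decorator, but the idea is the same: declared input/output types let a pipeline reject bad connections before anything runs.

```python
# Stdlib-only sketch of the idea behind Haystack's Component protocol:
# a component declares typed inputs and outputs so a pipeline can validate
# connections before execution. Names here are illustrative, not the real API.
from typing import get_type_hints

class Upcaser:
    # Declared output names and types, analogous to declared output types
    # on a real Haystack component.
    output_types = {"text": str}

    def run(self, text: str) -> dict:
        return {"text": text.upper()}

def validate_connection(sender, out_name, receiver, in_name):
    """Check the sender's declared output type against the receiver's input annotation."""
    out_type = sender.output_types[out_name]
    in_type = get_type_hints(receiver.run)[in_name]
    if out_type is not in_type:
        raise TypeError(f"{out_name}: {out_type} does not match {in_name}: {in_type}")

a, b = Upcaser(), Upcaser()
validate_connection(a, "text", b, "text")   # passes: str -> str
print(b.run(**a.run(text="haystack")))      # {'text': 'HAYSTACK'}
```

A mismatched connection (say, a component expecting `list` receiving `str`) would raise before execution, which is what produces the clear error messages described above.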
The Pipeline object is a directed multigraph that connects components together and handles execution. Unlike the DAG-only approach of 1.x, Haystack 2.0 pipelines support cycles. Combined with router components and conditional logic, this means pipelines can branch, merge, and loop, enabling patterns like iterative refinement, self-correction, and autonomous agent behavior. The Pipeline also handles serialization (to and from YAML), connection validation, and runtime introspection.
The 2.0 redesign was guided by four principles.
Haystack 2.0 organizes its components into several categories.
A DocumentStore provides persistent storage for Document objects along with their metadata and vector representations. All DocumentStore implementations must expose four methods: count_documents(), filter_documents(), write_documents(), and delete_documents(). Haystack ships with an InMemoryDocumentStore for prototyping and testing. Production deployments typically use one of the many third-party DocumentStore integrations:
| DocumentStore | Type | Maintained By |
|---|---|---|
| Elasticsearch | Search engine | deepset |
| OpenSearch | Search engine | deepset |
| Pinecone | Vector database | deepset |
| Weaviate | Vector database | deepset |
| Qdrant | Vector database | Qdrant |
| Chroma | Vector database | deepset |
| MongoDB Atlas | Document database | MongoDB |
| pgvector | PostgreSQL extension | deepset |
| AstraDB | Cloud database | DataStax |
| Neo4j | Graph database | Neo4j |
| Marqo | Tensor search | Community |
| Milvus | Vector database | Community |
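The four-method DocumentStore contract can be sketched with a toy in-memory store. This is a stand-in for illustration only, not Haystack's `InMemoryDocumentStore`; the `Document` shape and the exact-match filter logic are simplified assumptions.

```python
# Minimal in-memory sketch of the four-method DocumentStore contract:
# count_documents, filter_documents, write_documents, delete_documents.
from dataclasses import dataclass, field

@dataclass
class Document:
    id: str
    content: str
    meta: dict = field(default_factory=dict)

class ToyDocumentStore:
    def __init__(self):
        self._docs = {}

    def count_documents(self) -> int:
        return len(self._docs)

    def write_documents(self, documents) -> int:
        for doc in documents:
            self._docs[doc.id] = doc
        return len(documents)

    def filter_documents(self, filters=None):
        docs = list(self._docs.values())
        if not filters:
            return docs
        # Toy filter: exact match on every metadata key.
        return [d for d in docs if all(d.meta.get(k) == v for k, v in filters.items())]

    def delete_documents(self, document_ids) -> None:
        for doc_id in document_ids:
            self._docs.pop(doc_id, None)

store = ToyDocumentStore()
store.write_documents([Document("1", "RAG intro", {"lang": "en"}),
                       Document("2", "Einführung", {"lang": "de"})])
print(store.count_documents())                                  # 2
print([d.id for d in store.filter_documents({"lang": "en"})])   # ['1']
```

Because every backend exposes this same surface, the components downstream of a store never need to know which database is behind it.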
Retrievers query a DocumentStore and return the most relevant documents for a given input. In Haystack 2.0, each Retriever is specialized for its corresponding DocumentStore rather than implementing a generic interface. This means an ElasticsearchBM25Retriever handles all the specifics of querying Elasticsearch, while a QdrantEmbeddingRetriever handles Qdrant's particular API. This design gives each Retriever full access to the advanced features of its backend without being constrained by a lowest-common-denominator interface.
Haystack 2.0 also introduced multi-query retrieval components: MultiQueryTextRetriever runs multiple text queries in parallel, and MultiQueryEmbeddingRetriever does the same with embedding-based search. These components pair with a QueryExpander that generates semantically similar variations of the original query, improving recall for short or ambiguous inputs.
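The multi-query pattern can be sketched in a few lines. Scoring here is plain term overlap rather than real BM25, and both function names are illustrative, not Haystack components; the point is only how running several query variations and merging their results improves recall.

```python
# Toy keyword retriever plus a multi-query union, sketching the pattern
# behind QueryExpander + MultiQueryTextRetriever.
def retrieve(docs, query, top_k=3):
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.lower().split())), d) for d in docs]
    return [d for s, d in sorted(scored, key=lambda x: -x[0]) if s > 0][:top_k]

def multi_query_retrieve(docs, queries, top_k=3):
    # Union the per-query results, preserving first-seen order.
    seen, merged = set(), []
    for q in queries:
        for d in retrieve(docs, q, top_k):
            if d not in seen:
                seen.add(d)
                merged.append(d)
    return merged

DOCS = ["haystack builds rag pipelines",
        "pipelines connect components",
        "retrievers query document stores"]
# Pretend a query expander turned one user question into two variations:
results = multi_query_retrieve(DOCS, ["rag pipelines", "query document stores"])
print(results)
```

The second query variation surfaces a document the first one misses entirely, which is exactly the recall gain the expansion step is after.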
Generators produce text given a prompt. They are the primary interface to LLMs within Haystack. The framework distinguishes between two types:
- Generators (e.g., OpenAIGenerator, HuggingFaceLocalGenerator) accept a plain text prompt and return generated text.
- Chat Generators (e.g., OpenAIChatGenerator, AnthropicChatGenerator) work with a list of chat messages and support features like tool calling and system prompts.

Generators exist for every major model provider, and switching from one to another requires changing only the Generator component in the pipeline while leaving the rest intact.
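Why swapping is cheap can be shown with two toy "providers" that share one output shape. These classes are illustrative stand-ins, not real model clients, and the `{"replies": [...]}` dictionary here only mirrors the spirit of Haystack's Generator output rather than reproducing it exactly.

```python
# Sketch of Generator interchangeability: both toy providers expose the same
# run(prompt) -> {"replies": [...]} shape, so the surrounding pipeline code
# is unchanged when one replaces the other.
class ToyEchoGenerator:
    def run(self, prompt: str) -> dict:
        return {"replies": [f"echo: {prompt}"]}

class ToyShoutGenerator:
    def run(self, prompt: str) -> dict:
        return {"replies": [prompt.upper()]}

def answer(generator, prompt):
    # Pipeline code depends only on the shared output shape, not the provider.
    return generator.run(prompt)["replies"][0]

print(answer(ToyEchoGenerator(), "hello"))   # echo: hello
print(answer(ToyShoutGenerator(), "hello"))  # HELLO
```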
Embedders encode data (text, images, or other modalities) into vector representations. Haystack 2.0 separated embedding into its own component category, decoupled from Retrievers. There are two main types:
- Document Embedders (e.g., SentenceTransformersDocumentEmbedder) compute embeddings for Document objects during indexing.
- Text Embedders (e.g., OpenAITextEmbedder) compute embeddings for query strings at query time.

This separation means the same Retriever can be used with different embedding models simply by swapping the Embedder component.
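The split, and why both halves must share one model, can be sketched with a toy bag-of-words "embedding". Everything here is an illustrative stand-in: the shared vocabulary plays the role of the shared embedding model, and the class names are made up.

```python
# Toy illustration of the Document Embedder / Text Embedder split: both
# must use the same embedding function so query and document vectors
# live in the same space.
import math

VOCAB = ["bark", "cats", "dogs", "purr"]   # shared "model" vocabulary

def embed(text):
    # Shared by both embedder types, just as both must share one model.
    tokens = text.lower().split()
    vec = [float(tokens.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class ToyDocumentEmbedder:
    def run(self, documents):              # indexing time: embed documents
        return {"documents": [(d, embed(d)) for d in documents]}

class ToyTextEmbedder:
    def run(self, text):                   # query time: embed the query string
        return {"embedding": embed(text)}

indexed = ToyDocumentEmbedder().run(["cats purr", "dogs bark"])["documents"]
query_vec = ToyTextEmbedder().run("purr cats")["embedding"]
best = max(indexed, key=lambda pair: sum(a * b for a, b in zip(pair[1], query_vec)))
print(best[0])  # cats purr
```

Swapping the embedding model means replacing `embed` on both sides at once; the retrieval logic itself never changes.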
Converters transform files of various formats into Haystack Document objects. Built-in converters handle plain text, HTML, PDF, DOCX, CSV, and other common formats. For example, HTMLToDocument parses HTML files, PyPDFToDocument extracts text from PDFs, and TextFileToDocument reads plain text files. These converters are typically the first step in an indexing pipeline.
Routers direct data flow within a pipeline based on conditions. The ConditionalRouter evaluates user-defined conditions and sends data to different branches accordingly. The LLMMessagesRouter uses pattern matching on LLM output to route messages. Routers enable conditional branching and, when combined with cycles in the pipeline graph, support iterative agent-like behavior.
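A router feeding a cycle is easiest to see in miniature. The functions below are illustrative stand-ins, not Haystack components: a scripted "generator" keeps refining its output, and a condition routes the result either back into the loop or out of it.

```python
# Toy conditional-routing loop: route "retry" output back to the generator
# until the condition is met or an iteration cap is reached, mirroring the
# ConditionalRouter-plus-cycle pattern described above.
def toy_generate(draft):
    return draft + "!"           # stand-in for an LLM refinement step

def route(text, min_len):
    return "done" if len(text) >= min_len else "retry"

def refinement_loop(seed, min_len=8, max_iters=10):
    text = seed
    for _ in range(max_iters):   # cycles are bounded in practice
        text = toy_generate(text)
        if route(text, min_len) == "done":
            break
    return text

print(refinement_loop("hi"))     # hi!!!!!!
```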
The PromptBuilder component uses the Jinja2 template engine to construct prompts dynamically. Given a template with variables, it fills in values at runtime and produces the final text sent to a Generator. Templates can contain conditional logic, loops, and filters, giving developers fine-grained control over prompt construction without writing custom code.
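The template-as-data idea can be sketched without Jinja2. Haystack's PromptBuilder uses Jinja2 templates; the stand-in below uses `str.format` and a hand-rolled loop to show the same flow, with the template text and variable names invented for the example.

```python
# Sketch of template-driven prompt construction: the template is data,
# and variables are filled in at runtime before the text reaches a Generator.
TEMPLATE = ("Answer the question using only the context below.\n"
            "Context:\n{context}\n"
            "Question: {question}\n"
            "Answer:")

def build_prompt(documents, question):
    # The loop a Jinja2 template would express as {% for doc in documents %}.
    context = "\n".join(f"- {doc}" for doc in documents)
    return TEMPLATE.format(context=context, question=question)

prompt = build_prompt(["Haystack is an orchestration framework.",
                       "Pipelines are graphs of components."],
                      "What is Haystack?")
print(prompt)
```

Real Jinja2 templates add conditionals and filters on top of this, but the runtime flow, template in, filled prompt out, is the same.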
The most common use case for Haystack is building RAG systems, which combine information retrieval with text generation to answer questions grounded in specific documents.
An indexing pipeline prepares documents for retrieval. A typical setup includes a Converter that turns source files into Document objects, a DocumentSplitter that chunks them, a Document Embedder that computes vector representations, and a DocumentWriter that stores the results in a DocumentStore.
A query pipeline answers user questions by retrieving relevant documents and generating a response. A basic RAG query pipeline connects three components: a Retriever that fetches relevant documents, a PromptBuilder that inserts them into a prompt template, and a Generator that produces the final answer.
More advanced configurations add a Ranker between the Retriever and PromptBuilder to re-score and filter retrieved documents, or include a DocumentJoiner to merge results from multiple Retrievers (for example, combining BM25 keyword search with semantic embedding search in a hybrid retrieval setup).
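The retrieve-then-build-then-generate flow can be traced end to end with stdlib stand-ins. Every function below is a toy substitute for the corresponding component (term-overlap scoring instead of a real retriever, a scripted function instead of an LLM), so only the wiring is meant to be taken literally.

```python
# End-to-end toy RAG query flow: retrieve -> build prompt -> generate,
# wired like the three-component pipeline described above.
DOCS = ["Haystack is maintained by deepset.",
        "Pipelines connect components into graphs.",
        "BM25 is a sparse retrieval method."]

def retrieve(query, top_k=2):
    terms = set(query.lower().split())
    return sorted(DOCS, key=lambda d: -len(terms & set(d.lower().strip(".").split())))[:top_k]

def build_prompt(documents, question):
    return "Context:\n" + "\n".join(documents) + f"\nQuestion: {question}"

def toy_generator(prompt):
    # Stand-in for an LLM call: just return the first context line.
    return {"replies": [prompt.splitlines()[1]]}

question = "Who maintains Haystack"   # kept punctuation-free for the toy tokenizer
rag_answer = toy_generator(build_prompt(retrieve(question), question))["replies"][0]
print(rag_answer)  # Haystack is maintained by deepset.
```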
Haystack added first-class support for AI agents through the Agent component. The Agent implements a loop-based system that uses a ChatGenerator and a set of tools to solve complex queries iteratively. At each step, the Agent analyzes the current state, decides whether to call a tool, processes the tool's output, and determines whether to continue or return a final answer.
Tools can be created in three ways:
- The Tool class, which wraps a callable with a name and description.
- The ComponentTool class, which wraps any Haystack component as a tool.
- The @tool decorator, which converts a plain Python function into a tool using its function name and docstring.

The ToolInvoker component parses tool calls from the LLM's output and executes them with the correct arguments. A Toolset groups multiple tools together for the Agent to use.
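The decorator-and-invoker flow can be sketched end to end. Everything here is an illustrative stand-in: the registry, the decorator, and a scripted function in place of a real ChatGenerator that first emits a tool call and then a final answer.

```python
# Toy agent loop with a decorator-registered tool: the "LLM" is scripted,
# and the tool-call parsing/execution step plays the role of a tool invoker.
TOOLS = {}

def tool(fn):
    # Register under the function's own name; description from its docstring.
    TOOLS[fn.__name__] = {"fn": fn, "description": fn.__doc__}
    return fn

@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

def scripted_llm(history):
    # Stand-in for a chat model: call the tool once, then answer.
    if not any(m["role"] == "tool" for m in history):
        return {"tool_call": {"name": "add", "args": {"a": 2, "b": 3}}}
    return {"final": f"The sum is {history[-1]['content']}"}

def agent_loop(question, max_steps=5):
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = scripted_llm(history)
        if "final" in reply:
            return reply["final"]
        call = reply["tool_call"]                           # invoker's job:
        result = TOOLS[call["name"]]["fn"](**call["args"])  # parse + execute
        history.append({"role": "tool", "content": result})
    return "max steps reached"

print(agent_loop("What is 2 + 3?"))  # The sum is 5
```

The cap on `max_steps` matters: an agent loop is a cycle in the pipeline graph, and bounding it is what keeps a confused model from looping forever.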
Developers who need more control can build custom agent pipelines by connecting a ChatGenerator, ConditionalRouter, and ToolInvoker manually, which allows for arbitrary routing logic and multi-agent coordination.
Haystack's integration ecosystem includes over 100 packages connecting the framework to external services and tools. deepset maintains many of these integrations directly, while others come from partner companies and community contributors.
Haystack provides Generator and Embedder components for all major model providers:
| Provider | Generator Component | ChatGenerator Component |
|---|---|---|
| OpenAI | OpenAIGenerator | OpenAIChatGenerator |
| Anthropic | AnthropicGenerator | AnthropicChatGenerator |
| Google AI | GoogleAIGeminiGenerator | GoogleAIGeminiChatGenerator |
| Cohere | CohereGenerator | CohereChatGenerator |
| Mistral AI | MistralGenerator | MistralChatGenerator |
| Hugging Face | HuggingFaceLocalGenerator | HuggingFaceLocalChatGenerator |
| Ollama | OllamaGenerator | OllamaChatGenerator |
| Azure OpenAI | AzureOpenAIGenerator | AzureOpenAIChatGenerator |
| Amazon Bedrock | AmazonBedrockGenerator | AmazonBedrockChatGenerator |
Local model support through Hugging Face Transformers and Ollama means pipelines can run entirely on-premises without sending data to external APIs.
Beyond model providers, the integration ecosystem covers document stores, file converters and data ingestion tools, evaluation frameworks, and observability and tracing platforms.
Haystack pipelines can be serialized to YAML using the dumps() and dump() methods, and deserialized back to Python with loads() and load(). Serialization delegates to each component individually, so custom components that follow the protocol are automatically serializable. Secrets (such as API keys) are handled separately to avoid storing sensitive values in plain text. YAML-based pipeline definitions make it possible to version-control pipeline configurations, run experiments by tweaking parameters in a config file, and reproduce results without modifying source code.
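The per-component round-trip can be sketched with a toy class. Haystack serializes to YAML; `json` is used below only to keep the sketch stdlib-only, and the `to_dict`/`from_dict` shape is a simplified illustration. The key point mirrored here is that secret fields are deliberately left out of the serialized form.

```python
# Sketch of per-component serialization round-tripping, with the secret
# (api_key) excluded from the serialized config.
import json

class ToyComponent:
    def __init__(self, model, api_key=None):
        self.model = model
        self.api_key = api_key          # secret: never serialized

    def to_dict(self):
        return {"type": "ToyComponent", "init_parameters": {"model": self.model}}

    @classmethod
    def from_dict(cls, data):
        return cls(**data["init_parameters"])

original = ToyComponent(model="toy-v1", api_key="sk-secret")
config = json.dumps(original.to_dict())          # version-controllable text
restored = ToyComponent.from_dict(json.loads(config))
print(restored.model, restored.api_key)          # toy-v1 None
```

Because each component owns its own round-trip, a pipeline's serializer only has to stitch the per-component dictionaries together, which is why custom components that follow the protocol serialize for free.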
Hayhooks is a companion project that turns any Haystack pipeline into a REST API with a single command. It auto-generates OpenAPI (Swagger) documentation, supports streaming responses, and can produce OpenAI-compatible chat endpoints for integration with tools like Open WebUI. Hayhooks runs as a FastAPI application and can be containerized with Docker and deployed on Kubernetes for production workloads. Developers can add authentication, custom logging, and additional endpoints as needed.
Haystack supports structured logging and OpenTelemetry tracing out of the box. Integrations with Datadog, Langfuse, and other monitoring platforms allow teams to track pipeline performance, token usage, and error rates in production.
In April 2022, deepset launched deepset Cloud, its commercial managed platform. In 2025, the product was rebranded as the Haystack Enterprise Platform (also referred to as the deepset AI Platform). The platform provides a managed environment for building, deploying, and scaling Haystack-based applications without managing infrastructure directly.
The platform supports multiple deployment models: fully managed SaaS, Virtual Private Cloud (VPC), on-premises, and air-gapped environments. It includes pipeline templates, visual pipeline editing, built-in evaluation tools, and access to deepset's engineering team for support.
Pricing is based on platform licensing, agent or application runtime, and optional expert services. A free Studio tier is available for prototyping. Enterprise pricing is custom and available on request. The platform is also listed on the AWS Marketplace.
Haystack competes primarily with LangChain and LlamaIndex in the LLM application framework space. Each framework takes a different approach to the same general problem.
| Feature | Haystack | LangChain | LlamaIndex |
|---|---|---|---|
| Developer | deepset (Berlin) | LangChain, Inc. | LlamaIndex, Inc. |
| First release | November 2019 | October 2022 | November 2022 |
| License | Apache 2.0 | MIT | MIT |
| Primary language | Python | Python, JavaScript/TypeScript | Python, TypeScript |
| GitHub stars (approx.) | ~24,500 | ~70,000 | ~42,000 |
| Architecture | Directed multigraph pipelines with cycles | Chain/agent abstractions; LangGraph for graph-based workflows | Index-centric with query engines |
| Primary strength | Production-grade search and RAG | Rapid prototyping, large integration ecosystem | Complex data ingestion and indexing |
| Agent support | Built-in Agent component with tool calling | LangGraph for stateful agent workflows | Agent framework with tool integration |
| Pipeline serialization | YAML | JSON (LangGraph) | JSON |
| Commercial offering | Haystack Enterprise Platform (SaaS/on-prem) | LangSmith (observability), LangGraph Platform | LlamaCloud |
| Typical overhead per query | ~5.9 ms | ~10 ms | ~6 ms |
Haystack's main differentiator is its explicit, graph-based pipeline architecture and its focus on production deployments with built-in serialization, observability, and enterprise support. LangChain offers the largest ecosystem of integrations and third-party tools, making it popular for prototyping. LlamaIndex excels at connecting LLMs to diverse data sources through its extensive set of data connectors (over 150).
Several large organizations use Haystack in production, including Airbus, Siemens, and The Economist.
deepset reports that enterprise customers using the platform have achieved up to 5x ROI and 40% efficiency gains in document processing workflows.
Haystack is developed in the open on GitHub under the Apache 2.0 license. The project accepts contributions ranging from bug fixes and documentation improvements to new components and integrations. deepset maintains contributor guidelines and a public roadmap.
Community channels include a Discord server, a discussion forum on GitHub, and a newsletter. deepset also publishes tutorials (many as runnable Colab notebooks), a cookbook of ready-to-use recipes, and extensive documentation covering both getting-started guides and advanced topics.
The framework requires Python 3.10 or later (following the end-of-life of Python 3.9 in October 2025).
Haystack has maintained a rapid release cadence, with new minor versions shipping roughly every two to three weeks.
| Version | Date | Notable additions |
|---|---|---|
| 2.18.0 | September 2025 | Agent breakpoints and stepwise debugging |
| 2.19.0 | October 2025 | QueryExpander, MultiQueryTextRetriever, MultiQueryEmbeddingRetriever |
| 2.20.0 | November 2025 | Sparse embedding support via SentenceTransformersSparseTextEmbedder |
| 2.21.0 | December 2025 | Resume Agent from AgentSnapshot; new breakpoint controls |
| 2.22.0 | January 2026 | Performance and stability improvements |
| 2.23.0 | January 2026 | Additional pipeline validation features |
| 2.24.0 | February 2026 | Extended agent configuration options |
| 2.25.0 | February 2026 | SearchableToolset to reduce context usage; Jinja2 templates in Agents |
| 2.26.0 | March 2026 | LLMRanker for high-quality context; Jinja2 system prompts in Agents |
Key themes across recent releases include stronger agent capabilities (tool management, breakpoints, state snapshots), improved retrieval quality (multi-query and sparse embedding support), and more flexible prompt engineering within agent workflows.