LlamaIndex is an open-source data framework designed for building large language model (LLM) applications, with a particular focus on retrieval-augmented generation (RAG). Originally released as GPT Index in November 2022 by Jerry Liu, the framework provides the infrastructure for connecting LLMs with custom data sources, enabling developers to build production-ready AI applications that can query private documents, databases, and APIs. LlamaIndex has become one of the most widely adopted frameworks in the LLM application ecosystem, with over 47,000 GitHub stars and a growing commercial offering through LlamaCloud [1].
LlamaIndex began as a side project in November 2022, when Jerry Liu started experimenting with ways to help large language models reason over proprietary data outside their training sets. Liu, a Princeton graduate (B.S.E. in Computer Science, summa cum laude, 2017) and former research scientist at Uber, had been trying to build a sales bot but ran into difficulty feeding proprietary data to GPT-3. That frustration led him to create a simple tool called GPT Index, which he released with his first commit and tweet on November 8, 2022 [2].
What started as a basic tree index for organizing information quickly attracted developer attention as the broader AI community grappled with the challenge of grounding LLM outputs in real-world, private data. Early feature releases in December 2022 included support for indexing embeddings with vector stores, and initial data loaders for Notion, Slack, and Google Drive. By January 2023, GPT Index had hit the GitHub trending page for the first time [3].
After two months of working on the project, Liu teamed up with Simon Suo, his former colleague at Uber's Advanced Technologies Group (ATG), to build the product and community. Suo, who studied at the University of Toronto and had spent five years at Uber ATG and Waabi working on automated driving research under Raquel Urtasun, became co-founder and CTO [4].
The project was renamed from GPT Index to LlamaIndex in early 2023 to reflect its broader applicability beyond OpenAI models. The name references the llama, a nod to the growing ecosystem of open-source LLMs (including Meta's LLaMA family), while "Index" emphasizes the framework's core strength in data indexing and retrieval.
Jerry Liu is the co-founder and CEO of LlamaIndex. Before founding the company, Liu built a career spanning machine learning research and engineering across several notable organizations:
| Period | Role | Organization |
|---|---|---|
| 2014 | Software Engineering Intern | Apple |
| 2015 | Software Engineering Intern | Quora |
| 2016 | Software Engineering Intern | Two Sigma |
| 2017-2018 | Machine Learning Engineer | Quora |
| 2018-2019 | AI Resident | Uber |
| 2019-2021 | Research Scientist | Uber |
| 2021-2023 | ML Engineering Manager | Robust Intelligence |
| 2023-present | Co-Founder and CEO | LlamaIndex |
At Princeton, Liu earned the Shapiro Prize for Academic Excellence (2014) and graduated with a 3.97 GPA. His research experience at Uber focused on machine learning for autonomous driving, while at Robust Intelligence he led the ML monitoring team, gaining experience in building reliable AI systems. This combination of research depth and engineering leadership shaped the design philosophy behind LlamaIndex [5].
Simon Suo is the co-founder and CTO of LlamaIndex. Suo attended the University of Toronto and worked at Uber ATG as a Research Scientist (2018-2021) before joining Waabi as a Senior Research Scientist (2021-2023), where he contributed to automated driving research. His background in large-scale machine learning systems and autonomous driving infrastructure informs LlamaIndex's technical architecture [4].
LlamaIndex provides a layered architecture that handles the full pipeline from raw data to LLM-powered responses. The framework is organized around several core abstractions that work together to enable sophisticated data-augmented AI applications.
A Document is the foundational data unit in LlamaIndex. It serves as a generic container around any data source, whether that is a PDF file, an API response, a database row, or a web page. Documents can be constructed manually by the developer or created automatically through data loaders. Each Document stores the raw text content along with two important attributes [6]: a metadata dictionary of annotations (such as file name, category, or creation date), and relationships linking it to other Documents and Nodes.
A Node represents a "chunk" of a source Document. Nodes are first-class citizens in LlamaIndex and form the basic unit that gets indexed and retrieved. When data is loaded into the framework, it is typically broken down into Node objects through a process called parsing or chunking. Each Node encapsulates a chunk of the source text, its own metadata, and relationship links to its source Document and to neighboring Nodes [6].
By default, every Node derived from a Document inherits the same metadata from that Document. Developers can customize the chunking strategy through NodeParser classes, choosing between fixed-size chunking, sentence-based splitting, semantic chunking, or custom approaches.
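As a sketch of the chunking step, the toy splitter below breaks a Document's text into overlapping chunks, each inheriting the Document's metadata. This is plain Python, not the LlamaIndex NodeParser API; the Node class, chunk sizes, and file name are illustrative stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Illustrative stand-in for a LlamaIndex Node: text plus inherited metadata.
    text: str
    metadata: dict = field(default_factory=dict)

def chunk_document(text: str, metadata: dict,
                   chunk_size: int = 40, overlap: int = 10) -> list[Node]:
    """Fixed-size chunking with overlap; each Node copies the Document metadata."""
    nodes, start = [], 0
    while start < len(text):
        nodes.append(Node(text=text[start:start + chunk_size],
                          metadata=dict(metadata)))
        start += chunk_size - overlap
    return nodes

doc_meta = {"source": "report.pdf"}  # hypothetical source file
nodes = chunk_document(
    "LlamaIndex organizes data into Nodes before indexing and retrieval.",
    doc_meta,
)
```

Swapping `chunk_document` for a sentence-based or semantic splitter changes how boundaries are chosen, but the Node-with-inherited-metadata output shape stays the same.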
Once data is ingested and parsed into Nodes, LlamaIndex organizes them into index structures optimized for different retrieval patterns. The framework offers several indexing strategies:
| Index Type | Description | Best For |
|---|---|---|
| Vector Store Index | Generates semantic embeddings for each Node and stores them for similarity search | Semantic similarity search |
| Summary Index | Creates summaries of document chunks | Broad question answering |
| Tree Index | Organizes data in a hierarchical tree structure | Multi-level summarization |
| Keyword Table Index | Extracts keywords and maps them to Nodes | Exact keyword matching |
| Knowledge Graph Index | Builds a knowledge graph from extracted entities and relationships | Relationship-based queries |
The Vector Store Index is the most commonly used, as it powers the standard RAG workflow. It splits documents into Nodes, generates embeddings using a configurable embedding model, and stores them in a vector database for fast similarity search. LlamaIndex integrates with dozens of vector store backends including Pinecone, Weaviate, Chroma, Qdrant, Milvus, and FAISS.
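The mechanics can be illustrated with a stdlib-only stand-in: the toy index below swaps the learned embedding model for bag-of-words counts but keeps the same store-then-rank-by-cosine shape. All class and variable names here are hypothetical, not LlamaIndex APIs.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words counts. A real Vector Store Index calls
    # a learned embedding model (e.g. an OpenAI or local encoder).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorIndex:
    def __init__(self):
        self.entries = []  # (vector, node_text) pairs

    def insert(self, text: str) -> None:
        self.entries.append((embed(text), text))

    def query(self, question: str, top_k: int = 1) -> list[str]:
        qv = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(qv, e[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]

index = ToyVectorIndex()
index.insert("llamas are domesticated camelids from South America")
index.insert("vector databases store embeddings for similarity search")
best = index.query("where do llamas come from")
```

A production backend (Pinecone, Qdrant, etc.) replaces the linear scan with approximate nearest-neighbor search, but the insert/query contract is the same.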
Retrievers are responsible for fetching the most relevant Nodes from an index given a user query. LlamaIndex provides several retriever types to support different search strategies [7]:
| Retriever Type | Description |
|---|---|
| Vector Index Retriever | Finds Nodes by semantic similarity, ranking embeddings by cosine similarity |
| Keyword Retriever | Matches Nodes based on keyword frequency (BM25) |
| Hybrid Retriever | Combines vector and keyword retrieval with configurable alpha weighting |
| Router Retriever | Uses an LLM to decide which sub-retriever to route the query to |
| Auto-Merging Retriever | Merges small child Nodes into larger parent Nodes when enough children are retrieved |
| Recursive Retriever | Traverses node relationships recursively to gather broader context |
Hybrid retrieval, which blends semantic vector search with keyword-based BM25 search, is particularly effective in production settings. LlamaIndex exposes an alpha parameter that controls the balance between the two methods, allowing developers to tune retrieval behavior for their specific domain.
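The alpha blend can be sketched in a few lines. This stand-in (hypothetical function and scores, not the LlamaIndex implementation) min-max normalizes each method's scores before weighting, since raw vector similarities and BM25 scores live on different scales.

```python
def hybrid_scores(vector_scores: dict, keyword_scores: dict,
                  alpha: float = 0.5) -> dict:
    """Blend two score sets: alpha weights the vector side, (1 - alpha) the keyword side."""
    def normalize(scores: dict) -> dict:
        # Min-max normalize so the two methods are comparable.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    v, k = normalize(vector_scores), normalize(keyword_scores)
    docs = set(v) | set(k)
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}

# Invented scores: cosine similarities vs. raw BM25 scores.
vec = {"doc_a": 0.91, "doc_b": 0.40, "doc_c": 0.35}
bm25 = {"doc_b": 12.0, "doc_c": 3.0}
blended = hybrid_scores(vec, bm25, alpha=0.5)
```

With alpha=0.5 the keyword-strong doc_b wins; pushing alpha toward 1.0 hands the ranking back to pure vector similarity, which is exactly the tuning knob described above.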
Query engines combine a Retriever and a Response Synthesizer into an end-to-end pipeline. When a user submits a question, the query engine retrieves relevant Nodes from the index and uses an LLM to synthesize a coherent answer. LlamaIndex provides several query engine types [8]:
| Query Engine Type | Description |
|---|---|
| Simple Query Engine | Retrieves top-k relevant Nodes and synthesizes a response in a single pass |
| Sub-Question Query Engine | Breaks complex questions into sub-questions, retrieves answers for each, and combines them |
| Router Query Engine | Routes queries to different indices or retrieval strategies based on query content |
| SQL Query Engine | Translates natural language questions into SQL queries for structured databases |
| Pandas Query Engine | Enables natural language queries over dataframes |
| Citation Query Engine | Provides source citations alongside generated answers |
Query engines can be composed and chained together. For instance, a router engine might send factual questions to a vector index while directing analytical questions to a SQL engine, combining the results into a unified response.
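A minimal stand-in for that routing decision might look like the following, with a keyword heuristic in place of the LLM a real Router Query Engine would consult. The engine functions and hint list are invented for illustration.

```python
def sql_engine(q: str) -> str:
    # Stand-in for a SQL Query Engine over a structured database.
    return f"[SQL engine] answering: {q}"

def vector_engine(q: str) -> str:
    # Stand-in for a query engine backed by a Vector Store Index.
    return f"[Vector engine] answering: {q}"

# Hypothetical signal words suggesting an analytical (SQL-shaped) question.
ANALYTICAL_HINTS = ("average", "total", "count", "sum", "per month")

def route(question: str) -> str:
    # Toy router: the real engine asks an LLM to choose a sub-engine
    # based on each engine's description; a keyword check stands in here.
    if any(h in question.lower() for h in ANALYTICAL_HINTS):
        return sql_engine(question)
    return vector_engine(question)

a = route("What is the total revenue per month?")
b = route("What does the onboarding policy say about laptops?")
```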
Response synthesizers take the retrieved Nodes and the original query, then produce a natural language answer using an LLM. LlamaIndex offers multiple synthesis strategies, including refine (iterating an answer across each retrieved chunk in turn), compact (packing as many chunks as fit into each LLM call before refining), and tree_summarize (recursively summarizing chunks bottom-up into a single answer).
Retrieval-augmented generation is the core use case that LlamaIndex was built to support. A typical RAG pipeline in LlamaIndex follows five stages [8]: loading data from its source, indexing it into a queryable structure, storing the index for reuse, querying it through a retriever and query engine, and evaluating retrieval and response quality.
This five-stage pipeline can be set up in just a few lines of code for simple use cases, but each stage is fully customizable for production deployments. Developers can swap embedding models, vector stores, chunking strategies, retrieval methods, and LLM providers without changing the overall pipeline structure.
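The stages can be traced end-to-end with a stdlib-only sketch. The in-memory keyword "index" and the prompt-building step below stand in for real loaders, embedding models, and an LLM; document names and contents are invented.

```python
# 1. Load: documents as (name, text) pairs; a real pipeline uses data loaders.
documents = [
    ("handbook.md", "Employees accrue 20 vacation days per year."),
    ("security.md", "All laptops must use full-disk encryption."),
]

# 2-3. Index and store: a token-set "index" kept in memory; a real pipeline
# embeds Nodes and persists them to a vector store.
index = [(name, text, set(text.lower().replace(".", "").split()))
         for name, text in documents]

def query(question: str) -> str:
    # 4. Query: retrieve the document with the greatest token overlap
    # (standing in for embedding similarity search).
    q_tokens = set(question.lower().replace("?", "").split())
    name, text, _ = max(index, key=lambda e: len(q_tokens & e[2]))
    # 5. Synthesize: a real engine prompts an LLM with this retrieved context.
    return f"Context from {name}: {text}"

answer = query("How many vacation days do employees get?")
```

Each numbered comment marks the component a developer would swap out in production: loaders, embeddings, vector store, retriever, and synthesizer, without changing the overall flow.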
Beyond the basic pipeline, LlamaIndex supports several advanced RAG patterns, including query transformations, reranking of retrieved Nodes, sentence-window retrieval, and auto-merging retrieval.
LlamaIndex agents extend the framework beyond simple retrieval into autonomous reasoning and action. An agent can use tools (including query engines, APIs, and custom functions) to accomplish complex tasks that require multiple steps of reasoning [9].
Agents in LlamaIndex support tool use through function calling, multi-step reasoning loops, and conversational memory.
The framework provides built-in agent types compatible with various LLM providers, as well as a lower-level agent protocol for building custom agent architectures.
Agentic RAG represents an evolution of the basic RAG pipeline where autonomous agents handle the retrieval and synthesis process. Instead of a fixed retrieve-then-generate pipeline, an agentic RAG system can dynamically decide what to retrieve, when to retrieve, and how to synthesize the final answer [10].
LlamaIndex's agentic RAG architecture supports several patterns, including per-document sub-agents coordinated by a top-level orchestrating agent, routing across multiple indices, and multi-step query planning.
This architecture scales naturally as new documents are added, since each receives a dedicated sub-agent without requiring changes to the overall system design.
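A rough sketch of that scaling property, with trivial stand-ins for the sub-agents and for the LLM-driven routing a real top-level agent would perform (all names and documents here are hypothetical):

```python
class SubAgent:
    # Stand-in for a per-document sub-agent; a real one wraps a query
    # engine over that document plus an LLM.
    def __init__(self, doc_name: str, summary: str):
        self.doc_name, self.summary = doc_name, summary

    def answer(self, question: str) -> str:
        return f"[{self.doc_name}] {self.summary}"

class TopLevelAgent:
    def __init__(self):
        self.sub_agents: dict[str, SubAgent] = {}

    def register(self, doc_name: str, summary: str) -> None:
        # New documents get a dedicated sub-agent; the orchestrator is unchanged.
        self.sub_agents[doc_name] = SubAgent(doc_name, summary)

    def answer(self, question: str) -> str:
        # Toy routing by name match; a real top-level agent asks an LLM
        # (or retrieves over the summaries) to pick a sub-agent.
        for name, agent in self.sub_agents.items():
            if name.split(".")[0] in question.lower():
                return agent.answer(question)
        return "No matching sub-agent."

agent = TopLevelAgent()
agent.register("budget.pdf", "FY2026 budget allocations")
agent.register("roadmap.pdf", "Product roadmap through 2027")
reply = agent.answer("What does the roadmap say about 2027?")
```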
Introduced as a beta feature in August 2024 and released as a stable 1.0 version in June 2025, Workflows is an event-driven, async-first orchestration engine that allows developers to define multi-step pipelines where each step can involve retrieval, generation, tool calls, or custom logic [11].
Workflows support branching, looping, conditional execution, and parallel processing, making them suitable for complex enterprise automation scenarios. The architecture uses typed events that flow between steps, with each step decorated to specify which events it accepts and emits.
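The event-driven shape can be illustrated with plain asyncio and dataclasses. This is not the llama-index-workflows API, just a minimal sketch of typed events flowing between async steps.

```python
import asyncio
from dataclasses import dataclass

# Typed events: each step consumes one event type and emits another.
@dataclass
class QueryEvent:
    question: str

@dataclass
class RetrievedEvent:
    question: str
    context: str

@dataclass
class AnswerEvent:
    text: str

async def retrieve_step(ev: QueryEvent) -> RetrievedEvent:
    # Stand-in retrieval; a real step would query an index or call a tool.
    return RetrievedEvent(ev.question,
                          context="LlamaIndex supports event-driven workflows.")

async def synthesize_step(ev: RetrievedEvent) -> AnswerEvent:
    # Stand-in synthesis; a real step would prompt an LLM with the context.
    return AnswerEvent(f"Q: {ev.question} | context: {ev.context}")

async def run_workflow(question: str) -> AnswerEvent:
    # Events flow step to step; branching or looping would dispatch
    # on the emitted event's type instead of calling steps linearly.
    retrieved = await retrieve_step(QueryEvent(question))
    return await synthesize_step(retrieved)

result = asyncio.run(run_workflow("What are Workflows?"))
```

In the real library, decorators declare which event types each step accepts and emits, which is what enables the type checking, visualization, and pause/resume features listed below.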
Key features of Workflows 1.0 include:
| Feature | Description |
|---|---|
| Typed Workflow State | Full type safety for state management in both Python and TypeScript |
| Resource Injection | Dynamic injection of database clients and external resources at runtime |
| Observability | Integration with OpenTelemetry, Arize Phoenix, and other monitoring tools |
| Stateful Execution | Workflows can start, pause, and resume across sessions |
| Standalone Package | Available as llama-index-workflows (Python) and @llamaindex/workflow-core (TypeScript) |
Workflows have grown significant enough to warrant their own dedicated package, separate from the core LlamaIndex library. They are the recommended abstraction for building complex agentic systems, replacing earlier chain-based patterns [11].
LlamaHub is the community-driven ecosystem of data connectors, tools, and integrations for LlamaIndex. Originally maintained as a separate GitHub repository, LlamaHub was merged into the core LlamaIndex Python repository starting with version 0.10 in early 2024. The integrations are browsable at llamahub.ai [12].
Data connectors pull information from various sources and transform raw data into a standardized Document representation. LlamaHub provides loaders for over 300 integration packages:
| Source Category | Examples |
|---|---|
| Cloud Storage | Amazon S3, Google Drive, Dropbox |
| Productivity Tools | Notion, Slack, Confluence, Google Docs |
| Databases | PostgreSQL, MySQL, MongoDB, Snowflake |
| File Formats | PDF, CSV, DOCX, EPUB, HTML, Markdown |
| APIs | Twitter, Wikipedia, YouTube, GitHub |
| Enterprise Systems | Salesforce, Hubspot, Jira, Zendesk |
| Vector Stores | Pinecone, Weaviate, Chroma, Qdrant, Milvus |
| LLM Providers | OpenAI, Anthropic, Cohere, Google Gemini, Mistral |
Each connector handles the specifics of authentication, pagination, and data extraction for its respective source, presenting a consistent interface to the rest of the framework. Developers can also write custom connectors for proprietary data sources.
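A custom connector reduces to "fetch records, emit Documents." The sketch below models that contract over an invented in-memory ticket source; LlamaHub readers conventionally expose a similar load_data() method, but real ones also handle authentication and pagination against the source's API.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    # Illustrative stand-in for the LlamaIndex Document container.
    text: str
    metadata: dict = field(default_factory=dict)

class InMemoryTicketReader:
    # Hypothetical connector over an in-memory "ticket system".
    def __init__(self, records: list[dict]):
        self.records = records

    def load_data(self) -> list[Document]:
        # Normalize source-specific records into the shared Document shape.
        return [
            Document(text=r["body"],
                     metadata={"ticket_id": r["id"], "status": r["status"]})
            for r in self.records
        ]

reader = InMemoryTicketReader([
    {"id": 101, "body": "Printer on floor 3 is jammed.", "status": "open"},
    {"id": 102, "body": "VPN drops every hour.", "status": "closed"},
])
docs = reader.load_data()
```

Because every connector emits the same Document shape, downstream chunking, indexing, and retrieval are source-agnostic.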
LlamaIndex and LangChain are the two most prominent frameworks for building LLM applications, and they are frequently compared. While there is overlap in their capabilities, each framework has a distinct focus [13].
| Aspect | LlamaIndex | LangChain |
|---|---|---|
| Primary Focus | Data indexing, retrieval, and document processing (RAG) | General-purpose LLM application building and workflow orchestration |
| Core Strength | Deep data connection, indexing quality, and retrieval precision | Flexible chaining of models, tools, and complex workflows |
| Learning Curve | Gentler; high-level abstractions for common RAG patterns | Steeper; more configuration options and concepts |
| Data Connectors | 300+ via LlamaHub | Fewer built-in, but extensible |
| Agent Framework | Workflows (event-driven, async-first) | LangGraph (graph-based state machines) |
| Agent Capabilities | Growing; focused on document-centric and retrieval agents | Mature; broad tool and agent ecosystem |
| Retrieval Accuracy | 35% boost reported in 2025 through advanced retrieval techniques | Strong, with extensive retriever options |
| Commercial Product | LlamaCloud (managed RAG, parsing, agent deployment) | LangSmith (observability, evaluation, monitoring) |
| Best For | Document-heavy RAG applications, enterprise knowledge systems, document AI | Complex multi-step workflows, tool-heavy agents, chatbot applications |
| License | MIT | MIT |
| Community Size | ~47,800 GitHub stars | ~100,000+ GitHub stars |
LlamaIndex is generally considered the better choice for straightforward RAG applications, especially those involving large volumes of documents where indexing quality and retrieval precision are paramount. LangChain excels when the application requires chaining multiple models, tools, and APIs into complex workflows that go beyond data retrieval [13].
In practice, many developers use both frameworks together. LlamaIndex handles data ingestion and retrieval, while LangChain orchestrates the broader application logic. The two frameworks are interoperable, with LlamaIndex query engines usable as tools within LangChain agents.
LlamaCloud is the commercial managed service built on top of the LlamaIndex open-source framework. Launched with general availability in March 2025 alongside the Series A announcement, it provides a cloud-hosted platform for building and deploying data agents and RAG applications without managing infrastructure [14].
LlamaParse is LlamaCloud's flagship document parsing service, designed to handle complex document formats that simple text extraction cannot process accurately. It uses AI-powered parsing to extract structured content from PDFs, presentations, spreadsheets, and other document types, preserving tables, headers, and layout information [15].
LlamaParse v2, launched in 2025, introduced a simplified four-tier configuration system and up to 50% cost reduction compared to v1:
| Tier | Description | Use Case |
|---|---|---|
| Fast | Quick extraction with basic formatting | High-volume simple documents |
| Cost Effective | Balanced accuracy and cost | Standard business documents |
| Agentic | AI-powered layout understanding | Complex documents with tables and figures |
| Agentic Plus | Maximum accuracy with multi-model approach | Critical documents requiring highest fidelity |
The v2 redesign replaced complex configuration modes with automatic model routing, making it simpler for developers to get accurate parsing results without manual tuning.
LlamaAgents provides one-click deployment of document agents, allowing organizations to deploy AI agents that can process, analyze, and act on document content. Built on the open-source Workflows library, it supports persistent memory, filesystem tools, and integration with MCP (Model Context Protocol) servers [16]. Pre-built agent templates cover common use cases including invoice processing, contract review, claims handling, and document Q&A.
| Service | Description |
|---|---|
| LlamaSheets | Transforms disorganized spreadsheets into AI-ready data through intelligent region classification and multi-stage processing with 40+ cell features |
| LlamaSplit | Automatically segments bundled multi-document files into distinct sections by analyzing page content and grouping consecutive pages by category |
| LlamaExtract | Structured extraction service for pulling specific data fields from documents, with a Table Row Mode for handling repeating entities like catalogs |
| Managed Index | Managed ingestion and RAG service with hosted vector stores |
LlamaCloud can be deployed as a SaaS installation or within a virtual private cloud, and includes enterprise features such as role-based access control and single sign-on. Pricing follows a credit-based model: a free tier with 10,000 credits, a Starter plan at $50/month with 50,000 credits, and a Pro plan at $500/month with 500,000 credits [14].
In 2025, LlamaIndex introduced the concept of Agentic Document Workflows (ADW), an architecture combining document processing, retrieval, structured outputs, and agentic orchestration to enable end-to-end knowledge work automation [16]. An ADW system can parse incoming documents, extract structured data from them, retrieve relevant reference material, apply business rules, and generate recommendations or route outputs to downstream systems.
This approach represents a shift from passive document retrieval toward active document processing, where AI agents can handle tasks like invoice processing, contract review, compliance checking, and report generation with minimal human intervention. LlamaIndex reported achieving 90%+ pass-through rates with agentic workflows, compared to 60-70% with legacy systems [17].
LlamaIndex is available in both Python and TypeScript, with the Python library being the more mature and feature-complete implementation. The framework follows a modular design that allows developers to swap components at every layer.
| Abstraction | Purpose |
|---|---|
| Document | Raw data unit from a connector |
| Node | Chunk of a Document with metadata and relationships |
| Index | Data structure for organizing and retrieving Nodes |
| Retriever | Fetches relevant Nodes from an Index given a query |
| Response Synthesizer | Generates a natural language response from retrieved Nodes |
| Query Engine | Combines Retriever and Synthesizer into an end-to-end pipeline |
| Agent | Autonomous entity that uses tools and reasoning to accomplish tasks |
| Workflow | Event-driven, multi-step orchestration pipeline with branching and state management |
LlamaIndex integrates with a wide range of LLM providers (including OpenAI, Anthropic, Cohere, Google Gemini, and Mistral), embedding models, and vector stores (including Pinecone, Weaviate, Chroma, Qdrant, Milvus, and FAISS).
LlamaIndex added native support for the Model Context Protocol (MCP) in 2025, allowing agents to connect to any MCP-compatible server and use the tools it exposes. The llama-index-tools-mcp package enables connecting to MCP servers in a single line of code. LlamaIndex also provides its own MCP server for documentation search, offering tools for searching, grepping, and reading docs directly from coding environments like Claude Code [18].
LlamaIndex can build and query knowledge graphs from unstructured text. The Knowledge Graph Index extracts entities and relationships from documents, constructing a graph that enables relationship-based queries. This is particularly useful for domains where connections between entities matter, such as biomedical research, legal analysis, and supply chain management.
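The underlying idea can be sketched with a hand-written triple store. A real Knowledge Graph Index uses an LLM to extract the triples from text; they are hard-coded here for illustration, with invented biomedical entities.

```python
from collections import defaultdict

# (subject, relation, object) triples, as an LLM extractor might produce.
triples = [
    ("aspirin", "inhibits", "COX-1"),
    ("aspirin", "treats", "headache"),
    ("COX-1", "produces", "prostaglandins"),
]

# Adjacency map: entity -> list of outgoing (relation, object) edges.
graph: dict[str, list] = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))

def related(entity: str) -> list:
    # Relationship-based query: everything directly connected to an entity.
    return graph.get(entity, [])

hits = related("aspirin")
```

Multi-hop questions ("what does aspirin ultimately affect?") then become graph traversals rather than similarity searches, which is why this index type suits relationship-heavy domains.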
LlamaIndex includes built-in evaluation tools for measuring retrieval quality and response accuracy. The framework integrates with observability platforms to provide tracing, logging, and performance monitoring for production RAG applications. Developers can track metrics like retrieval precision, response faithfulness, and latency. The Workflow Debugger, shipped in 2025, provides real-time event logs, workflow visualization, and run comparison capabilities [17].
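Retrieval precision is the simplest of these metrics to sketch: of the top-k Nodes a retriever returns, what fraction are actually relevant? The Node IDs and relevance labels below are invented for illustration.

```python
def precision_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of the top-k retrieved node IDs that appear in the relevant set."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for node_id in top_k if node_id in relevant) / len(top_k)

retrieved = ["n3", "n1", "n7", "n2"]  # retriever output, ranked
relevant = {"n1", "n2"}               # ground-truth labels
p = precision_at_k(retrieved, relevant, k=3)
```

Response-side metrics like faithfulness work differently: they typically use an LLM judge to check the generated answer against the retrieved context rather than comparing ID sets.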
LlamaIndex has raised over $27.5 million in total funding across multiple rounds.
| Round | Date | Amount | Lead Investor | Other Investors |
|---|---|---|---|---|
| Seed | June 2023 | $8.5M | Greylock Partners | Angel investors including Jack Altman, Lenny Rachitsky, Mathilde Collin (Front), Raquel Urtasun (Waabi) |
| Series A | March 2025 | $19M | Norwest Venture Partners | Greylock Partners (existing) |
| Strategic | May 2025 | Undisclosed | N/A | Databricks Ventures, KPMG Ventures |
The seed round in 2023 was used to build an enterprise offering on top of the open-source framework and to grow the team. The Series A round in 2025 was directed toward expanding the team (approximately 20 people at the time) and accelerating development of LlamaCloud and the agentic platform [14]. The strategic investments from Databricks and KPMG reflect growing adoption in enterprise environments. KPMG's investment signals uptake within professional services, while Databricks' backing aligns LlamaIndex with one of the largest data and AI platforms [19].
LlamaIndex powers a variety of LLM application patterns:
| Use Case | Description |
|---|---|
| Enterprise Knowledge Base | Employees query internal documents, wikis, and databases using natural language |
| Customer Support | AI agents retrieve product documentation and past tickets to answer customer questions |
| Research Assistant | Researchers query large corpora of academic papers, extracting findings and connections |
| Legal Document Review | Agents analyze contracts, extract clauses, and flag compliance issues |
| Financial Analysis | Natural language queries over financial reports, SEC filings, and market data |
| Code Documentation | Developers query codebases and documentation to understand systems and debug issues |
| Invoice Processing | Agentic workflows automatically extract, validate, and route invoice data |
| Compliance Checking | Document agents review policies and regulations against organizational documents |
As of early 2026, LlamaIndex has evolved from a simple RAG library into a comprehensive platform for building document-centric AI applications. Several trends define its current trajectory:
- The llama_cloud_services repository is slated for deprecation by May 2026, with users encouraged to migrate to new packages (llama-cloud>=1.0 for Python, @llamaindex/llama-cloud for TypeScript) that offer improved performance and active development [21].

Jerry Liu has articulated a vision for 2026 in which "agents go from workflows to employees," suggesting that the next phase of LlamaIndex development will focus on increasingly autonomous document processing agents that can handle complex knowledge work with minimal human oversight [16].