RAPTOR

Updated Jun 8, 2026

Overview

RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) is a retrieval method for retrieval-augmented generation introduced in a 2024 paper by researchers at Stanford University. Instead of retrieving only short, contiguous passages from a corpus, RAPTOR recursively clusters and summarizes text chunks with a large language model to build a multi-level tree. The tree spans fine-grained leaf chunks at the bottom and progressively more abstract summary nodes toward the top, so that retrieval can draw on both local detail and a holistic view of a long document. RAPTOR was presented at the International Conference on Learning Representations (ICLR) 2024 and, when paired with GPT-4, set state-of-the-art results on several long-document and multi-step question-answering benchmarks ^[1].

The method was published as "RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval" by Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning ^[1]. An official open-source implementation accompanies the paper, and the technique has been integrated into popular RAG frameworks including LlamaIndex and LangChain ^[2]^[3].

Background: limitations of chunked RAG

Standard retrieval-augmented generation splits a corpus into short, fixed-size chunks (for example, a few hundred tokens each), computes embeddings for each chunk, and at query time returns the top-k chunks most similar to the query embedding. This contiguous-chunk approach works well for questions whose answer sits in a single localized passage, such as a specific fact or definition ^[1].
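The baseline retrieval step described above can be sketched in a few lines. This is a minimal illustration, not any particular library's API: the function names `cosine` and `top_k_chunks` are hypothetical, and the 2-D vectors are toy stand-ins for real sentence embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunks most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-D "embeddings" (assumed values); a real system would embed text chunks.
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
best = top_k_chunks([0.9, 0.1], chunk_vecs, k=2)
```

Every candidate here is an isolated chunk, which is why this style of retrieval has no way to surface document-level context.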

The approach struggles, however, with questions that require integrating information spread across a long document or reasoning over its overall structure. Because retrieval is limited to isolated short segments, the model rarely sees the broader thematic context. The RAPTOR paper argues that this prevents a holistic understanding of the document, which is what multi-step reasoning and narrative-comprehension questions demand ^[1]. Simply enlarging the chunks is not a clean fix: larger chunks dilute retrieval precision and can exceed model context limits. RAPTOR addresses the gap by adding abstraction: it precomputes summaries at multiple levels of granularity so that, at query time, the retriever can choose the resolution that best matches the question.

How RAPTOR works

RAPTOR builds its index in a bottom-up, recursive process and then queries it with one of two retrieval strategies.

Chunking and embedding

The source text is first segmented into short contiguous chunks (the paper uses roughly 100 tokens per chunk, keeping sentences intact rather than cutting them mid-sentence). Each chunk is encoded into a dense vector using a sentence-embedding model. The paper uses SBERT, specifically the multi-qa-mpnet-base-cos-v1 model ^[1]. These embedded chunks form the leaf nodes (level 0) of the tree.
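A sentence-preserving chunker along these lines might look like the sketch below. It is an illustration under stated assumptions rather than the paper's code: `chunk_text` is a hypothetical name, sentences are split with a simple regular expression, and whitespace-separated words stand in for tokens.

```python
import re

def chunk_text(text, max_tokens=100):
    """Greedily pack whole sentences into chunks of at most max_tokens words.

    Words approximate tokens here (an assumption); a real pipeline would
    count tokens with the tokenizer of its embedding model. A sentence
    longer than max_tokens is kept whole as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk rather than cutting the sentence in the middle.
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

demo = chunk_text("One two three. Four five. Six seven eight nine.", max_tokens=5)
```

Each returned string would then be embedded and stored as a leaf node.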

Recursive clustering

RAPTOR groups similar chunks using soft clustering, which allows a single node to belong to more than one cluster, reflecting that a passage may be relevant to several topics. Clustering uses Gaussian mixture models (GMMs). Because embedding vectors are high-dimensional and distance metrics degrade in high dimensions, RAPTOR first reduces dimensionality with UMAP (Uniform Manifold Approximation and Projection). By varying the UMAP n_neighbors parameter, the method captures structure at multiple scales, from broad global clusters to tighter local ones. The number of clusters is selected automatically: for each candidate count the GMM parameters are fit by expectation-maximization, and the count with the best Bayesian Information Criterion (BIC) score is kept ^[1].
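The model-selection and soft-assignment logic can be illustrated without a clustering library. The sketch below assumes the GMMs have already been fitted elsewhere (for example on UMAP-reduced embeddings) and only shows how BIC picks a cluster count and how posterior probabilities turn into overlapping memberships. All function names are hypothetical, the parameter count assumes full covariance matrices, the 0.1 membership threshold is an assumed value, and the input numbers are made up.

```python
import math

def gmm_param_count(n_clusters, dim):
    """Free parameters of a full-covariance GMM: weights + means + covariances."""
    weights = n_clusters - 1
    means = n_clusters * dim
    covariances = n_clusters * dim * (dim + 1) // 2
    return weights + means + covariances

def bic(n_points, n_params, log_likelihood):
    """Bayesian Information Criterion; lower is better."""
    return math.log(n_points) * n_params - 2.0 * log_likelihood

def pick_n_clusters(log_likelihoods, n_points, dim):
    """Choose the cluster count with the lowest BIC.

    log_likelihoods maps each candidate cluster count to the maximized
    log-likelihood of a GMM already fitted with that many components.
    """
    return min(
        log_likelihoods,
        key=lambda k: bic(n_points, gmm_param_count(k, dim), log_likelihoods[k]),
    )

def soft_assign(posteriors, threshold=0.1):
    """Map each node to every cluster whose posterior exceeds the threshold."""
    return [[c for c, p in enumerate(row) if p > threshold] for row in posteriors]

# Made-up log-likelihoods and posteriors, for illustration only.
best_k = pick_n_clusters({1: -520.0, 2: -430.0, 3: -425.0}, n_points=100, dim=2)
members = soft_assign([[0.95, 0.05], [0.55, 0.45], [0.02, 0.98]])
```

In this toy case the third component barely improves the likelihood, so the BIC penalty on extra parameters favors two clusters, and the middle node lands in both of them.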

Summarization and tree building

Each cluster of chunks is concatenated and passed to a large language model, which produces a single summary capturing the cluster's content. The paper uses gpt-3.5-turbo for summarization ^[1]. These summaries become new nodes at the next level up. RAPTOR then re-embeds the summaries and repeats the cluster-and-summarize step on them, producing still higher-level summaries. The recursion continues until further clustering is no longer possible (for example, the remaining nodes fit in a single cluster), yielding a tree whose root nodes summarize large swaths of the corpus and whose leaves are the original chunks. The paper reports that the cost of building the tree grows roughly linearly with the size of the corpus ^[1].
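The recursion can be sketched as a loop over layers. This is a simplified illustration, not the official implementation: `build_tree` is a hypothetical name, and the `cluster` and `summarize` callables are placeholders for the embedding-plus-GMM step and the LLM call, which are passed in so the control flow stays visible.

```python
from typing import Callable, List, Sequence

def build_tree(
    leaves: Sequence[str],
    cluster: Callable[[Sequence[str]], List[List[int]]],
    summarize: Callable[[str], str],
    max_levels: int = 10,
) -> List[List[str]]:
    """Return the tree as a list of layers, with layers[0] the leaf chunks.

    cluster(texts) returns groups of indices into texts (a node may appear
    in several groups); summarize(text) returns one summary string.
    """
    layers = [list(leaves)]
    for _ in range(max_levels):
        current = layers[-1]
        if len(current) <= 1:
            break  # a single node is left, so nothing remains to group
        groups = cluster(current)
        if len(groups) >= len(current):
            break  # clustering no longer shrinks the layer
        layers.append(
            [summarize("\n\n".join(current[i] for i in group)) for group in groups]
        )
    return layers

# Toy stand-ins (assumptions): pair neighbours, "summarize" by truncation.
def pair_up(texts):
    return [list(range(i, min(i + 2, len(texts)))) for i in range(0, len(texts), 2)]

def first_words(text):
    return " ".join(text.split()[:3])

tree = build_tree(["a b", "c d", "e f", "g h"], pair_up, first_words)
```

With four leaves and pairwise grouping the toy run produces layers of 4, 2, and 1 nodes. A real `cluster` callable would embed the current layer's texts before grouping them, which corresponds to the re-embedding step in the text.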

Retrieval

At query time, the query is embedded and compared against tree nodes by cosine similarity. RAPTOR defines two strategies:

Tree traversal. Retrieval starts at the root layer, selects the top-k most similar nodes, then descends into the children of those nodes, repeating layer by layer down to the leaves. This walks the hierarchy and preserves a roughly fixed ratio of high-level to granular context.
Collapsed tree. The tree is flattened so that leaf chunks and summary nodes from every level sit together in one pool. A single cosine-similarity search runs across all nodes simultaneously, and the highest-scoring nodes are added until a token budget is reached (the paper notes a 2,000-token budget, roughly the top 20 nodes).

In the paper's experiments the collapsed-tree strategy outperformed tree traversal, because it lets the retriever pick whatever mix of detailed and abstract nodes best fits each individual question instead of committing to a fixed traversal pattern ^[1]. The retrieved nodes are concatenated into the prompt of a reader model (the question-answering LLM) to generate the final answer.
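Both strategies can be compared side by side in a small sketch. The `Node` class and function names are hypothetical, word counts stand in for tokens, and the vectors are toy values. Stopping at the first node that would overflow the budget is one reasonable reading of "added until a token budget is reached", not a confirmed detail of the paper's code.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    text: str
    vec: List[float]
    children: List["Node"] = field(default_factory=list)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query_vec, nodes):
    """Sort nodes by cosine similarity to the query, best first."""
    return sorted(nodes, key=lambda n: cosine(query_vec, n.vec), reverse=True)

def collapsed_tree(query_vec, all_nodes, max_tokens=2000):
    """Rank every node together and take them in order until the budget is hit."""
    picked, used = [], 0
    for node in rank(query_vec, all_nodes):
        cost = len(node.text.split())  # word count as a stand-in for tokens
        if used + cost > max_tokens:
            break
        picked.append(node)
        used += cost
    return picked

def tree_traversal(query_vec, roots, k=1):
    """Keep the top-k nodes of each layer, then descend into their children."""
    picked, layer = [], list(roots)
    while layer:
        top = rank(query_vec, layer)[:k]
        picked.extend(top)
        # Soft clustering can give a child several parents, so deduplicate.
        layer = list({id(c): c for n in top for c in n.children}.values())
    return picked

leaf_a = Node("cats purr", [1.0, 0.0])
leaf_b = Node("dogs bark", [0.0, 1.0])
root = Node("pets make sounds", [0.7, 0.7], [leaf_a, leaf_b])
query = [1.0, 0.1]
flat = [n.text for n in collapsed_tree(query, [root, leaf_a, leaf_b], max_tokens=5)]
walk = [n.text for n in tree_traversal(query, [root], k=1)]
```

In the toy run the collapsed search ranks the matching leaf ahead of the summary, while traversal always begins with a root-layer node. This mirrors the flexibility argument made above.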

Results

RAPTOR was evaluated on three long-document question-answering benchmarks: QuALITY (multiple-choice questions over long passages), NarrativeQA (questions over full stories and scripts), and QASPER (questions over NLP research papers). Across datasets and reader models, RAPTOR retrieval consistently outperformed contiguous-chunk baselines using DPR (Dense Passage Retrieval) and BM25 ^[1].

The most prominent result is on QuALITY with GPT-4 as the reader model: RAPTOR raised the best previously reported accuracy on the benchmark by about 20 percentage points in absolute terms ^[1]. Selected figures reported in the paper:

Benchmark	Reader model	Metric	RAPTOR score	Comparison and notes
QuALITY	GPT-4	Accuracy	82.6%	vs. 62.3% prior best; about a 20-point absolute gain ^[1]
QuALITY (hard subset)	GPT-4	Accuracy	76.2%	vs. 54.7% prior best ^[1]
QASPER	GPT-4	F1 Match	55.7%	vs. DPR 53.0% and BM25 50.2% ^[1]
NarrativeQA	UnifiedQA-3B	ROUGE-L	30.8%	with BLEU-1 23.5, BLEU-4 6.4, METEOR 19.1 ^[1]
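As a quick check on the QuALITY rows, the quoted gains are differences between accuracies (percentage points), computed directly from the table values:

```latex
\begin{align*}
82.6 - 62.3 &= 20.3 \quad \text{(QuALITY)} \\
76.2 - 54.7 &= 21.5 \quad \text{(QuALITY, hard subset)}
\end{align*}
```

Expressed relative to the prior best instead, the first gain would be 20.3 / 62.3, or roughly 33%, which is why the distinction between absolute and relative improvement matters when reading the headline number.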

The authors also report that the gains held across reader models of different sizes, and that the collapsed-tree retrieval strategy was the stronger of the two retrieval modes ^[1].

Relationship to other retrieval methods

RAPTOR sits between simple chunked RAG and explicitly structured retrieval methods:

Chunked RAG. RAPTOR is a strict superset in terms of index content: a collapsed-tree index still contains the original leaf chunks, so it can answer local-fact questions like standard RAG, while additionally offering summary nodes for questions needing broader context. The extra cost is the one-time clustering and LLM summarization during indexing ^[1].
Hierarchical summarization. RAPTOR can be seen as turning hierarchical, recursive summarization of a document into a retrievable index rather than producing a single final summary. The intermediate summary nodes are retained and individually searchable.
GraphRAG. GraphRAG, popularized by Microsoft in 2024, also builds a multi-level structure over a corpus, but it extracts an explicit knowledge graph of entities and relationships and then summarizes graph communities. RAPTOR instead clusters raw text embeddings, so it builds a tree rather than a graph and does not require entity or relation extraction. Both aim to support global, "sense-making" queries that span an entire corpus, and both rely on LLM-generated summaries of clusters or communities.

Because its tree is built purely from embeddings and text summaries, RAPTOR is largely agnostic to the choice of embedding model, summarizer, and reader, and can be layered on top of an existing vector store.

Adoption

An official implementation of RAPTOR was released as open source by the authors, which contributed to its uptake in the RAG community ^[2]. The method is available as a packaged component in mainstream frameworks: LlamaIndex ships a "Raptor" pack and retriever whose run() mode can be set to either collapsed or tree_traversal ^[2], and community implementations using LangChain build the leaf chunks and per-level summaries into a vector store (for example, FAISS) for collapsed-tree retrieval ^[3]. RAPTOR is frequently cited and reimplemented in tutorials and follow-up coursework as a representative example of hierarchical or "multi-resolution" retrieval, and it has become a common baseline in research comparing advanced RAG architectures ^[3].

References

[1] Sarthi, P., Abdullah, S., Tuli, A., Khanna, S., Goldie, A., and Manning, C. D. "RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval." ICLR 2024. arXiv:2401.18059. https://arxiv.org/abs/2401.18059
[2] "Raptor Retriever LlamaPack" and "Raptor," LlamaIndex documentation and LlamaHub. https://developers.llamaindex.ai/python/framework-api-reference/packs/raptor/
[3] "Mastering RAG with RAPTOR: A comprehensive guide using LlamaIndex," Educative; LangChain RAPTOR cookbook and community implementations. https://www.educative.io/blog/mastering-rag-with-raptor
