RAPTOR
Last reviewed
Jun 8, 2026
Sources
3 citations
Review status
Source-backed
Revision
v1 · 1,425 words
RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) is a retrieval method for retrieval-augmented generation introduced in a 2024 paper by researchers at Stanford University. Instead of retrieving only short, contiguous passages from a corpus, RAPTOR recursively clusters and summarizes text chunks with a large language model to build a multi-level tree. The tree spans fine-grained leaf chunks at the bottom and progressively more abstract summary nodes toward the top, so that retrieval can draw on both local detail and a holistic view of a long document. RAPTOR was presented at the International Conference on Learning Representations (ICLR) 2024 and, when paired with GPT-4, set state-of-the-art results on several long-document and multi-step question-answering benchmarks [1].
The method was published as "RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval" by Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning [1]. An official open-source implementation accompanies the paper, and the technique has been integrated into popular RAG frameworks including LlamaIndex and LangChain [2][3].
Standard retrieval-augmented generation splits a corpus into short, fixed-size chunks (for example, a few hundred tokens each), computes embeddings for each chunk, and at query time returns the top-k chunks most similar to the query embedding. This contiguous-chunk approach works well for questions whose answer sits in a single localized passage, such as a specific fact or definition [1].
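The baseline described above can be sketched in a few lines. This is a minimal illustration, not a production retriever: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the names `embed`, `cosine`, and `top_k` are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "cinderella lives with her stepmother and stepsisters",
    "the fairy godmother turns a pumpkin into a coach",
    "the prince searches the kingdom with the glass slipper",
]
print(top_k("who turns the pumpkin into a coach", chunks, k=1))
# ['the fairy godmother turns a pumpkin into a coach']
```

A localized question like this one is answered by a single chunk, which is the case where flat top-k retrieval does well.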
The approach struggles, however, with questions that require integrating information spread across a long document or reasoning over its overall structure. Because retrieval is limited to isolated short segments, the model rarely sees the broader thematic context. The RAPTOR paper argues that this limits holistic understanding of overall document context, which is precisely what multi-step reasoning and narrative-comprehension questions demand [1]. Naively enlarging the chunk size dilutes retrieval precision and can exceed model context limits, so simply reading more contiguous text is not a clean fix. RAPTOR addresses the gap by adding abstraction: it precomputes summaries at multiple levels of granularity so that, at query time, the retriever can choose the resolution that best matches the question.
RAPTOR builds its index in a bottom-up, recursive process and then queries it with one of two retrieval strategies.
The source text is first segmented into short contiguous chunks (the paper uses roughly 100 tokens per chunk, keeping sentences intact rather than cutting them mid-sentence). Each chunk is encoded into a dense vector using a sentence-embedding model. The paper uses SBERT, specifically the multi-qa-mpnet-base-cos-v1 model [1]. These embedded chunks form the leaf nodes (level 0) of the tree.
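A sentence-preserving chunker along these lines might look like the sketch below. It rests on two assumptions: tokens are approximated by whitespace-separated words (a real pipeline would count tokens with the model's tokenizer), and sentences are split with a simple regular expression. The function name `chunk_text` and its `max_tokens` parameter are illustrative and not taken from the paper's code.

```python
import re


def chunk_text(text: str, max_tokens: int = 100) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens words.

    Assumption: whitespace-separated words approximate tokens.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk instead of cutting a sentence in the middle.
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "One two three. Four five six seven. Eight nine. Ten."
print(chunk_text(text, max_tokens=5))
# ['One two three.', 'Four five six seven.', 'Eight nine. Ten.']
```

In this sketch a single sentence longer than `max_tokens` is kept whole as an oversized chunk. Each resulting chunk would then be passed to the embedding model to produce a leaf node.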
RAPTOR groups similar chunks using soft clustering, which allows a single node to belong to more than one cluster, reflecting that a passage may be relevant to several topics. Clustering uses Gaussian mixture models (GMMs). Because embedding vectors are high-dimensional and distance metrics degrade in high dimensions, RAPTOR first reduces dimensionality with UMAP (Uniform Manifold Approximation and Projection). By varying the UMAP n_neighbors parameter, the method captures structure at multiple scales, from broad global clusters to tighter local ones. The number of clusters is selected automatically using the Bayesian Information Criterion (BIC), with the GMM parameters then fit by expectation-maximization [1].
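Two pieces of this step are easy to show in isolation: choosing the number of clusters by BIC, and turning GMM posterior probabilities into soft memberships. The sketch below assumes the GMM fitting itself is done elsewhere (for example by a library) and that it supplies a log-likelihood per candidate cluster count and a posterior matrix. The helper names, the example log-likelihood values, and the `threshold` of 0.1 are assumptions made for illustration.

```python
import math


def gmm_param_count(k: int, dim: int) -> int:
    """Free parameters of a k-component, full-covariance GMM in dim dimensions."""
    means = k * dim
    covariances = k * dim * (dim + 1) // 2
    weights = k - 1
    return means + covariances + weights


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian Information Criterion: p * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def select_k(log_likelihoods: dict[int, float], dim: int, n_samples: int) -> int:
    """Pick the cluster count whose fitted model has the lowest BIC."""
    return min(
        log_likelihoods,
        key=lambda k: bic(log_likelihoods[k], gmm_param_count(k, dim), n_samples),
    )


def soft_assign(posteriors: list[list[float]], threshold: float = 0.1) -> list[list[int]]:
    """Assign each node to every cluster whose posterior exceeds the threshold."""
    return [[c for c, p in enumerate(row) if p > threshold] for row in posteriors]


# Hypothetical log-likelihoods for k = 1, 2, 3 on 100 points in 2 reduced dimensions.
print(select_k({1: -520.0, 2: -430.0, 3: -425.0}, dim=2, n_samples=100))  # 2
print(soft_assign([[0.95, 0.05], [0.55, 0.45], [0.02, 0.98]]))  # [[0], [0, 1], [1]]
```

The second node lands in both clusters, which is the behavior soft clustering is meant to allow. Going from two to three clusters barely improves the likelihood here, so the BIC penalty for extra parameters favors two.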
Each cluster of chunks is concatenated and passed to a large language model, which produces a single summary capturing the cluster's content. The paper uses gpt-3.5-turbo for summarization [1]. These summaries become new nodes at the next level up. RAPTOR then re-embeds the summaries and repeats the cluster-and-summarize step on them, producing still higher-level summaries. The recursion continues until further clustering is no longer possible (for example, the remaining nodes fit in a single cluster), yielding a tree whose root nodes summarize large swaths of the corpus and whose leaves are the original chunks. This construction grows roughly linearly in cost with the size of the corpus rather than quadratically [1].
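The recursion can be sketched with the clustering and summarization steps passed in as functions. This is a simplified illustration under stated assumptions: `cluster` and `summarize` are placeholders for the GMM clustering and the LLM call, the pairing and joining stubs in the usage example are toys, and `Node`, `build_tree`, and `max_levels` are hypothetical names rather than the official implementation's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    text: str
    level: int
    children: list["Node"] = field(default_factory=list)


def build_tree(
    chunks: list[str],
    cluster: Callable[[list[Node]], list[list[Node]]],
    summarize: Callable[[list[str]], str],
    max_levels: int = 5,
) -> list[list[Node]]:
    """Return the tree as a list of layers, leaves first."""
    layers = [[Node(text=c, level=0) for c in chunks]]
    while len(layers[-1]) > 1 and len(layers) <= max_levels:
        current = layers[-1]
        groups = cluster(current)
        # Stop when clustering no longer reduces the number of nodes.
        if not groups or len(groups) >= len(current):
            break
        level = len(layers)
        layers.append(
            [
                Node(text=summarize([n.text for n in g]), level=level, children=list(g))
                for g in groups
            ]
        )
    return layers


# Toy stand-ins: pair up neighbours and join their text instead of calling an LLM.
def pair_up(nodes: list[Node]) -> list[list[Node]]:
    return [nodes[i:i + 2] for i in range(0, len(nodes), 2)]


def join_texts(texts: list[str]) -> str:
    return " + ".join(texts)


layers = build_tree(["a", "b", "c", "d"], pair_up, join_texts)
print([[n.text for n in layer] for layer in layers])
# [['a', 'b', 'c', 'd'], ['a + b', 'c + d'], ['a + b + c + d']]
```

In a real pipeline each new layer would also be re-embedded before the next round of clustering, a step omitted here to keep the sketch short.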
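At query time, the query is embedded and compared against tree nodes by cosine similarity. RAPTOR defines two strategies: tree traversal, which starts at the top layer, keeps the top-k most similar nodes, and repeats the selection among their children layer by layer down to the leaves; and collapsed tree, which flattens all layers into a single pool and selects the most similar nodes, whatever their level, until a token budget is reached [1].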
In the paper's experiments the collapsed-tree strategy outperformed tree traversal, because it lets the retriever pick whatever mix of detailed and abstract nodes best fits each individual question instead of committing to a fixed traversal pattern [1]. The retrieved nodes are concatenated into the prompt of a reader model (the question-answering LLM) to generate the final answer.
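Both strategies can be illustrated on a tiny hand-built tree. In the paper's description, tree traversal selects the top-k nodes layer by layer from the top of the tree downward, while the collapsed tree ranks all nodes together and adds them until a token limit is reached. The sketch below makes several assumptions: vectors are hand-made rather than produced by an embedding model, word counts stand in for token counts, and `Node`, `collapsed_tree`, `tree_traversal`, `max_tokens`, and `k` are illustrative names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    vector: list[float]
    children: list["Node"] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def unique(nodes: list[Node]) -> list[Node]:
    """Drop repeats; soft clustering can place one node under several parents."""
    seen, out = set(), []
    for n in nodes:
        if id(n) not in seen:
            seen.add(id(n))
            out.append(n)
    return out


def all_nodes(roots: list[Node]) -> list[Node]:
    out, frontier = [], list(roots)
    while frontier:
        out.extend(frontier)
        frontier = [c for n in frontier for c in n.children]
    return unique(out)


def collapsed_tree(query: list[float], roots: list[Node], max_tokens: int) -> list[Node]:
    """Rank every node in one pool and add nodes until the budget is used up."""
    ranked = sorted(all_nodes(roots), key=lambda n: cosine(query, n.vector), reverse=True)
    picked, used = [], 0
    for node in ranked:
        cost = len(node.text.split())  # assumption: words approximate tokens
        if used + cost > max_tokens:
            break
        picked.append(node)
        used += cost
    return picked


def tree_traversal(query: list[float], roots: list[Node], k: int) -> list[Node]:
    """Keep the top-k nodes at each layer, then descend into their children."""
    picked, frontier = [], unique(roots)
    while frontier:
        best = sorted(frontier, key=lambda n: cosine(query, n.vector), reverse=True)[:k]
        picked.extend(best)
        frontier = unique([c for n in best for c in n.children])
    return picked


leaf1 = Node("pumpkin becomes coach", [1.0, 0.0])
leaf2 = Node("slipper fits cinderella", [0.0, 1.0])
root = Node("cinderella finds a happy ending", [0.6, 0.8], [leaf1, leaf2])
query = [0.1, 1.0]

print([n.text for n in collapsed_tree(query, [root], max_tokens=8)])
# ['slipper fits cinderella', 'cinderella finds a happy ending']
print([n.text for n in tree_traversal(query, [root], k=1)])
# ['cinderella finds a happy ending', 'slipper fits cinderella']
```

The collapsed tree is free to return a leaf ahead of its summary, or several nodes from one level and none from another, whereas traversal always takes a fixed number of nodes per layer.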
RAPTOR was evaluated on three long-document question-answering benchmarks: QuALITY (multiple-choice questions over long passages), NarrativeQA (questions over full stories and scripts), and QASPER (questions over NLP research papers). Across datasets and reader models, RAPTOR retrieval consistently outperformed contiguous-chunk baselines using DPR (Dense Passage Retrieval) and BM25 [1].
The most prominent result is on QuALITY paired with GPT-4: RAPTOR improved on the best previously reported accuracy for the benchmark by about 20 percentage points in absolute terms [1]. Selected reported figures from the paper:
| Benchmark | Reader model | Metric | RAPTOR | Notes |
|---|---|---|---|---|
| QuALITY | GPT-4 | Accuracy | 82.6% | vs. 62.3% prior best; about a 20-point absolute gain [1] |
| QuALITY (hard subset) | GPT-4 | Accuracy | 76.2% | vs. 54.7% prior best [1] |
| QASPER | GPT-4 | F1 Match | 55.7% | vs. DPR 53.0% and BM25 50.2% [1] |
| NarrativeQA | UnifiedQA-3B | ROUGE-L | 30.8% | with BLEU-1 23.5, BLEU-4 6.4, METEOR 19.1 [1] |
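The absolute gains implied by the table can be checked directly. The snippet below only subtracts the figures quoted above; the dictionary keys and the `absolute_gain` helper are illustrative names.

```python
# (RAPTOR score, comparison score) pairs taken from the table above.
results = {
    "QuALITY (GPT-4) vs. prior best": (82.6, 62.3),
    "QuALITY hard subset (GPT-4) vs. prior best": (76.2, 54.7),
    "QASPER (GPT-4) vs. DPR": (55.7, 53.0),
    "QASPER (GPT-4) vs. BM25": (55.7, 50.2),
}


def absolute_gain(new: float, old: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(new - old, 1)


gains = {name: absolute_gain(*pair) for name, pair in results.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
# QuALITY (GPT-4) vs. prior best: +20.3 points
# QuALITY hard subset (GPT-4) vs. prior best: +21.5 points
# QASPER (GPT-4) vs. DPR: +2.7 points
# QASPER (GPT-4) vs. BM25: +5.5 points
```

These are differences in percentage points, not relative improvements.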
The authors also report that the gains held across reader models of different sizes, and that the collapsed-tree retrieval strategy was the stronger of the two retrieval modes [1].
RAPTOR sits between simple chunked RAG and explicitly structured retrieval methods.
Because its tree is built purely from embeddings and text summaries, RAPTOR is largely agnostic to the choice of embedding model, summarizer, and reader, and can be layered on top of an existing vector store.
An official implementation of RAPTOR was released as open source by the authors, which contributed to its uptake in the RAG community [2]. The method is available as a packaged component in mainstream frameworks: LlamaIndex ships a "Raptor" pack and retriever whose run() mode can be set to either collapsed or tree_traversal [2], and community implementations using LangChain build the leaf chunks and per-level summaries into a vector store (for example, FAISS) for collapsed-tree retrieval [3]. RAPTOR is frequently cited and reimplemented in tutorials and follow-up coursework as a representative example of hierarchical or "multi-resolution" retrieval, and it has become a common baseline in research comparing advanced RAG architectures [3].