ChatGPT Retrieval Plugin
Last reviewed
May 10, 2026
Sources
10 citations
Review status
Source-backed
Revision
v3 · 2,329 words
See also: ChatGPT and ChatGPT Plugins GitHub
The ChatGPT Retrieval Plugin is an open-source reference implementation released by OpenAI on March 23, 2023, alongside the launch of the broader ChatGPT plugins ecosystem [1][2]. It is a self-hosted FastAPI backend that lets ChatGPT perform semantic search and retrieval over personal or organizational documents stored in a vector database. At a high level the plugin embeds incoming text with an OpenAI embeddings model, writes the vectors into the user's chosen datastore, and exposes a small REST API that ChatGPT or any other LLM client can call to fetch relevant passages on demand.
The project served two purposes. First, it gave ChatGPT a way around the original chatbot's lack of long-term memory by letting users plug in their own indexed corpus. Second, it acted as a public reference implementation of retrieval-augmented generation (RAG) at a time when the technique was moving from research papers into mainstream developer tooling. The OpenAI plugin model itself was wound down by April 2024, but the repository remained on GitHub as a backend for Custom GPTs, GPT Actions, and the Assistants API [3][4].
On March 23, 2023, OpenAI announced ChatGPT plugins as a way to extend the chatbot beyond its training data, comparing the model to a system that could read documentation, query APIs, and call tools rather than guess from memory alone [1]. The launch shipped with a small set of first-party plugins (web browsing and a Python code interpreter) and twelve third-party partners: Expedia, FiscalNote, Instacart, Kayak, Klarna, Milo, OpenTable, Shopify, Slack, Speak, Wolfram, and Zapier [1][2].
The Retrieval Plugin sat in a separate category. Instead of being a single hosted plugin, OpenAI published its source code on GitHub at openai/chatgpt-retrieval-plugin so that any developer could clone the repository, point it at a vector database of their choice, and host their own instance. This was the only open-source reference plugin in the initial launch, which made it the canonical way to learn how the plugin manifest, authentication, and OpenAPI schema were supposed to fit together. Pinecone contributed the original datastore implementation, and the canonical "upsert / query / delete" naming for the API endpoints came directly from Pinecone's vector database conventions [5].
Early access was gated through a waitlist, with rollout prioritized to ChatGPT Plus subscribers and a limited pool of developers before the broader plugin store opened in mid-2023 [1].
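The Retrieval Plugin is structured as a thin FastAPI server built from four components: the API server itself, two pluggable layers for embeddings and storage, and the manifest files that describe the plugin to ChatGPT.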
| Layer | Role | Notes |
|---|---|---|
| API server | FastAPI app exposing REST endpoints | Auto-generates Swagger UI at /docs for testing |
| Embeddings | Converts text chunks into vectors | OpenAI embeddings, model selectable |
| Datastore | Stores and retrieves vectors with metadata | One of 15+ supported vector databases |
| Plugin manifest | Files under .well-known/ describing the plugin to ChatGPT | ai-plugin.json plus openapi.yaml and logo.png |
When a document is uploaded the server splits it into chunks (default 200 tokens with overlap), calls OpenAI to embed each chunk, attaches metadata such as source URL, author, and timestamp, and writes the result into the datastore. When ChatGPT or another client issues a natural-language query, the server embeds the query, runs an approximate nearest-neighbor search against the datastore, optionally filters by metadata, and returns the top matching chunks back to the model as context.
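The ingestion and query steps described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the plugin's own code: whitespace splitting stands in for a real tokenizer, the 20-token overlap is an arbitrary illustrative value, `toy_embed` is a bag-of-words placeholder for an embeddings API call, and the search is an exact brute-force scan rather than an approximate nearest-neighbor index. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_tokens(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size tokens that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    tokens = text.split()  # assumption: whitespace tokens stand in for model tokens
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str) -> Counter:
    """Placeholder embedding: word counts. A real deployment would call an embeddings model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the k best (score, chunk) pairs."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In a real deployment the vectors would be computed once at upload time and stored alongside their metadata, so that only the query needs to be embedded when a question arrives.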
The HTTP surface is intentionally small, which makes the plugin easy to inspect, audit, and reimplement. The OpenAPI schema exposed to ChatGPT only includes the /query endpoint by default; the /upsert and /delete endpoints are reserved for the host application so the model cannot mutate the datastore on its own without the developer's consent [6].
| Endpoint | Method | Purpose |
|---|---|---|
| /upsert | POST | Insert or update a list of documents with metadata |
| /upsert-file | POST | Upload a single file (PDF, TXT, DOCX, PPTX, MD) for parsing and chunking |
| /query | POST | Run one or more semantic queries with optional metadata filters and top-k |
| /delete | DELETE | Remove documents by ID, metadata filter, or wipe all |
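As an illustration of how a host application might call the /query endpoint, the sketch below builds (but does not send) an HTTP request using only the standard library. The body shape, a `queries` list whose items carry `query`, `top_k`, and an optional `filter`, is an assumption based on the table above and should be checked against the instance's own /docs page. The base URL, token, and the `build_query_request` helper are hypothetical.

```python
import json
import urllib.request


def build_query_request(base_url, bearer_token, question, top_k=3, metadata_filter=None):
    """Build a POST request for /query without sending it."""
    query = {"query": question, "top_k": top_k}
    if metadata_filter:
        query["filter"] = metadata_filter  # e.g. {"source": "file"}; field names are assumptions
    body = json.dumps({"queries": [query]}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=body,
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_query_request("http://localhost:8000", "example-token", "What is our refund policy?")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it to a running instance. Building the request separately keeps the payload easy to inspect and test.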
At launch the default embedding model was text-embedding-ada-002, the model OpenAI was recommending across the board in early 2023. After OpenAI released its third-generation embeddings in January 2024, the repository was updated so the default became text-embedding-3-large at 256 dimensions, with text-embedding-3-small and the legacy ada-002 still selectable through environment variables [6]. The dimension parameter on the v3 models lets developers trade accuracy for storage cost, which matters at scale because every chunk in the corpus is stored as a float vector.
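The storage trade-off is simple arithmetic, sketched below. The calculation assumes vectors are stored as 4-byte floats with no index overhead or compression, which real datastores will not match exactly. The one-million-chunk corpus is a made-up example, and 3,072 is the native output size of text-embedding-3-large.

```python
def vector_storage_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw bytes needed to hold one float vector per chunk (no index overhead)."""
    return num_chunks * dimensions * bytes_per_value


if __name__ == "__main__":
    corpus = 1_000_000  # hypothetical corpus size, in chunks
    for dims in (256, 1536, 3072):
        print(dims, vector_storage_bytes(corpus, dims))
    # 256 dimensions  -> 1_024_000_000 bytes
    # 3072 dimensions -> 12_288_000_000 bytes, twelve times as much
```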
The plugin manifest supports four authentication options, matching the wider ChatGPT plugin spec.
| Mode | Use case |
|---|---|
| No auth | Local development and demos |
| Service-level HTTP bearer | A shared token between the developer and ChatGPT |
| User-level HTTP bearer | Each end user pastes their own token at install time |
| OAuth | The plugin acts as an OAuth client against the developer's identity provider |
For production use, OpenAI's documentation pushed developers toward bearer tokens or OAuth; running the plugin without any authentication was treated as a localhost-only configuration.
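A minimal sketch of the bearer-token check behind the two HTTP bearer modes is shown below. It is not the repository's implementation: `is_authorized` is a hypothetical helper, and in a FastAPI app the same comparison would normally sit inside a security dependency. The constant-time comparison is a common hardening choice rather than something the source describes.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True only for a well-formed 'Bearer <token>' header carrying the expected token."""
    if not authorization_header or not expected_token:
        return False  # refuse to run "open" if no token is configured
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(presented.strip().encode("utf-8"), expected_token.encode("utf-8"))
```

Rejecting requests when no token is configured makes a misconfigured deployment fail closed rather than silently fall back to the no-auth mode.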
One of the design goals was that any team should be able to deploy the plugin without changing their existing data infrastructure. By the time the repository stabilized, fifteen vector backends were supported [6]. The choice usually comes down to whether the team wants a managed service, an open-source self-hosted engine, or to keep their vectors next to their existing relational data.
| Provider | Hosted/Self-hosted | Notes |
|---|---|---|
| Pinecone | Managed | Original reference implementation, hybrid search with SPLADE sparse vectors |
| Weaviate | Both | Open-source, hybrid keyword plus vector search out of the box |
| Milvus | Both | Open-source LF AI & Data Foundation graduate project, scales to billions of vectors |
| Zilliz | Managed | Commercial cloud version of Milvus |
| Qdrant | Both | Rust-based engine, payload filtering, Qdrant Cloud option |
| Chroma | Both | Lightweight Python-native datastore popular with prototypes |
| Redis | Both | Vector search via Redis Stack and Redis Cloud |
| PostgreSQL (pgvector) | Both | Vectors live next to relational data, useful for teams already on Postgres |
| Elasticsearch | Both | Adds dense vector retrieval on top of existing keyword indexes |
| MongoDB Atlas | Managed | Atlas Vector Search alongside document data |
| Supabase | Managed | Postgres plus pgvector with row-level security |
| Azure Cognitive Search | Managed | Microsoft's enterprise search service, vector mode |
| Azure CosmosDB Mongo vCore | Managed | Vector search inside a Mongo-compatible Azure store |
| AnalyticDB | Managed | Alibaba Cloud's analytical database with vector support |
| LlamaIndex | Self-hosted | Acts as an in-memory index layer rather than a database |
Each backend ships with its own Dockerfile and a setup guide under docs/providers//setup.md, and the repository's CI runs the same integration test suite against every provider so that the API surface stays consistent.
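The pluggable-backend idea can be illustrated with a small abstract interface and a name-based factory. The sketch below is a simplified assumption about how such a layer might look, not the repository's actual class hierarchy. `DataStore`, `InMemoryDataStore`, `REGISTRY`, and `get_datastore` are hypothetical names, and ranking by dot product assumes the stored vectors are unit-normalized.

```python
from abc import ABC, abstractmethod
from typing import Optional


class DataStore(ABC):
    """The three operations every backend would need to provide."""

    @abstractmethod
    def upsert(self, doc_id: str, vector: list, metadata: dict) -> None: ...

    @abstractmethod
    def query(self, vector: list, top_k: int = 3, metadata_filter: Optional[dict] = None) -> list: ...

    @abstractmethod
    def delete(self, doc_id: str) -> bool: ...


class InMemoryDataStore(DataStore):
    """Toy backend: a dict of vectors, scanned exhaustively at query time."""

    def __init__(self):
        self._rows = {}

    def upsert(self, doc_id, vector, metadata):
        self._rows[doc_id] = (vector, metadata)  # same ID overwrites: insert-or-update

    def query(self, vector, top_k=3, metadata_filter=None):
        wanted = metadata_filter or {}
        candidates = [
            (sum(a * b for a, b in zip(vector, stored)), doc_id)
            for doc_id, (stored, meta) in self._rows.items()
            if all(meta.get(key) == value for key, value in wanted.items())
        ]
        candidates.sort(reverse=True)  # highest dot product first
        return [doc_id for _, doc_id in candidates[:top_k]]

    def delete(self, doc_id):
        return self._rows.pop(doc_id, None) is not None


REGISTRY = {"memory": InMemoryDataStore}  # a real registry would map names to real clients


def get_datastore(name: str) -> DataStore:
    """Pick a backend by name, e.g. a value read from an environment variable."""
    try:
        return REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unsupported datastore: {name!r}") from None
```

Because the API server talks only to the abstract interface, swapping one backend for another becomes a configuration change rather than a code change.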
When a user installed the Retrieval Plugin in the original plugin store, ChatGPT would read the plugin manifest, which told the model what the plugin was for and how to call it. During a conversation, the model would decide on its own whether to invoke /query, send the user's question (or a rewritten version of it) as the query string, and then incorporate the returned snippets into its answer. Because only /query was exposed by default, ChatGPT could read from the corpus but could not write to it without the developer explicitly opening up /upsert.
Developers who wanted ChatGPT to remember things across sessions could turn the memory feature on, which made /upsert available to the model so it could push fresh snippets from the conversation into the vector database. The next time the user came back, the same ChatGPT instance could pull those snippets into context and act as if it remembered. This was a primitive form of long-term memory, available about a year before OpenAI shipped its native ChatGPT memory feature in 2024.
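The read-only default and the opt-in memory mode amount to choosing which endpoints appear in the schema shown to the model. The sketch below models that choice with a plain dictionary. `ALL_ENDPOINTS` mirrors the endpoint table in this article, while `model_visible_endpoints` and its `memory_enabled` flag are hypothetical names rather than the project's real configuration.

```python
ALL_ENDPOINTS = {
    "/upsert": "POST",
    "/upsert-file": "POST",
    "/query": "POST",
    "/delete": "DELETE",
}


def model_visible_endpoints(memory_enabled: bool = False) -> dict:
    """Return the subset of endpoints that would be advertised to the model."""
    visible = {"/query"}  # read access is always exposed
    if memory_enabled:
        visible.add("/upsert")  # write access only when the developer opts in
    return {path: method for path, method in ALL_ENDPOINTS.items() if path in visible}
```

In this sketch /delete and /upsert-file never reach the model, so removing or bulk-loading documents stays with the host application.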
The plugin became the default starting point for a wave of internal-search and personal-knowledge-base prototypes during 2023.
| Use case | Description |
|---|---|
| Internal company search | Index Confluence, Notion, Slack exports, and Google Drive into a private vector store; ChatGPT answers HR, IT, and product questions with citations to the source |
| Personal knowledge base | Drop notes, emails, and PDFs into the corpus so ChatGPT can recall details about your projects, contacts, and travel |
| Customer support | Embed product manuals and past tickets so the model can answer support questions grounded in the company's own documentation |
| Long-term chat memory | Persist past conversations and recall them in later sessions, which the original ChatGPT lacked |
| Domain copilots | Build vertical assistants for legal, medical, or research workflows that need to ground answers in a specific corpus |
Grounding answers in retrieved snippets also made it easier to show citations back to the user, which was one of the few practical answers to the LLM hallucination problem that worked without retraining the model.
The wider ChatGPT plugin model had a short life. OpenAI announced Custom GPTs and the GPT Store at DevDay in November 2023 and positioned them as the successor to plugins, with each GPT able to call developer-defined Actions through an OpenAPI schema. In early 2024 OpenAI told plugin developers that the plugin store would close: new plugin conversations stopped on March 19, 2024, and existing plugin conversations were fully turned off on April 9, 2024 [3][4].
The Retrieval Plugin survived the transition because it was always just a self-hosted REST API. The same FastAPI server that backed a plugin in 2023 worked as a GPT Action in 2024, and as a function-calling endpoint behind the Chat Completions API and Assistants API. The repository's README was updated to reflect this: ChatGPT plugins were marked deprecated, and developers were redirected to use the plugin as a backend for Custom GPTs or to call its endpoints from their own agent frameworks [6].
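To illustrate the function-calling route, the sketch below defines a tool description in the JSON-schema style used by the Chat Completions `tools` parameter and converts the model's tool-call arguments into a /query body. The tool name `query_documents`, its description text, and the `tool_call_to_query_body` helper are hypothetical. The `queries` body shape is an assumption that should be verified against the running server's schema.

```python
import json

QUERY_TOOL = {
    "type": "function",
    "function": {
        "name": "query_documents",  # hypothetical tool name
        "description": "Search the private document corpus for passages relevant to a question.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Natural-language search query."},
                "top_k": {"type": "integer", "minimum": 1, "description": "Number of passages."},
            },
            "required": ["query"],
        },
    },
}


def tool_call_to_query_body(arguments_json: str, default_top_k: int = 3) -> dict:
    """Translate the JSON arguments of a tool call into a /query request body."""
    args = json.loads(arguments_json)
    return {
        "queries": [
            {"query": args["query"], "top_k": int(args.get("top_k", default_top_k))}
        ]
    }
```

The application would pass `QUERY_TOOL` in the list of tools, post the translated body to its own server, and return the retrieved passages to the model as the tool result.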
| Date | Event |
|---|---|
| March 23, 2023 | ChatGPT plugins announced; Retrieval Plugin open-sourced [1] |
| November 6, 2023 | OpenAI DevDay introduces Custom GPTs and the Assistants API [7] |
| January 25, 2024 | text-embedding-3 family released; default embedding model later updated [8] |
| March 19, 2024 | New conversations with ChatGPT plugins disabled [3] |
| April 9, 2024 | Existing plugin conversations turned off; plugin store closed [3] |
By 2025 active development on the repository had slowed considerably, with community members noting on the OpenAI developer forum that the project looked dormant. The Model Context Protocol (MCP), introduced in late 2024, offered a different, less ad-hoc standard for connecting LLMs to external tools and data, and many teams moved their RAG backends to MCP servers or to the native file search built into the Assistants API.
Measured by code shipped, the Retrieval Plugin was a small repository. Measured by influence, it was one of the most-cloned reference designs in the early RAG era.
First, it helped standardize the upsert/query/delete vocabulary across vector databases. The plugin's API shape was already familiar to Pinecone users, but seeing OpenAI codify it as the default plugin contract gave other vector database vendors a strong incentive to expose a compatible surface.
Second, it normalized the architecture that almost every later RAG framework copied: a chunking step, an embedding call, a vector-database write, then a retrieve-and-stuff loop at query time. Tools like LangChain, LlamaIndex, and Haystack already had similar pipelines, but the OpenAI repository made it easy for a developer who had never thought about embeddings to spin up a working stack in an afternoon.
Third, it pushed vector databases into the mainstream. The plugin's launch coincided with rapid funding rounds and feature releases at Pinecone, Weaviate, Qdrant, Chroma, and Milvus, and being on the official Retrieval Plugin support list became a de facto endorsement.
The shape of modern RAG (embed your corpus, store the vectors with metadata, retrieve top-k chunks at query time, and let the model write a grounded answer with citations) was not invented by this plugin. The retrieval-augmented generation paper by Lewis et al. (2020) and earlier dense-retrieval work predate it by several years. What the Retrieval Plugin did was package that workflow as something a hobbyist could deploy on a free Render or Fly.io tier, and that accessibility was probably more important to the field than any single feature in the code itself.