ChatGPT Retrieval Plugin

Updated May 10, 2026

See also: ChatGPT and ChatGPT Plugins GitHub

Introduction

The ChatGPT Retrieval Plugin is an open-source reference implementation released by OpenAI on March 23, 2023, alongside the launch of the broader ChatGPT plugins ecosystem ^[1]^[2]. It is a self-hosted FastAPI backend that lets ChatGPT perform semantic search and retrieval over personal or organizational documents stored in a vector database. At a high level the plugin embeds incoming text with an OpenAI embeddings model, writes the vectors into the user's chosen datastore, and exposes a small REST API that ChatGPT or any other LLM client can call to fetch relevant passages on demand.

The project served two purposes. First, it gave ChatGPT a way around the original chatbot's lack of long-term memory by letting users plug in their own indexed corpus. Second, it acted as a public reference implementation of retrieval-augmented generation (RAG) at a time when the technique was moving from research papers into mainstream developer tooling. The OpenAI plugin model itself was wound down by April 2024, but the repository remained on GitHub as a backend for Custom GPTs, GPT Actions, and the Assistants API ^[3]^[4].

Background and release

On March 23, 2023, OpenAI announced ChatGPT plugins as a way to extend the chatbot beyond its training data, describing a model that could read documentation, query APIs, and call tools rather than guess from memory alone ^[1]. The launch shipped with a small set of first-party plugins (web browsing and a Python code interpreter) and twelve third-party partners: Expedia, FiscalNote, Instacart, Kayak, Klarna, Milo, OpenTable, Shopify, Slack, Speak, Wolfram, and Zapier ^[1]^[2].

The Retrieval Plugin sat in a separate category. Instead of being a single hosted plugin, OpenAI published its source code on GitHub at openai/chatgpt-retrieval-plugin so that any developer could clone the repository, point it at a vector database of their choice, and host their own instance. This was the only open-source reference plugin in the initial launch, which made it the canonical way to learn how the plugin manifest, authentication, and OpenAPI schema were supposed to fit together. Pinecone contributed the original datastore implementation, and the canonical "upsert / query / delete" naming for the API endpoints came directly from Pinecone's vector database conventions ^[5].

Early access was gated through a waitlist, with rollout prioritized to ChatGPT Plus subscribers and a limited pool of developers before the broader plugin store opened in mid-2023 ^[1].

Architecture

The Retrieval Plugin is structured as a thin FastAPI server made up of four components, two of which (the embeddings model and the datastore) are pluggable.

Layer	Role	Notes
API server	FastAPI app exposing REST endpoints	Auto-generates Swagger UI at /docs for testing
Embeddings	Converts text chunks into vectors	OpenAI embeddings, model selectable
Datastore	Stores and retrieves vectors with metadata	One of 15+ supported vector databases
Plugin manifest	Files under .well-known/ describing the plugin to ChatGPT	ai-plugin.json plus openapi.yaml and logo.png

When a document is uploaded, the server splits it into chunks (by default about 200 tokens each), calls OpenAI to embed each chunk, attaches metadata such as source URL, author, and timestamp, and writes the result into the datastore. When ChatGPT or another client issues a natural-language query, the server embeds the query, runs a nearest-neighbor search against the datastore (approximate or exact, depending on the backend), optionally filters by metadata, and returns the top matching chunks to the model as context.
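The ingest and query flow described above can be sketched in a few lines of Python. This is an illustration only, not the repository's code: chunk_text, toy_embed, and InMemoryStore are hypothetical names, the "embedding" is a hashed bag-of-words stand-in for an OpenAI embeddings call, chunks are counted in whitespace-separated words rather than model tokens, and the search is an exact brute-force scan rather than an approximate index.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Optional


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 0) -> list:
    """Split text into windows of at most chunk_size whitespace-separated words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), chunk_size - overlap):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str, dims: int = 256) -> list:
    """Stand-in for an embeddings API call: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryStore:
    """Hypothetical datastore: a list of (vector, chunk text, metadata) rows."""
    rows: list = field(default_factory=list)

    def upsert(self, text: str, metadata: dict, chunk_size: int = 200) -> int:
        chunks = chunk_text(text, chunk_size)
        for chunk in chunks:
            self.rows.append((toy_embed(chunk), chunk, metadata))
        return len(chunks)

    def query(self, question: str, top_k: int = 3,
              source: Optional[str] = None) -> list:
        """Return up to top_k (score, chunk) pairs, best match first."""
        q = toy_embed(question)
        scored = [
            # Vectors are unit length, so the dot product is the cosine similarity.
            (sum(a * b for a, b in zip(q, vec)), chunk)
            for vec, chunk, meta in self.rows
            if source is None or meta.get("source") == source  # metadata filter
        ]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

Calling store.upsert(text, {"source": "file"}) followed by store.query("some question", top_k=3) mirrors the two halves of the pipeline; a real deployment would swap toy_embed for an embeddings API call and InMemoryStore for one of the supported vector databases.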

API endpoints

The HTTP surface is intentionally small, which makes the plugin easy to inspect, audit, and reimplement. The OpenAPI schema exposed to ChatGPT only includes the /query endpoint by default; the /upsert and /delete endpoints are reserved for the host application so the model cannot mutate the datastore on its own without the developer's consent ^[6].

Endpoint	Method	Purpose
/upsert	POST	Insert or update a list of documents with metadata
/upsert-file	POST	Upload a single file (PDF, TXT, DOCX, PPTX, MD) for parsing and chunking
/query	POST	Run one or more semantic queries with optional metadata filters and top-k
/delete	DELETE	Remove documents by ID, metadata filter, or wipe all
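As a rough illustration of what a /query call looks like from a client, the sketch below builds (but does not send) a request with the Python standard library. The body field names (queries, query, filter, top_k) follow the shape described in the repository's README as best recalled here and should be checked against the schema served at /docs on a running instance; build_query_request, the base URL, and the token are placeholders invented for this example.

```python
import json
import urllib.request
from typing import Optional


def build_query_request(base_url: str, token: str, question: str,
                        top_k: int = 3,
                        source: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /query request carrying one semantic query."""
    query = {"query": question, "top_k": top_k}
    if source is not None:
        # Optional metadata filter; the field name is assumed, verify against /docs.
        query["filter"] = {"source": source}
    body = json.dumps({"queries": [query]}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
    )


# Placeholder host and token; nothing is sent until urlopen() is called.
request = build_query_request("http://localhost:8000", "example-token",
                              "What is our vacation policy?", top_k=2)
```

Passing the request to urllib.request.urlopen would send it; the endpoint accepts a list of queries, so several questions can be batched into one call.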

Embedding models

At launch the default embedding model was text-embedding-ada-002, the model OpenAI was recommending across the board in early 2023. After OpenAI released its third-generation embeddings in January 2024, the repository was updated so the default became text-embedding-3-large at 256 dimensions, with text-embedding-3-small and the legacy ada-002 still selectable through environment variables ^[6]. The dimension parameter on the v3 models lets developers trade accuracy for storage cost, which matters at scale because every chunk in the corpus is stored as a float vector.
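The storage trade-off is simple arithmetic. The sketch below assumes each vector component is stored as a 4-byte float32 and ignores index and metadata overhead, both of which vary by datastore; vector_storage_bytes is a hypothetical helper. The comparison points are 256 dimensions (the reduced default mentioned above), 1,536 (the native size of text-embedding-ada-002 and text-embedding-3-small), and 3,072 (the native size of text-embedding-3-large).

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def vector_storage_bytes(num_chunks: int, dimensions: int) -> int:
    """Raw bytes needed for the vectors alone, excluding index overhead."""
    return num_chunks * dimensions * BYTES_PER_FLOAT32


for dims in (256, 1536, 3072):
    gib = vector_storage_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>5} dims: {gib:.2f} GiB per million chunks")
```

For one million chunks this works out to roughly 0.95 GiB at 256 dimensions, 5.72 GiB at 1,536, and 11.44 GiB at 3,072, a twelvefold difference between the smallest and largest settings.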

Authentication

The plugin manifest supports four authentication options, matching the wider ChatGPT plugin spec.

Mode	Use case
No auth	Local development and demos
Service-level HTTP bearer	A shared token between the developer and ChatGPT
User-level HTTP bearer	Each end user pastes their own token at install time
OAuth	The plugin acts as an OAuth client against the developer's identity provider

For production use, OpenAI's documentation pushed developers toward bearer tokens or OAuth; running the plugin without any authentication was treated as a localhost-only configuration.
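The service-level bearer mode boils down to comparing the token in the Authorization header with a secret the server knows. The following is a minimal, framework-independent sketch of that check, not the plugin's actual implementation; is_authorized is a hypothetical helper, and a real server would wire the equivalent logic into its web framework's request handling.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True only for a header of the form 'Bearer <expected_token>'."""
    if not authorization_header:
        return False
    scheme, _, credentials = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not credentials:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        credentials.strip().encode("utf-8"),
        expected_token.encode("utf-8"),
    )
```

The constant-time comparison is a common precaution for shared secrets; user-level tokens and OAuth would replace the single expected_token with a per-user lookup.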

Supported vector databases

One of the design goals was that any team should be able to deploy the plugin without changing their existing data infrastructure. By the time the repository stabilized, fifteen vector backends were supported ^[6]. The choice usually comes down to whether the team wants a managed service, an open-source self-hosted engine, or to keep their vectors next to their existing relational data.

Provider	Hosted/Self-hosted	Notes
Pinecone	Managed	Original reference implementation, hybrid search with SPLADE sparse vectors
Weaviate	Both	Open-source, hybrid keyword plus vector search out of the box
Milvus	Both	Open-source LF AI & Data Foundation graduate project, scales to billions of vectors
Zilliz	Managed	Commercial cloud version of Milvus
Qdrant	Both	Rust-based engine, payload filtering, Qdrant Cloud option
Chroma	Both	Lightweight Python-native datastore popular with prototypes
Redis	Both	Vector search via Redis Stack and Redis Cloud
PostgreSQL (pgvector)	Both	Vectors live next to relational data, useful for teams already on Postgres
Elasticsearch	Both	Adds dense vector retrieval on top of existing keyword indexes
MongoDB Atlas	Managed	Atlas Vector Search alongside document data
Supabase	Managed	Postgres plus pgvector with row-level security
Azure Cognitive Search	Managed	Microsoft's enterprise search service, vector mode
Azure CosmosDB Mongo vCore	Managed	Vector search inside a Mongo-compatible Azure store
AnalyticDB	Managed	Alibaba Cloud's analytical database with vector support
LlamaIndex	Self-hosted	Acts as an in-memory index layer rather than a database

Each backend ships with its own Dockerfile and a setup guide under docs/providers//setup.md, and the repository's CI runs the same integration test suite against every provider so that the API surface stays consistent.

How the plugin fits into ChatGPT

When a user installed the Retrieval Plugin in the original plugin store, ChatGPT would read the plugin manifest, which told the model what the plugin was for and how to call it. During a conversation, the model would decide on its own whether to invoke /query, send the user's question (or a rewritten version of it) as the query string, and then incorporate the returned snippets into its answer. Because only /query was exposed by default, ChatGPT could read from the corpus but could not write to it without the developer explicitly opening up /upsert.

Developers who wanted ChatGPT to remember things across sessions could turn the memory feature on, which made /upsert available to the model so it could push fresh snippets from the conversation into the vector database. The next time the user came back, the same ChatGPT instance could pull those snippets into context and act as if it remembered. This was a primitive form of long-term memory, available about a year before OpenAI shipped its native ChatGPT memory feature in 2024.
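The read-only default and the opt-in memory mode amount to choosing which paths appear in the schema the model sees. The sketch below illustrates that idea with a plain dictionary; it is a hypothetical helper, not how the repository implements the switch, and the schema contents are placeholders.

```python
from copy import deepcopy

# Placeholder schema listing every path the server implements.
FULL_SCHEMA = {
    "openapi": "3.0.2",
    "paths": {
        "/upsert": {"post": {}},
        "/upsert-file": {"post": {}},
        "/query": {"post": {}},
        "/delete": {"delete": {}},
    },
}


def model_visible_schema(full_schema: dict, enable_memory: bool = False) -> dict:
    """Return a copy of the schema restricted to the paths the model may call."""
    allowed = {"/query"}
    if enable_memory:
        allowed.add("/upsert")  # lets the model write conversation snippets
    visible = deepcopy(full_schema)
    visible["paths"] = {
        path: operations
        for path, operations in visible["paths"].items()
        if path in allowed
    }
    return visible
```

Because the model only learns about endpoints listed in the schema it is given, leaving /upsert and /delete out keeps the corpus read-only from the model's side, while the host application can still call them directly.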

Use cases

The plugin became the default starting point for a wave of internal-search and personal-knowledge-base prototypes during 2023.

Use case	Description
Internal company search	Index Confluence, Notion, Slack exports, and Google Drive into a private vector store; ChatGPT answers HR, IT, and product questions with citations to the source
Personal knowledge base	Drop notes, emails, and PDFs into the corpus so ChatGPT can recall details about your projects, contacts, and travel
Customer support	Embed product manuals and past tickets so the model can answer support questions grounded in the company's own documentation
Long-term chat memory	Persist past conversations and recall them in later sessions, which the original ChatGPT lacked
Domain copilots	Build vertical assistants for legal, medical, or research workflows that need to ground answers in a specific corpus

Grounding answers in retrieved snippets also made it easier to show citations back to the user, which was one of the few practical answers to the LLM hallucination problem that worked without retraining the model.
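One common way to get citations is to number the retrieved snippets in the prompt and ask the model to refer to those numbers. The sketch below shows that prompt-assembly step only; build_grounded_prompt, the instruction wording, and the snippet keys (source, text) are assumptions made for this example rather than anything prescribed by the plugin.

```python
def build_grounded_prompt(question: str, snippets: list) -> str:
    """Assemble a prompt that numbers each snippet so answers can cite it."""
    lines = [
        "Answer using only the numbered sources below and cite them like [1].",
        "If the sources do not contain the answer, say so.",
        "",
    ]
    for number, snippet in enumerate(snippets, start=1):
        lines.append(f"[{number}] ({snippet['source']}) {snippet['text']}")
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


prompt = build_grounded_prompt(
    "How many vacation days do new hires get?",
    [
        {"source": "hr-handbook.pdf", "text": "New hires accrue 20 vacation days per year."},
        {"source": "onboarding.md", "text": "Vacation requests go through the HR portal."},
    ],
)
```

The resulting string would be sent to the model alongside the conversation; because each snippet keeps its source label, a cited number in the answer can be mapped back to the original document.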

Deprecation and the move to GPT Actions

The wider ChatGPT plugin model had a short life. OpenAI announced Custom GPTs and the GPT Store at DevDay in November 2023 and positioned them as the successor to plugins, with each GPT able to call developer-defined Actions through an OpenAPI schema. In early 2024 OpenAI told plugin developers that the plugin store would close: new plugin conversations stopped on March 19, 2024, and existing plugin conversations were fully turned off on April 9, 2024 ^[3]^[4].

The Retrieval Plugin survived the transition because it was always just a self-hosted REST API. The same FastAPI server that backed a plugin in 2023 worked as a GPT Action in 2024, and as a function-calling endpoint behind the Chat Completions API and Assistants API. The repository's README was updated to reflect this: ChatGPT plugins were marked deprecated, and developers were redirected to use the plugin as a backend for Custom GPTs or to call its endpoints from their own agent frameworks ^[6].

Date	Event
March 23, 2023	ChatGPT plugins announced; Retrieval Plugin open-sourced ^[1]
November 6, 2023	OpenAI DevDay introduces Custom GPTs and the Assistants API ^[7]
January 25, 2024	text-embedding-3 family released; default embedding model later updated ^[8]
March 19, 2024	New conversations with ChatGPT plugins disabled ^[3]
April 9, 2024	Existing plugin conversations turned off; plugin store closed ^[3]

By 2025 active development on the repository had slowed considerably, with community members noting on the OpenAI developer forum that the project looked dormant. The Model Context Protocol (MCP), introduced by Anthropic in November 2024, offered a different, less ad-hoc standard for connecting LLMs to external tools and data, and many teams moved their RAG backends to MCP servers or to the native file search built into the Assistants API.

Influence on the RAG ecosystem

Measured by code shipped, the Retrieval Plugin was a small repository. Measured by influence, it was one of the most-cloned reference designs in the early RAG era.

First, it standardized the upsert/query/delete vocabulary across vector databases. The plugin's API shape was already familiar to Pinecone users, but seeing OpenAI codify it as the default plugin contract pushed every other vector database vendor to expose a compatible surface.

Second, it normalized the architecture that almost every later RAG framework copied: a chunking step, an embedding call, a vector-database write, then a retrieve-and-stuff loop at query time. Tools like LangChain, LlamaIndex, and Haystack already had similar pipelines, but the OpenAI repository made it easy for a developer who had never thought about embeddings to spin up a working stack in an afternoon.

Third, it pushed vector databases into the mainstream. The plugin's launch coincided with rapid funding rounds and feature releases at Pinecone, Weaviate, Qdrant, Chroma, and Milvus, and being on the official Retrieval Plugin support list became a de facto endorsement.

The shape of modern RAG (embed your corpus, store the vectors with metadata, retrieve the top-k chunks at query time, and let the model write a grounded answer with citations) was not invented by this plugin. The retrieval-augmented generation paper by Lewis et al. (2020) and earlier dense-retrieval work predate it by several years. What the Retrieval Plugin did was package that workflow as something a hobbyist could deploy on a free Render or Fly.io tier, and that accessibility was probably more important to the field than any single feature in the code itself.

References

1. OpenAI. "ChatGPT plugins." March 23, 2023. https://openai.com/index/chatgpt-plugins/
2. Wiggers, Kyle. "OpenAI connects ChatGPT to the internet." TechCrunch, March 23, 2023. https://techcrunch.com/2023/03/23/openai-connects-chatgpt-to-the-internet/
3. OpenAI. "Winding down the ChatGPT plugins beta." Help Center article, 2024.
4. OpenAI Developer Community. "Error: Plugins are no longer supported." April 2024. https://community.openai.com/t/error-plugins-are-no-longer-supported/715523
5. Mayer, Anthony Alford and Daniel Bryant interviewing Roy Miara. "OpenAI's Open-Source ChatGPT Plugin." InfoQ, May 2023. https://www.infoq.com/news/2023/05/chatgpt-retrieval-plugin/
6. OpenAI. "chatgpt-retrieval-plugin README." GitHub repository. https://github.com/openai/chatgpt-retrieval-plugin
7. OpenAI. "New models and developer products announced at DevDay." November 6, 2023. https://openai.com/index/new-models-and-developer-products-announced-at-devday/
8. OpenAI. "New embedding models and API updates." January 25, 2024. https://openai.com/index/new-embedding-models-and-api-updates/
9. Weaviate. "The ChatGPT Retrieval Plugin: Weaviate as a Long-term Memory Store for Generative AI." April 2023. https://weaviate.io/blog/weaviate-retrieval-plugin
10. OpenAI. "Retrieval Augmented Generation (RAG) and Semantic Search for GPTs." Help Center article. https://help.openai.com/en/articles/8868588-retrieval-augmented-generation-rag-and-semantic-search-for-gpts
