ChatGPT Plugins


Introduction

Language models, such as ChatGPT, have shown their utility in a range of applications, but they remain limited by their reliance on training data. To enhance their capabilities, plugins can serve as "eyes and ears" for language models, providing them with up-to-date, personal, or specific information that is not available in their training data. These plugins also enable language models to perform safe and constrained actions at a user's request, improving the overall usefulness of the system.

Open standards are expected to emerge for AI-facing interfaces, and efforts are underway to develop an early version of such a standard. Currently, existing plugins from early collaborators are being enabled for ChatGPT users, starting with ChatGPT Plus subscribers. Developers are also being given the ability to create their own plugins for ChatGPT.

Browsing

Drawing inspiration from prior work such as WebGPT, GopherCite, BlenderBot2, and LaMDA2, browsing-enabled language models can access information from the internet, expanding their knowledge beyond their training corpus. This allows ChatGPT to discuss up-to-date information and provide more relevant answers to user queries.

Enabling language and chat models to conduct thorough and interpretable research on the internet has promising prospects for scalable alignment.

Code Interpreter

The code interpreter provides ChatGPT with a working Python interpreter in a sandboxed, firewalled execution environment, along with ephemeral disk space. Code run by the interpreter plugin is evaluated in a persistent session that is active for the duration of a chat conversation. Users can upload files to the workspace and download the results of their work.

The integration of a code interpreter aims to create a natural interface for computer capabilities, making workflows more efficient and accessible. Initial user studies have identified several use cases for the code interpreter:

  • Solving mathematical problems
  • Performing data analysis and visualization
  • Converting files between formats
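
For illustration, the following is a short Python program of the kind the interpreter might run inside its sandbox. The inline dataset stands in for a user-uploaded file, so this is a self-contained sketch rather than an actual session transcript.

    # Sketch of a program the interpreter could run in its sandbox:
    # summarize a small dataset, plot it, and convert it to another format.
    import pandas as pd
    import matplotlib
    matplotlib.use("Agg")  # headless backend, suitable for a sandboxed environment
    import matplotlib.pyplot as plt

    # In practice the data would come from a file uploaded to the workspace;
    # an inline dataset keeps this sketch self-contained.
    df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "revenue": [120, 135, 160]})

    print(df.describe())  # quick numerical summary (data analysis)

    df.plot(x="month", y="revenue", kind="bar")
    plt.savefig("revenue.png")  # visualization written to the ephemeral disk space

    df.to_json("revenue.json", orient="records")  # file format conversion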

Retrieval

The open-source retrieval plugin allows ChatGPT to access personal or organizational information sources, such as files, notes, emails, or public documentation, with user permission. By asking questions or expressing needs in natural language, users can obtain relevant document snippets from their data sources.

Developers can deploy their own version of the plugin, which leverages OpenAI embeddings and integrates with various vector databases (Milvus, Pinecone, Qdrant, Redis, Weaviate, or Zilliz) for indexing and searching documents. Information sources can be synchronized with the database using webhooks.
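
As a sketch, a deployed instance of the plugin can be queried over HTTP with a natural-language question. The endpoint path, request shape, and bearer-token authentication below follow the pattern used by the open-source repository, but the URL and token are placeholders and the exact schema should be checked against the repository itself.

    import requests

    # Placeholder values: both depend on your own deployment of the plugin.
    PLUGIN_URL = "https://your-retrieval-plugin.example.com"
    BEARER_TOKEN = "your-plugin-bearer-token"

    # The plugin embeds the question with OpenAI embeddings, searches the
    # configured vector database, and returns the most relevant document chunks.
    response = requests.post(
        f"{PLUGIN_URL}/query",
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        json={"queries": [{"query": "What did we decide about the Q3 roadmap?", "top_k": 3}]},
    )
    response.raise_for_status()

    for query_result in response.json()["results"]:
        for chunk in query_result["results"]:
            print(chunk["text"])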

To get started, developers can visit the retrieval plugin repository.

Third-Party Plugins

Third-party plugins are defined by a manifest file that includes a machine-readable description of the plugin's capabilities, invocation instructions, and user-facing documentation. The process of creating a plugin involves the following steps:

  1. Develop an API with endpoints that the language model can call. This can be a new API, an existing API, or a wrapper around an existing API specifically designed for language models.
  2. Create an OpenAPI specification to document the API and a manifest file that links to the OpenAPI spec and includes plugin-specific metadata (both steps are sketched below).
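
A minimal sketch of both steps follows, assuming a FastAPI service (which generates an OpenAPI specification automatically) and the manifest fields described in OpenAI's plugin documentation; the field names, URLs, and endpoint are illustrative and should be checked against the current specification.

    # Step 1: a small API endpoint the language model can call. FastAPI serves
    # an OpenAPI specification for it automatically at /openapi.json.
    from fastapi import FastAPI

    app = FastAPI(title="TODO Plugin", version="0.1.0")

    @app.get("/todos")
    def list_todos():
        """Return the user's to-do items."""
        return {"todos": ["buy milk", "write report"]}

    # Step 2: the manifest (typically served as ai-plugin.json) links to that
    # OpenAPI spec and adds plugin-specific metadata. Field names follow the
    # published format but are shown here only as an illustration.
    MANIFEST = {
        "schema_version": "v1",
        "name_for_human": "TODO Plugin",
        "name_for_model": "todo",
        "description_for_human": "Manage your to-do list.",
        "description_for_model": "Plugin for listing the user's to-do items.",
        "auth": {"type": "none"},
        "api": {"type": "openapi", "url": "https://example.com/openapi.json"},
        "logo_url": "https://example.com/logo.png",
        "contact_email": "support@example.com",
        "legal_info_url": "https://example.com/legal",
    }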

Users can select which third-party plugins they want to enable when starting a conversation on chat.openai.com. The language model is shown documentation for the enabled plugins as part of the conversation context, allowing it to invoke the appropriate plugin APIs as needed to fulfill user intent. While current plugins are designed for calling backend APIs, plugins capable of calling client-side APIs are also being explored.