OpenAI Harmony
Last reviewed
Jun 3, 2026
Sources
5 citations
Review status
Source-backed
Revision
v1 · 1,356 words
OpenAI Harmony (the harmony response format) is the structured conversation and response format used by OpenAI's open-weight gpt-oss language models, released on August 5, 2025. The gpt-oss models were trained on this format to define conversation structure, generate reasoning output, and structure function calls, and OpenAI's documentation states that the models "should only be used with this format; otherwise, they will not work correctly." [1][2] Alongside the models, OpenAI published an open-source renderer library, also named harmony, that converts conversations into the token sequences the models expect and parses the models' output back into structured messages. [3]
The harmony format specifies how messages are laid out as sequences of tokens before they are fed to a gpt-oss model, and how the raw token output of the model is interpreted. It is deliberately modeled on the OpenAI Responses API, so that developers familiar with OpenAI's hosted products encounter a similar structure: a clear hierarchy of message roles, support for tool calling and structured outputs, and a separation between the model's internal reasoning and its user-facing answer. [3]
Because gpt-oss was trained with a tokenizer that includes the format's control tokens (o200k_harmony, a superset of the tokenizer used for o4-mini and GPT-4o, which OpenAI also open-sourced), the model emits these tokens directly during generation. [2] In practice, developers who access gpt-oss through an inference provider or API (for example Hugging Face, Ollama, or vLLM) usually do not have to construct the format by hand, because the serving layer applies it automatically. Developers who run the raw model weights themselves are responsible for rendering prompts and parsing completions in the harmony format, which is the gap the harmony library fills. [1][3]
A harmony conversation is a list of messages. Each message has a role, an optional channel, optional recipient information (used for tool calls), and content. The format defines five roles, and OpenAI documents an explicit information hierarchy, sometimes called a chain of command, in which instructions from higher-priority roles take precedence over lower-priority ones when they conflict. [3]
| Role | Priority | Purpose |
|---|---|---|
| system | 1 (highest) | Specifies reasoning effort, meta information such as the knowledge cutoff and current date, the list of valid channels, and built-in tools. |
| developer | 2 | Provides the model's instructions (the "system prompt" in common usage) and the available function tools. |
| user | 3 | Carries the input from the end user to the model. |
| assistant | 4 | The model's own output, which may be a message or a tool call. |
| tool | 5 (lowest) | Carries the output returned from a tool call back to the model. |
The documented priority order is system, then developer, then user, then assistant, then tool. [3] This hierarchy is closely related to OpenAI's broader work on instruction following described in the Model Spec.
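The message fields and the priority order can be sketched in a few lines of Python. This is an illustration only: `HarmonyMessage` and `highest_priority` are hypothetical names, not part of the harmony library, and in practice the model applies the hierarchy through its training rather than through application code.

```python
from dataclasses import dataclass
from typing import Optional

# Lower number = higher priority, following the documented order.
ROLE_PRIORITY = {"system": 1, "developer": 2, "user": 3, "assistant": 4, "tool": 5}


@dataclass(frozen=True)
class HarmonyMessage:
    """Hypothetical container for one message (not the library's own class)."""
    role: str
    content: str
    channel: Optional[str] = None    # only assistant messages need one
    recipient: Optional[str] = None  # used for tool calls

    def __post_init__(self) -> None:
        if self.role not in ROLE_PRIORITY:
            raise ValueError(f"unknown role: {self.role!r}")


def highest_priority(messages):
    """Return the message whose role ranks highest in the hierarchy."""
    return min(messages, key=lambda m: ROLE_PRIORITY[m.role])


# Two conflicting instructions: the developer message outranks the user message.
conflict = [
    HarmonyMessage(role="user", content="Reply in French."),
    HarmonyMessage(role="developer", content="Always reply in English."),
]
winner = highest_priority(conflict)
```

Here `winner` is the developer message, which mirrors how a developer instruction takes precedence over a conflicting user instruction.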
The system message in harmony begins with a fixed identity line, "You are ChatGPT, a large language model trained by OpenAI," followed by metadata such as the knowledge cutoff, the current date, and the selected reasoning effort, which can be set to low, medium, or high. [1][2] Developer instructions and function tool definitions are carried in the developer message rather than the system message. Function tools are declared in the developer message using a TypeScript-like type syntax inside a functions namespace. [3]
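As a rough sketch of how the system message content might be assembled, the helper below joins the identity line with the metadata fields. The function name `build_system_text`, the field labels ("Knowledge cutoff:", "Current date:", "Reasoning:"), the line layout, and the example cutoff value are assumptions made for illustration; the harmony library should be treated as the authority on the exact text.

```python
from datetime import date

VALID_EFFORTS = ("low", "medium", "high")


def build_system_text(knowledge_cutoff: str, current_date: date,
                      reasoning: str = "medium") -> str:
    """Assemble illustrative system-message content (layout is assumed)."""
    if reasoning not in VALID_EFFORTS:
        raise ValueError(f"reasoning must be one of {VALID_EFFORTS}")
    lines = [
        "You are ChatGPT, a large language model trained by OpenAI.",
        f"Knowledge cutoff: {knowledge_cutoff}",      # assumed label
        f"Current date: {current_date.isoformat()}",  # assumed label
        "",
        f"Reasoning: {reasoning}",                    # assumed label
    ]
    return "\n".join(lines)


# "2024-06" is a placeholder value, not a documented cutoff.
system_text = build_system_text("2024-06", date(2025, 8, 5), reasoning="high")
```

Keeping the reasoning effort as a validated parameter makes it harder to send a value outside the three documented settings.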
A distinctive feature of harmony is that the assistant's output is split into named channels, so that internal reasoning, tool-call preambles, and the final answer can be distinguished from one another. Every assistant message must specify a channel. The format defines three channels. [3]
| Channel | Purpose |
|---|---|
| analysis | Carries the model's chain-of-thought (CoT) reasoning. |
| commentary | Carries function tool calls and, occasionally, preambles describing an upcoming sequence of tool calls. |
| final | Carries the message intended to be shown to the end user. |
OpenAI's documentation includes an explicit safety caveat for the analysis channel: "Messages in the analysis channel do not adhere to the same safety standards as final messages do." Raw chain-of-thought content should therefore not be shown to end users without first being reviewed or filtered. [3] Function tool calls are required to be emitted on the commentary channel. [1][3]
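One way an application might act on this separation is to display only final-channel content. The sketch below assumes assistant messages are already parsed into plain dictionaries with `channel` and `content` keys; that shape and the function name `user_visible_text` are hypothetical.

```python
VALID_CHANNELS = {"analysis", "commentary", "final"}


def user_visible_text(assistant_messages):
    """Join only final-channel content; analysis and commentary are withheld."""
    visible = []
    for msg in assistant_messages:
        channel = msg.get("channel")
        if channel not in VALID_CHANNELS:
            # Every assistant message must name one of the three channels.
            raise ValueError(f"invalid assistant channel: {channel!r}")
        if channel == "final":
            visible.append(msg["content"])
    return "\n".join(visible)


turn = [
    {"channel": "analysis", "content": "User wants a sum; 2 + 2 = 4."},
    {"channel": "final", "content": "2 + 2 = 4."},
]
shown = user_visible_text(turn)
```

In this example `shown` contains only the final answer, and the analysis text never reaches the display layer.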
Harmony delimits messages and their components with a set of reserved control tokens rather than ordinary text. The tokens carry fixed token IDs in the o200k_harmony tokenizer. The following are documented in OpenAI's harmony guide. [3]
| Token | Token ID | Purpose |
|---|---|---|
| <\|start\|> | 200006 | Marks the beginning of a message and introduces its header (role and, where applicable, recipient and channel). |
| <\|end\|> | 200007 | Marks the end of a message. |
| <\|message\|> | 200008 | Marks the transition from a message header to its actual content. |
| <\|channel\|> | 200005 | Introduces the channel information within a header. |
| <\|constrain\|> | 200003 | Indicates the data type used in a tool call (for example a JSON-constrained argument). |
| <\|return\|> | 200002 | A stop token indicating the model has finished sampling its response. |
| <\|call\|> | 200012 | A stop token indicating the model wants to call a tool. |
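A sampling loop that works directly on token IDs could use the table above to decide when to stop. The following is a minimal sketch: the token IDs come from the table, while the helper name `truncate_at_stop` and the non-control IDs in the example are arbitrary placeholders.

```python
CONTROL_TOKENS = {
    "<|return|>": 200002,
    "<|constrain|>": 200003,
    "<|channel|>": 200005,
    "<|start|>": 200006,
    "<|end|>": 200007,
    "<|message|>": 200008,
    "<|call|>": 200012,
}

# The two tokens that end sampling: a finished response or a tool call.
STOP_TOKEN_IDS = {CONTROL_TOKENS["<|return|>"], CONTROL_TOKENS["<|call|>"]}


def truncate_at_stop(token_ids):
    """Cut a token list after the first stop token and report which one it was."""
    id_to_name = {tid: name for name, tid in CONTROL_TOKENS.items()}
    for i, tok in enumerate(token_ids):
        if tok in STOP_TOKEN_IDS:
            return list(token_ids[: i + 1]), id_to_name[tok]
    return list(token_ids), None  # no stop token seen yet


# 1234, 5678 and 9999 are placeholder IDs standing in for ordinary text tokens.
sampled = [200006, 1234, 200008, 5678, 200012, 9999]
kept, stop_name = truncate_at_stop(sampled)
```

Distinguishing the two stop tokens matters because `<|call|>` means the application should run a tool and continue, whereas `<|return|>` means the turn is complete.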
A typical rendered assistant message therefore takes the shape <|start|>{role}<|channel|>{channel}<|message|>{content}<|end|>; messages that carry no channel, such as user messages, omit the <|channel|> part of the header. When a tool call is made, the assistant header names the recipient (for example to=functions.{tool_name}), routes to the commentary channel, and may use <|constrain|> to mark the argument type; the model then ends the message with the <|call|> token. [3] OpenAI notes a practical detail for maintaining conversation history: a trailing <|return|> token from a completed turn should be replaced with <|end|> when the message is stored and fed back in as context, while <|return|> is the appropriate target token for training examples. [3]
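The message shape and the history rule can be illustrated at the text level. This is a hand-rolled sketch, not the harmony library's renderer: `render_message` and `for_history` are hypothetical helpers, `get_weather` is a made-up tool name, and the exact spacing around the recipient and `<|constrain|>` in the header is an assumption.

```python
def render_message(role, content, channel=None, recipient=None,
                   constrain=None, terminator="<|end|>"):
    """Render one message as text with harmony control tokens (illustrative)."""
    header = role
    if channel is not None:
        header += f"<|channel|>{channel}"
    if recipient is not None:
        header += f" to={recipient}"             # assumed spacing
    if constrain is not None:
        header += f" <|constrain|>{constrain}"   # assumed spacing
    return f"<|start|>{header}<|message|>{content}{terminator}"


def for_history(rendered):
    """Swap a trailing <|return|> for <|end|> before storing a turn as context."""
    stop = "<|return|>"
    if rendered.endswith(stop):
        return rendered[: -len(stop)] + "<|end|>"
    return rendered


final_turn = render_message("assistant", "4", channel="final",
                            terminator="<|return|>")
tool_call = render_message("assistant", '{"city": "Paris"}',
                           channel="commentary",
                           recipient="functions.get_weather",
                           constrain="json", terminator="<|call|>")
stored = for_history(final_turn)
```

The stored form ends in `<|end|>`, so a later prompt that includes it does not contain a stop token in the middle of the context.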
To make the format easier to adopt, OpenAI released an open-source library named harmony, distributed as openai-harmony on PyPI for Python and published from the same repository for Rust. [3] The majority of the rendering and parsing is implemented in Rust for performance and exposed to Python through thin pyo3 bindings, with a pure-Python wrapper that provides dataclasses and convenience APIs mirroring the Rust implementation. OpenAI describes the design goals as consistent, loss-free token-sequence handling shared between languages, high performance from the Rust core, and first-class Python support including typed stubs and test parity with the Rust suite. [3]
The library can be installed with pip install openai-harmony (or uv pip install openai-harmony), and Rust projects can depend on it directly from the GitHub repository. [3] Its role is to render a list of messages into the exact token sequence a gpt-oss model expects, and to parse the model's raw token output back into structured messages with their roles, channels, and any tool calls identified, so that application code does not have to manipulate the control tokens manually. [1][3] The Python package on PyPI targets Python 3.8 and above. [4]
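To give a feel for what the parsing half of that job involves, here is a text-level stand-in written with a regular expression. It is deliberately not the harmony library's API, which operates on token IDs and handles many more cases; `parse_messages` is a hypothetical helper, `get_weather` is a made-up tool name, and the header layout it accepts is an assumption.

```python
import re

# One message: header (role, optional channel, recipient, constraint),
# then content, then one of the three closing tokens.
MESSAGE_RE = re.compile(
    r"<\|start\|>(?P<role>[a-z]+)"
    r"(?:<\|channel\|>(?P<channel>[a-z]+))?"
    r"(?: to=(?P<recipient>[\w.]+))?"
    r"(?: <\|constrain\|>(?P<constrain>\w+))?"
    r"<\|message\|>(?P<content>.*?)"
    r"(?P<terminator><\|end\|>|<\|return\|>|<\|call\|>)",
    re.DOTALL,
)


def parse_messages(text):
    """Split rendered harmony text into dictionaries, one per message."""
    return [m.groupdict() for m in MESSAGE_RE.finditer(text)]


raw = (
    "<|start|>assistant<|channel|>analysis<|message|>Need the weather tool.<|end|>"
    "<|start|>assistant<|channel|>commentary to=functions.get_weather "
    '<|constrain|>json<|message|>{"city": "Paris"}<|call|>'
)
parsed = parse_messages(raw)
is_tool_call = parsed[-1]["terminator"] == "<|call|>"
```

A string-based parser like this breaks down if message content itself contains control-token text, which is one reason to prefer token-level parsing in real applications.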
Harmony was introduced together with the gpt-oss models, gpt-oss-120b and gpt-oss-20b, which OpenAI released on August 5, 2025 under the Apache 2.0 license. [2][5] The two models were trained on the harmony format, and OpenAI states they should not be used without it. [1][2] Because the format encodes the reasoning-effort setting (low, medium, or high) in the system message and separates chain-of-thought from the final answer through the analysis and final channels, it is the mechanism through which gpt-oss exposes its adjustable reasoning and tool-use behavior. [2][3] For developers who consume gpt-oss through a managed endpoint such as a third-party inference provider, the serving layer applies the harmony format, whereas developers running the open weights directly rely on the harmony library to apply it correctly. [1][3]