OpenAI Harmony

Updated Jun 3, 2026

OpenAI Harmony (the harmony response format) is the structured conversation and response format used by OpenAI's open-weight gpt-oss language models, released on August 5, 2025. The gpt-oss models were trained on this format to define conversation structure, generate reasoning output, and structure function calls, and OpenAI's documentation states that the models "should only be used with this format; otherwise, they will not work correctly." ^[1]^[2] Alongside the models, OpenAI published an open-source renderer library, also named harmony, that converts conversations into the token sequences the models expect and parses the models' output back into structured messages. ^[3]

Overview

The harmony format specifies how messages are laid out as sequences of tokens before they are fed to a gpt-oss model, and how the raw token output of the model is interpreted. It is deliberately modeled on the OpenAI Responses API, so that developers familiar with OpenAI's hosted products encounter a similar structure: a clear hierarchy of message roles, support for tool calling and structured outputs, and a separation between the model's internal reasoning and its user-facing answer. ^[3]

Because gpt-oss was trained with a tokenizer that includes the format's control tokens (o200k_harmony, which OpenAI open-sourced and describes as a superset of the tokenizer used for o4-mini and GPT-4o), the model emits these tokens directly during generation. ^[2] In practice, developers who access gpt-oss through an inference provider or API (for example Hugging Face, Ollama, or vLLM) usually do not have to construct the format by hand, because the serving layer applies it automatically. Developers who run the raw model weights themselves must render prompts and parse completions in the harmony format, which is the gap the harmony library fills. ^[1]^[3]

Roles and message structure

A harmony conversation is a list of messages. Each message has a role, an optional channel, optional recipient information (used for tool calls), and content. The format defines five roles, and OpenAI documents an explicit information hierarchy, sometimes called a chain of command, in which instructions from higher-priority roles take precedence over lower-priority ones when they conflict. ^[3]

Role	Priority	Purpose
system	1 (highest)	Specifies reasoning effort, meta information such as the knowledge cutoff and current date, the list of valid channels, and built-in tools.
developer	2	Provides the model's instructions (the "system prompt" in common usage) and the available function tools.
user	3	Carries the input from the end user to the model.
assistant	4	The model's own output, which may be a message or a tool call.
tool	5 (lowest)	Carries the output returned from a tool call back to the model.

The documented priority order is system, then developer, then user, then assistant, then tool. ^[3] This hierarchy is closely related to OpenAI's broader work on instruction following described in the Model Spec.
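The ordering can be written down as a simple rank lookup. The following is a minimal sketch for illustration only: ROLE_PRIORITY and higher_priority are hypothetical names, not part of the harmony library, and in practice the model applies the hierarchy itself during generation.

```python
# Hypothetical helper: rank harmony roles by their documented priority.
# A lower number means a higher priority. Not part of the harmony library.
ROLE_PRIORITY = {
    "system": 1,
    "developer": 2,
    "user": 3,
    "assistant": 4,
    "tool": 5,
}


def higher_priority(role_a: str, role_b: str) -> str:
    """Return whichever of the two roles takes precedence in a conflict."""
    return min(role_a, role_b, key=ROLE_PRIORITY.__getitem__)


print(higher_priority("user", "developer"))  # developer
```

A lookup like this is mainly useful for validating or ordering messages in application code; it does not enforce anything on the model.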

The system message in harmony begins with a fixed identity line, "You are ChatGPT, a large language model trained by OpenAI," followed by metadata such as the knowledge cutoff, the current date, and the selected reasoning effort, which can be set to low, medium, or high. ^[1]^[2] Developer instructions and function tool definitions are carried in the developer message rather than the system message. Function tools are declared in the developer message using a TypeScript-like type syntax inside a functions namespace. ^[3]
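As a rough illustration of the system message contents described above, the sketch below assembles the identity line and metadata into one string. The function name build_system_content, the field labels ("Knowledge cutoff:", "Current date:", "Reasoning:"), the line layout, and the sample values are assumptions made for this example; OpenAI's guide and the harmony library define the exact text.

```python
from datetime import date

# The three reasoning-effort settings named in the article.
VALID_EFFORTS = ("low", "medium", "high")


def build_system_content(knowledge_cutoff: str, current_date: date,
                         effort: str = "medium") -> str:
    """Assemble illustrative system-message text (layout is an assumption)."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    lines = [
        "You are ChatGPT, a large language model trained by OpenAI.",
        f"Knowledge cutoff: {knowledge_cutoff}",
        f"Current date: {current_date.isoformat()}",
        "",
        f"Reasoning: {effort}",
    ]
    return "\n".join(lines)


# Sample values are placeholders chosen for the example.
print(build_system_content("2024-06", date(2025, 8, 5), effort="high"))
```

Developer instructions are deliberately left out of this string, since the format carries them in a separate developer message.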

Channels

A distinctive feature of harmony is that the assistant's output is split into named channels, so that internal reasoning, tool-call preambles, and the final answer can be distinguished from one another. Every assistant message must specify a channel. The format defines three channels. ^[3]

Channel	Purpose
analysis	Carries the model's chain-of-thought (CoT) reasoning.
commentary	Carries function tool calls and, occasionally, preambles describing an upcoming sequence of tool calls.
final	Carries the message intended to be shown to the end user.

OpenAI's documentation includes an explicit safety caveat for the analysis channel: "Messages in the analysis channel do not adhere to the same safety standards as final messages do." Applications should therefore avoid showing raw chain-of-thought content to end users and display only the final channel. ^[3] Function tool calls are required to be emitted on the commentary channel. ^[1]^[3]
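The channel split makes it straightforward to decide what reaches the user. The sketch below assumes messages have already been parsed into simple records; the Msg dataclass and the user_visible function are hypothetical names for illustration, not types from the harmony library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Msg:
    """Hypothetical parsed message; not the harmony library's own type."""
    role: str
    content: str
    channel: Optional[str] = None  # set on assistant messages


def user_visible(messages: List[Msg]) -> List[str]:
    """Keep only assistant text on the 'final' channel."""
    return [
        m.content
        for m in messages
        if m.role == "assistant" and m.channel == "final"
    ]


conversation = [
    Msg("user", "What is 2 + 2?"),
    Msg("assistant", "Simple arithmetic: 2 + 2 = 4.", channel="analysis"),
    Msg("assistant", "2 + 2 = 4.", channel="final"),
]
print(user_visible(conversation))  # ['2 + 2 = 4.']
```

Filtering on an allow-list of one channel, rather than excluding analysis, means any unexpected channel is hidden by default.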

Special tokens

Harmony delimits messages and their components with a set of reserved control tokens rather than ordinary text. The tokens carry fixed token IDs in the o200k_harmony tokenizer. The following are documented in OpenAI's harmony guide. ^[3]

Token	Token ID	Purpose
<|start|>	200006	Marks the beginning of a message and introduces its header (role and, where applicable, recipient and channel).
<|end|>	200007	Marks the end of a message.
<|message|>	200008	Marks the transition from a message header to its actual content.
<|channel|>	200005	Introduces the channel information within a header.
<|constrain|>	200003	Indicates the data type used in a tool call (for example a JSON-constrained argument).
<|return|>	200002	A stop token indicating the model has finished sampling its response.
<|call|>	200012	A stop token indicating the model wants to call a tool.
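The two stop tokens let a sampling loop tell a finished answer from a pending tool call. The following is a minimal sketch that uses the token IDs from the table; SPECIAL_TOKENS and stop_reason are hypothetical names, and the returned labels are arbitrary strings chosen for this example.

```python
from typing import List

# Token IDs as listed in the table above (o200k_harmony tokenizer).
SPECIAL_TOKENS = {
    "<|start|>": 200006,
    "<|end|>": 200007,
    "<|message|>": 200008,
    "<|channel|>": 200005,
    "<|constrain|>": 200003,
    "<|return|>": 200002,
    "<|call|>": 200012,
}


def stop_reason(token_ids: List[int]) -> str:
    """Classify why sampling ended, based on the last generated token ID."""
    if not token_ids:
        return "incomplete"
    last = token_ids[-1]
    if last == SPECIAL_TOKENS["<|return|>"]:
        return "final_response"
    if last == SPECIAL_TOKENS["<|call|>"]:
        return "tool_call"
    return "incomplete"  # e.g. generation was cut off by a length limit


print(stop_reason([200005, 200012]))  # tool_call
```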

A typical rendered assistant message therefore takes the shape <|start|>{role}<|channel|>{channel}<|message|>{content}<|end|>; the channel part of the header is optional for other roles. When a tool call is made, the assistant header names the recipient (for example to=functions.{tool_name}), routes to the commentary channel, and may use <|constrain|> to mark the argument type; the model then ends the message with the <|call|> token. ^[3] OpenAI notes a practical detail for maintaining conversation history: a trailing <|return|> token from a completed turn should be replaced with <|end|> when the message is stored and fed back in as context, while <|return|> is the appropriate target token for training examples. ^[3]
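The message shape and the history rule can be sketched at the string level as follows. This is an illustration under stated assumptions, not the harmony library's renderer: render_message and to_history are hypothetical names, real rendering produces token IDs rather than text, and the exact placement of the recipient and constraint inside the header is an assumption of this sketch.

```python
from typing import Optional


def render_message(role: str, content: str, channel: Optional[str] = None,
                   recipient: Optional[str] = None,
                   constrain: Optional[str] = None,
                   end: str = "<|end|>") -> str:
    """Render one message as text using the harmony control tokens."""
    header = role
    if channel is not None:
        header += f"<|channel|>{channel}"
    if recipient is not None:
        header += f" to={recipient}"  # header placement is an assumption
    if constrain is not None:
        header += f" <|constrain|>{constrain}"
    return f"<|start|>{header}<|message|>{content}{end}"


def to_history(rendered: str) -> str:
    """Swap a trailing <|return|> for <|end|> before reusing as context."""
    stop = "<|return|>"
    if rendered.endswith(stop):
        return rendered[: -len(stop)] + "<|end|>"
    return rendered


answer = render_message("assistant", "4", channel="final", end="<|return|>")
print(answer)              # <|start|>assistant<|channel|>final<|message|>4<|return|>
print(to_history(answer))  # <|start|>assistant<|channel|>final<|message|>4<|end|>
```

A tool call would use the same helper with channel="commentary", a recipient such as "functions.get_weather" (a made-up tool name), and end="<|call|>".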

Tooling and library

To make the format easier to adopt, OpenAI released an open-source library named harmony, distributed as openai-harmony on PyPI for Python and published from the same repository for Rust. ^[3] The majority of the rendering and parsing is implemented in Rust for performance and exposed to Python through thin pyo3 bindings, with a pure-Python wrapper that provides dataclasses and convenience APIs mirroring the Rust implementation. OpenAI describes the design goals as consistent, loss-free token-sequence handling shared between languages, high performance from the Rust core, and first-class Python support including typed stubs and test parity with the Rust suite. ^[3]

The library can be installed with pip install openai-harmony (or uv pip install openai-harmony), and Rust projects can depend on it directly from the GitHub repository. ^[3] Its role is to render a list of messages into the exact token sequence a gpt-oss model expects, and to parse the model's raw token output back into structured messages with their roles, channels, and any tool calls identified, so that application code does not have to manipulate the control tokens manually. ^[1]^[3] The Python package on PyPI targets Python 3.8 and above. ^[4]
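To make the parsing half of that job concrete, the sketch below splits already-decoded text into structured records with a regular expression. It is not how the harmony library is implemented (the library works on token sequences), and parse_rendered, the record keys, and the sample tool name are all hypothetical names chosen for this example.

```python
import re
from typing import Dict, List, Optional

# One message: <|start|>header<|message|>content followed by a closing token.
_MESSAGE = re.compile(
    r"<\|start\|>(?P<header>.*?)<\|message\|>(?P<content>.*?)"
    r"(?P<stop><\|end\|>|<\|return\|>|<\|call\|>)",
    re.DOTALL,
)


def parse_rendered(text: str) -> List[Dict[str, Optional[str]]]:
    """Split harmony-style text into role/channel/recipient/content records."""
    records = []
    for match in _MESSAGE.finditer(text):
        header = match.group("header")
        role = re.match(r"[^\s<]+", header)
        channel = re.search(r"<\|channel\|>([^\s<]+)", header)
        recipient = re.search(r"to=([^\s<]+)", header)
        records.append({
            "role": role.group(0) if role else None,
            "channel": channel.group(1) if channel else None,
            "recipient": recipient.group(1) if recipient else None,
            "content": match.group("content"),
            "stop": match.group("stop"),
        })
    return records


sample = (
    "<|start|>assistant<|channel|>analysis<|message|>User wants 2+2.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>4<|return|>"
)
for record in parse_rendered(sample):
    print(record["channel"], "->", record["content"])
```

A text-level parser like this is fragile if message content itself contains the control-token strings, which is one reason to parse at the token level as the library does.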

Relationship to gpt-oss

Harmony was introduced together with the gpt-oss models, gpt-oss-120b and gpt-oss-20b, which OpenAI released on August 5, 2025 under the Apache 2.0 license. ^[2]^[5] Both models were trained on the harmony format, and OpenAI states they should not be used without it. ^[1]^[2] Because the format encodes the reasoning-effort setting (low, medium, or high) in the system message and separates chain-of-thought from the final answer through the analysis and final channels, it is the mechanism through which gpt-oss exposes its adjustable reasoning and tool-use behavior. ^[2]^[3] When gpt-oss is consumed through an API or inference provider such as Hugging Face, Ollama, or vLLM, the serving layer applies the harmony format; developers running the open weights directly rely on the harmony library to apply it correctly. ^[1]^[3]

References

1. OpenAI, "openai/harmony" (GitHub repository, "Renderer for the harmony response format to be used with gpt-oss"). https://github.com/openai/harmony
2. OpenAI, "Introducing gpt-oss" (August 5, 2025). https://openai.com/index/introducing-gpt-oss/
3. OpenAI, "OpenAI Harmony Response Format" (OpenAI Cookbook). https://cookbook.openai.com/articles/openai-harmony
4. "openai-harmony" (PyPI). https://pypi.org/project/openai-harmony/
5. OpenAI, "openai/gpt-oss" (GitHub repository). https://github.com/openai/gpt-oss
