OpenAI AgentKit
Last reviewed
May 16, 2026
Sources
30 citations
Review status
Source-backed
Revision
v1 · 3,990 words
| OpenAI AgentKit | |
|---|---|
| Overview | |
| Type | Agent development platform |
| Developer | OpenAI |
| Announced | 6 October 2025 |
| Venue | OpenAI DevDay 2025, Fort Mason, San Francisco |
| Pricing | Included with standard API model pricing |
| Key components | Agent Builder, ChatKit, Connector Registry, evals for agents |
| Companion releases | Reinforcement fine-tuning on GPT-5 (private beta), Apps SDK |
| Predecessor | Assistants API (deprecated 26 August 2025; sunset set for 26 August 2026) |
| Successor protocol | Responses API, Agents SDK |
| Status | Mix of generally available and beta components |
| Demo | Two agents and a workflow built live in under eight minutes by OpenAI engineer Christina Huang |
OpenAI AgentKit is a suite of agent-building tools introduced by OpenAI at OpenAI DevDay on 6 October 2025. The suite is positioned as a single platform for designing, deploying, evaluating, and embedding AI agents that run on OpenAI's hosted models. It bundles four headline products: Agent Builder, a visual canvas for composing multi-agent workflows; ChatKit, an embeddable chat interface; Connector Registry, a central admin layer for data and tool integrations; and an expanded evals system for agents, including dataset management, trace grading, and automated prompt optimisation. Reinforcement fine-tuning, announced as generally available on the o4-mini reasoning model and as a private beta on GPT-5, was bundled into the same DevDay narrative even though it is technically a separate training product.
AgentKit is intended as the production-grade replacement for OpenAI's earlier developer experience, which combined the deprecated Assistants API with hand-written orchestration code and ad hoc evaluation scripts. OpenAI chief executive Sam Altman framed the launch in keynote remarks as the platform OpenAI "wished we had when we were trying to build our first agents", and the company named Klarna, Albertsons, Ramp, and HubSpot as launch customers running agents on the new stack. AgentKit shipped with mixed availability: ChatKit and the new evals capabilities entered general availability on launch day, Agent Builder opened in public beta, and Connector Registry began a staged beta rollout to API customers, ChatGPT Enterprise, and ChatGPT Edu administrators with access to OpenAI's Global Admin Console.
The suite has been read by industry observers as OpenAI's most explicit move into the agent-platform market dominated by code-first frameworks such as LangGraph, Mastra, Vercel AI SDK, and Agno, and by no-code automation tools such as Zapier and n8n. AgentKit does not attempt to replace any of those frameworks for power users; rather it tries to provide a faster default path from prototype to production for teams that have standardised on OpenAI models. Early developer reaction has been mixed, with praise for Agent Builder's visual model and criticism of integration depth and lock-in.
OpenAI's agent strategy before AgentKit was scattered across several incompatible surfaces. The original Assistants API, released in beta at DevDay 2023, packaged a model, system instructions, tools, and a thread of messages into a server-side object that developers could persist and re-use. It was widely adopted for chat-style agents but never reached general availability. In March 2025 OpenAI introduced the Responses API and the Agents SDK, a Python-first library (with a TypeScript counterpart added later) that took a more granular, function-call-centric view of agent composition. The Assistants API was formally placed on a deprecation path on 26 August 2025, with a sunset date of 26 August 2026. Teams running on Assistants were instructed to migrate to Responses, while teams that had hand-rolled orchestration on the Chat Completions API were given the Agents SDK as a recommended path.
This left a gap. Production agents typically need more than a model and an SDK. They need a way to express multi-step logic, to expose a chat interface, to call into business systems, to be tested against regression cases, and to be governed by an administrator. OpenAI's customers had been assembling those pieces from open-source frameworks, custom React components, ad hoc connectors, and home-grown eval scripts. AgentKit is OpenAI's attempt to consolidate that stack into a single platform that runs on its hosted infrastructure and bills through the existing API meter.
The broader market context for the launch was a shift in attention from chatbots to agents during 2025. Frameworks such as LangGraph (Python), Mastra (TypeScript), and Agno (Python) grew rapidly, while workflow tools that pre-dated the agent era, including Zapier, n8n, and Make, added LLM integrations. The question OpenAI implicitly answered with AgentKit was whether the model provider itself should ship a first-party platform, with everything that implies for portability and lock-in.
AgentKit is best understood as four products plus a fine-tuning capability that was announced alongside it on the same DevDay stage.
| Component | Function | Availability at launch | Self-host | Pricing |
|---|---|---|---|---|
| Agent Builder | Visual canvas for composing multi-agent workflows, with drag-and-drop nodes, conditional logic, guardrails, and version control | Public beta | No | API usage only; design is free |
| ChatKit | Embeddable React chat component with streaming, threading, attachments, widgets, and theme controls | Generally available | Yes, via advanced self-hosted option | API usage plus storage and tool overage beyond free tier |
| Connector Registry | Admin console for managing data-source and tool connections across ChatGPT and the API, including third-party Model Context Protocol servers | Beta rollout to API, ChatGPT Enterprise, and ChatGPT Edu customers with the Global Admin Console | No | Included; underlying connectors may be billed separately |
| Evals for agents | Datasets, trace grading, automated prompt optimisation, and support for grading non-OpenAI models | Generally available | No | Included with API |
| Reinforcement fine-tuning (RFT) | Programmable-grader fine-tuning method that adjusts a reasoning model's weights toward high-scoring outputs | GA on o4-mini; private beta on GPT-5 | No | Per-token training and inference pricing |
Agent Builder and ChatKit are typically the visible parts of an AgentKit deployment. Connector Registry sits behind the scenes, controlling which tools a workflow can call. Evals run on the same traces that the agents emit during execution, which means that the data needed to grade an agent is captured automatically and does not have to be re-instrumented.
Agent Builder is the centrepiece of AgentKit and the demo subject of the DevDay keynote. It is a browser-based visual canvas in which a developer assembles an agent workflow as a directed graph of nodes. Each node represents a step of model reasoning, a tool call, a guardrail check, a branching decision, a loop, a sub-agent, or a human approval gate. Nodes are connected by typed inputs and outputs that reference earlier results with a double-brace syntax such as {{input.output_text}}. The canvas supports inline preview runs, version control, and an inline evals view that scores the workflow against a stored dataset without leaving the editor.
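The double-brace references can be illustrated with a small resolver. This is a minimal sketch of the general idea, not Agent Builder's actual implementation: the `render` and `resolve_path` helpers and the shape of the context dictionary are assumptions made for illustration, and only the `{{input.output_text}}` placeholder form comes from the description above.

```python
import re
from typing import Any, Mapping

# Matches {{ dotted.path }} placeholders; whitespace inside the braces is tolerated.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def resolve_path(context: Mapping[str, Any], path: str) -> Any:
    """Walk a dotted path such as 'input.output_text' through nested mappings."""
    value: Any = context
    for key in path.split("."):
        if not isinstance(value, Mapping) or key not in value:
            raise KeyError(f"unknown reference: {path}")
        value = value[key]
    return value


def render(template: str, context: Mapping[str, Any]) -> str:
    """Replace every {{path}} placeholder with the value found in the context."""
    return _PLACEHOLDER.sub(
        lambda match: str(resolve_path(context, match.group(1))), template
    )


# Hypothetical result of an earlier node, exposed to the next node as "input".
context = {"input": {"output_text": "Refund request for order 1042"}}
prompt = render("Classify this message: {{input.output_text}}", context)
print(prompt)
```

Raising on an unknown reference, rather than silently leaving the placeholder in place, mirrors the typed-connection idea: a node that points at a result that does not exist should fail early.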
Node categories surfaced in OpenAI's published guides include extraction nodes that turn unstructured input into structured fields, analysis nodes that reason over those fields, action nodes that call out to connectors, guardrail nodes that screen inputs and outputs for personally identifiable information or policy violations, and human approval nodes that pause the run until a person clicks accept or reject. Builder templates ship for common patterns such as customer support triage, document comparison, knowledge-base question answering, and data enrichment. Workflows can be exported to the Agents SDK as code, so a team that starts in the visual canvas can hand the resulting graph to engineers who prefer to maintain it in source control.
During the DevDay keynote, OpenAI engineer Christina Huang assembled a workflow and two agents on stage in under eight minutes, a demo that has since been widely cited as evidence of the platform's intended ease of use. Ramp, a launch customer, said in OpenAI's announcement that it had built a procurement agent from a blank canvas "in a couple of hours" and cut iteration cycles by roughly 70 percent. Albertsons, the United States grocery chain, presented an Agent Builder workflow that flagged a 32 percent drop in ice cream sales and proposed display and advertising adjustments after analysing seasonality and external factors. Klarna's support agent, announced earlier in 2025, was reframed at DevDay as handling roughly two-thirds of the company's support tickets after migration onto the new stack. HubSpot used AgentKit to extend its Breeze assistant with a custom response widget that pulls policy and location-specific data from multiple internal sources.
Guardrails are a distinct node type rather than a configuration setting. A guardrail has an input channel, a logic unit, and two output branches (pass and fail) that route execution to different downstream behaviour. The published guardrail library covers detection of personally identifiable information, jailbreak prompts, and other content-policy concerns. Custom guardrails can be added with business-specific rules, such as budget ceilings on an agent that allocates ad spend.
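The input, logic unit, and two-branch structure can be sketched in a few lines. The `Guardrail` class, the payload shape, and the ceiling value below are hypothetical and chosen only to mirror the budget-ceiling example; they are not part of any OpenAI library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Guardrail:
    """Hypothetical guardrail node: one check, two downstream branches."""

    name: str
    check: Callable[[dict], bool]    # logic unit: True means the input passes
    on_pass: Callable[[dict], str]   # branch taken when the check passes
    on_fail: Callable[[dict], str]   # branch taken when the check fails

    def run(self, payload: dict) -> str:
        branch = self.on_pass if self.check(payload) else self.on_fail
        return branch(payload)


BUDGET_CEILING_USD = 5_000  # illustrative business rule, not an OpenAI default

budget_guardrail = Guardrail(
    name="ad_budget_ceiling",
    check=lambda p: p["proposed_spend_usd"] <= BUDGET_CEILING_USD,
    on_pass=lambda p: f"allocate {p['proposed_spend_usd']} USD",
    on_fail=lambda p: "escalate to human approval",
)

print(budget_guardrail.run({"proposed_spend_usd": 1_200}))  # pass branch
print(budget_guardrail.run({"proposed_spend_usd": 9_000}))  # fail branch
```

Modelling the guardrail as a node with explicit branches, rather than as a flag on another node, keeps the failure path visible in the graph.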
Limitations called out by reviewers include cloud-only deployment, lack of native execution on non-OpenAI models, and a smaller catalogue of pre-built connectors than legacy automation tools such as Zapier (which advertises more than 5,000 integrations) or n8n.
ChatKit is the embeddable user-interface layer of AgentKit. It is published as a framework-agnostic web component with a React-friendly wrapper, and developers add it to an application by including a small script and exchanging a client secret for a ChatKit session bound to a workflow. The component handles message rendering, streaming responses, conversation threading, file attachments, tool invocations, and visualisation of the model's intermediate reasoning steps. OpenAI's published quickstart describes a five-minute path from a fresh project to a working embedded chat surface.
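The server-side half of that exchange might look like the sketch below, which builds the session-creation request without sending it. The endpoint path, the `OpenAI-Beta` header value, and the body fields are assumptions based on how the flow is commonly described and should be checked against the current ChatKit documentation; the key, workflow ID, and user ID are placeholders.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current ChatKit documentation.
CHATKIT_SESSIONS_URL = "https://api.openai.com/v1/chatkit/sessions"


def build_session_request(
    api_key: str, workflow_id: str, user_id: str
) -> urllib.request.Request:
    """Build (but do not send) a request that asks for a workflow-bound session."""
    body = json.dumps({"workflow": {"id": workflow_id}, "user": user_id})
    return urllib.request.Request(
        CHATKIT_SESSIONS_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "OpenAI-Beta": "chatkit_beta=v1",  # assumed beta header value
        },
    )


# Placeholder credentials and IDs; nothing is sent over the network here.
request = build_session_request("sk-placeholder", "wf_placeholder", "user-123")
print(request.get_method(), request.full_url)
```

In a real deployment the backend would send this request and pass the returned client secret to the browser, so that the API key itself never leaves the server.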
Features called out in the ChatKit documentation include light and dark themes, accent-colour and density controls, a hosted upload strategy for file attachments, an @-mention system for custom entities, progress events for long-running tools (which are replaced by the next assistant message or widget output once the tool completes), and a widget system that lets the agent display custom UI elements with interactive buttons and form fields. Widget actions are bound to events on widget nodes and handled by either server-side or client-side logic, so a button click in a chat bubble can trigger either an additional model turn or an in-page state update.
ChatKit is designed primarily for in-app embedding (in-product chat, support widgets, internal tools) rather than as a standalone consumer surface. For customers that need on-premises deployment, custom authentication, data residency control, or custom orchestration logic, OpenAI exposes an advanced self-hosted option that ships the ChatKit components but lets the customer run them on their own infrastructure.
Reception of ChatKit has been more uniformly positive than reception of Agent Builder. Reviewers consistently describe the component as the most polished and least controversial piece of the AgentKit suite, in part because it makes a smaller claim: it replaces a chat UI rather than a workflow framework.
Connector Registry is the administrative layer that controls how AgentKit (and ChatGPT) reach data and tools. Before AgentKit, an enterprise customer typically had to manage two parallel sets of connectors: one for ChatGPT's built-in connectors (used by ChatGPT Business, Enterprise, and Edu tenants for retrieval into the chat surface) and one for the API (used by hand-rolled agents). Connector Registry collapses those into a single admin console.
The console surfaces a curated catalogue of pre-built connectors. OpenAI's published list at launch included Dropbox, Google Drive, OneDrive, Microsoft SharePoint, Microsoft Teams, GitHub, Jira, Slack, and Notion. It also surfaces third-party Model Context Protocol servers, which agents can call through the same mcp built-in tool type as native connectors. The Responses API treats a Connector Registry entry and a remote MCP server symmetrically: when a connector is invoked, the developer supplies a connector_id and an access token instead of a server_url, and OpenAI handles authentication and tool discovery server-side.
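The symmetry between the two cases can be shown by building both tool entries with one helper. This is a sketch under stated assumptions: `type`, `server_url`, and `connector_id` come from the description above, while `server_label` and `authorization` are field names assumed here, and the connector ID, URL, and token are placeholders. The helper only constructs dictionaries and does not call the API.

```python
from typing import Optional


def mcp_tool(
    label: str,
    *,
    server_url: Optional[str] = None,
    connector_id: Optional[str] = None,
    access_token: Optional[str] = None,
) -> dict:
    """Return a tool entry of type 'mcp' for a remote server or a connector."""
    if (server_url is None) == (connector_id is None):
        raise ValueError("pass exactly one of server_url or connector_id")
    tool = {"type": "mcp", "server_label": label}  # server_label: assumed field
    if server_url is not None:
        tool["server_url"] = server_url
    else:
        if not access_token:
            raise ValueError("a connector needs an access token")
        tool["connector_id"] = connector_id
        tool["authorization"] = access_token  # assumed field name for the token
    return tool


# Placeholder values for illustration only.
remote = mcp_tool("internal_docs", server_url="https://mcp.example.com/sse")
connector = mcp_tool(
    "drive",
    connector_id="connector_placeholder",
    access_token="oauth-token-placeholder",
)
print(remote)
print(connector)
```

Because both entries share the same tool type, the rest of a workflow does not need to know whether a tool is backed by a registry connector or by a self-managed MCP server.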
The registry is rolling out in beta to API customers, ChatGPT Enterprise tenants, and ChatGPT Edu tenants that have access to OpenAI's Global Admin Console. For administrators, the value is governance: a single place to see which connectors are enabled across the organisation, to set permission scopes, and to revoke access. For developers, the value is reuse: a connector configured once for ChatGPT retrieval can also be invoked by an AgentKit workflow without re-authenticating.
MCP integration in particular places AgentKit inside the broader open ecosystem around the protocol. Customers can connect Agent Builder to any MCP-compatible server, whether published by OpenAI, an internal team, or a third party. This makes AgentKit interoperable with the wider community of MCP servers built for tools such as Figma, Zapier, Composio, Notion, and various internal databases. The interoperability cuts both ways. AgentKit benefits from the growing MCP ecosystem, and any MCP-based tool can plug into Connector Registry without bespoke work.
Reinforcement fine-tuning (RFT) is the training capability that OpenAI bundled into AgentKit's DevDay narrative. RFT adapts an OpenAI reasoning model with a feedback signal that the developer defines, rather than a static set of correct answers. The training loop relies on a programmable grader that scores every candidate response, and the algorithm then shifts the model's weights so high-scoring outputs become more likely over time.
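A programmable grader is, at its simplest, a function that maps a candidate response to a score. The sketch below is a hypothetical grader for a code-assignment task that scores a candidate by its F1 overlap with a reference set. It illustrates the concept only and is not OpenAI's grader format; the function name, the scoring rule, and the sample codes are all assumptions.

```python
def grade(predicted: set[str], reference: set[str]) -> float:
    """Score a candidate in [0, 1] as the F1 overlap between two code sets."""
    if not predicted and not reference:
        return 1.0
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Illustrative reference answer and three candidate responses.
reference = {"E11.9", "I10"}
candidates = [{"E11.9", "I10"}, {"E11.9"}, {"J45.909"}]

scores = [grade(candidate, reference) for candidate in candidates]
for candidate, score in zip(candidates, scores):
    print(sorted(candidate), round(score, 3))
```

A graded score of this kind gives partial credit to the second candidate, which is what lets the training loop distinguish a near miss from a complete miss.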
RFT first appeared as an alpha during OpenAI's 12 Days of OpenAI series in December 2024. It became generally available on o4-mini in May 2025 for verified organisations. At DevDay 2025 OpenAI reframed RFT as part of the AgentKit story, with a private beta on GPT-5 (alongside the GA on o4-mini) and a recommendation to run evals before launching an RFT job. The reasoning is straightforward: RFT only works when a model's eval scores sit between the floor and ceiling of the grader, because that range is where the gradient signal exists. If the model already scores at the maximum, RFT has nothing to learn; if it scores at the minimum, RFT cannot find a foothold either.
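The floor-and-ceiling argument suggests a simple pre-flight check on baseline eval scores. The sketch below is one way to express it; the function name, the three labels, and the `margin` threshold are assumptions for illustration and are not an OpenAI recommendation.

```python
from statistics import mean


def rft_headroom(
    scores: list[float],
    floor: float = 0.0,
    ceiling: float = 1.0,
    margin: float = 0.05,  # assumed tolerance around the floor and ceiling
) -> str:
    """Classify baseline grader scores as 'saturated', 'no_foothold', or 'usable'."""
    if not scores:
        raise ValueError("need at least one baseline score")
    average = mean(scores)
    if average >= ceiling - margin:
        return "saturated"    # already near the maximum: nothing left to learn
    if average <= floor + margin:
        return "no_foothold"  # near the minimum: no signal to climb from
    return "usable"           # between floor and ceiling: signal exists


print(rft_headroom([1.0, 1.0, 0.98]))
print(rft_headroom([0.0, 0.0, 0.02]))
print(rft_headroom([0.2, 0.6, 0.9]))
```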
Reported customer results published with the launch include Accordance AI, which used RFT on tax-analysis tasks and reported a 39 percent improvement in accuracy, and Ambience Healthcare, which applied RFT to ICD-10 medical-code assignment and reported a 12-point improvement over a physician baseline.
AgentKit's evaluation features were not new in concept but were significantly expanded at DevDay. The agents-focused evals system supports four capabilities. Datasets allow developers to build evaluation sets from scratch and expand them over time with automated graders and human annotations. Trace grading records the end-to-end log of model calls, tool calls, guardrail decisions, and agent hand-offs for one workflow run, and grades each step of the trace separately. Automated prompt optimisation suggests modifications to the prompts and instructions in a workflow based on observed trace performance. Third-party model support means that Anthropic's Claude, Google's Gemini, and other non-OpenAI models can be graded alongside OpenAI models, which is useful for teams that want to compare options.
Trace grading is the most distinctive of the four, because it gives developers a way to debug agents in which the failure is not in the final answer but in an intermediate decision (such as choosing the wrong tool or mishandling a partial result). The grader can be configured to flag individual steps, and the evals UI surfaces those flags inline with the trace so a developer can click into the offending node.
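Step-level grading can be sketched as a set of small graders applied to every step of a recorded run. The `TraceStep` structure, the grader functions, and the sample trace below are hypothetical; they illustrate how a wrong intermediate tool choice can be flagged even when the final answer looks acceptable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TraceStep:
    index: int
    kind: str    # e.g. "model_call", "tool_call", "guardrail", "handoff"
    name: str
    output: str


# A step grader returns None when the step is fine, or a reason when it is not.
StepGrader = Callable[[TraceStep], Optional[str]]


def expected_tools(allowed: set[str]) -> StepGrader:
    def grader(step: TraceStep) -> Optional[str]:
        if step.kind == "tool_call" and step.name not in allowed:
            return f"unexpected tool: {step.name}"
        return None
    return grader


def non_empty_output(step: TraceStep) -> Optional[str]:
    return None if step.output.strip() else "empty output"


def grade_trace(steps: list[TraceStep], graders: list[StepGrader]) -> dict[int, list[str]]:
    """Return a mapping from step index to the reasons that step was flagged."""
    flags: dict[int, list[str]] = {}
    for step in steps:
        reasons = []
        for grader in graders:
            reason = grader(step)
            if reason is not None:
                reasons.append(reason)
        if reasons:
            flags[step.index] = reasons
    return flags


trace = [
    TraceStep(0, "model_call", "triage", "route to refunds"),
    TraceStep(1, "tool_call", "search_orders", "order 1042 found"),
    TraceStep(2, "tool_call", "delete_account", "account removed"),
    TraceStep(3, "model_call", "reply", "Your refund is on its way."),
]
flags = grade_trace(trace, [expected_tools({"search_orders", "issue_refund"}), non_empty_output])
print(flags)
```

Here only step 2 is flagged, although the final reply in step 3 reads as a success, which is the kind of failure that grading the final answer alone would miss.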
AgentKit does not introduce a separate platform fee. OpenAI's official position at launch was that AgentKit tools are included with standard API model pricing, with the exception of components such as the Code Interpreter (US$0.03 per session) and File Search (US$0.10 per GB of storage per day) that already had their own line items in the API price list. Agent Builder is free to design in; the cost is incurred when a workflow runs and consumes tokens against an underlying model. ChatKit usage is similarly metered against API token consumption, with storage and tool overages charged once a tenant passes a free tier.
For reference, the underlying model prices at launch, quoted per million tokens, ranged from GPT-4o-mini at US$0.15 input and US$0.60 output, through GPT-4.1 at US$2.50 input and US$10.00 output, to GPT-5 at US$1.25 input and US$10.00 output. Reinforcement fine-tuning has its own per-token training and inference rates.
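Because there is no platform fee, the cost of a workflow run reduces to token arithmetic plus any separately metered tools. The sketch below works through that arithmetic with the GPT-4o-mini and GPT-5 rates and the Code Interpreter session fee quoted in this article; the token counts are invented for illustration, and the dictionary keys are labels chosen here rather than confirmed API model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_million: float   # US$ per million input tokens
    output_per_million: float  # US$ per million output tokens


# Rates as quoted in this article; keys are illustrative labels.
PRICES = {
    "gpt-4o-mini": ModelPrice(0.15, 0.60),
    "gpt-5": ModelPrice(1.25, 10.00),
}
CODE_INTERPRETER_PER_SESSION = 0.03  # US$ per session, as quoted above


def run_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    code_interpreter_sessions: int = 0,
) -> float:
    """Estimate the US$ cost of one workflow run."""
    price = PRICES[model]
    token_cost = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    ) / 1_000_000
    tool_cost = code_interpreter_sessions * CODE_INTERPRETER_PER_SESSION
    return round(token_cost + tool_cost, 6)


# Hypothetical run: 40,000 input tokens, 6,000 output tokens, one tool session.
gpt5_cost = run_cost("gpt-5", 40_000, 6_000, code_interpreter_sessions=1)
mini_cost = run_cost("gpt-4o-mini", 40_000, 6_000, code_interpreter_sessions=1)
print(gpt5_cost, mini_cost)
```

For this hypothetical run the GPT-5 estimate is US$0.05 for input plus US$0.06 for output plus US$0.03 for the session, or US$0.14 in total; on the smaller model the fixed tool fee is the largest part of the bill.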
The absence of a platform fee is a deliberate marketing point. OpenAI's argument is that customers should not have to choose between using OpenAI models and using a high-quality agent platform; bundling the two removes a procurement barrier. For teams that mix providers, however, paying a token-based premium to run on OpenAI infrastructure may not be cheaper than running a self-hosted framework on a cheaper model.
AgentKit sits in a crowded market. The closest direct comparisons are code-first agent frameworks (LangGraph, Mastra, Agno, Vercel AI SDK) and no-code automation tools (Zapier, n8n, Make, Lindy). The table below summarises the positioning published by each project and reported in independent reviews; figures and adoption metrics are taken from the sources cited at the end of this article and reflect the state of the field as of late 2025 and early 2026.
| Platform | Primary language | Visual editor | Hosting | Model lock-in | Notable adoption |
|---|---|---|---|---|---|
| OpenAI AgentKit | TypeScript and Python via Agents SDK | Yes, Agent Builder | OpenAI cloud only | OpenAI models for execution; third-party models for grading | Klarna, Albertsons, Ramp, HubSpot at launch |
| LangGraph | Python (with TypeScript port) | No, code-first | Self-host or LangGraph Cloud | None; model-agnostic | Wide adoption across Python teams |
| Mastra | TypeScript | No, code-first | Self-host or Mastra Cloud | None; uses Vercel AI SDK | More than 19,000 GitHub stars, more than 300,000 weekly npm downloads, US$13 million raised (YC W25); production at Replit, PayPal, Sanity, Brex |
| Agno | Python | No, code-first | Self-host | None; supports OpenAI, Anthropic, Google, Ollama | Python-first teams building multi-agent systems |
| Vercel AI SDK | TypeScript | No, code-first | Self-host (typically Next.js) | None; provider-agnostic | Default agent library for Next.js teams |
| Zapier | None (web app) | Yes | Hosted | None; LLM steps optional | More than 5,000 integrations advertised |
| n8n | None (web app) | Yes | Self-host or n8n Cloud | None | Open-source; large self-host community |
The practical trade-offs differ by language and use case. Reviewers writing in late 2025 generally recommended LangGraph or Agno for Python teams that need branching, durable resumption, and complex state, and Mastra or Vercel AI SDK for TypeScript teams in the same situation. AgentKit is recommended where the team has already standardised on OpenAI models, where a visual canvas is valuable for cross-functional collaboration, and where the customer is willing to accept a cloud-only deployment. An independent comparison published by LogRocket, which ran AgentKit and n8n on an ad campaign workflow, concluded that AgentKit produced deeper analysis and more contextual recommendations, while n8n still won on rigid, repetitive workflows where the path between steps was fully specified.
Mastra and Agno are sometimes treated as the closest open-source analogues to AgentKit because they bundle agents, workflows, memory, evals, and observability into a single framework rather than spreading those concerns across multiple libraries. Both, however, are code-first and do not ship a visual workflow editor of the kind that Agent Builder provides.
Reception of AgentKit has been a mixture of enthusiasm and reservation. Coverage from VentureBeat, TechCrunch, InfoQ, and Maginative at the time of the launch focused on the speed of the on-stage demo and the breadth of the suite, presenting AgentKit as OpenAI's most credible move into the agent-platform market to date. Sam Altman's framing in the keynote, that AgentKit gives developers what OpenAI's own team "wished we had" when building their first agents, was widely quoted.
Developer reaction in the weeks after the launch was more cautious. A common observation, surfaced in reviews on LogRocket, Unite.AI, and Greasy Guide, was that the gap between the eight-minute on-stage demo and a production-ready customer-support deployment is significant. Specific complaints included the limited connector catalogue compared with Zapier or n8n, the requirement to manually upload and update documents through File Search (which several reviewers called impractical for live support knowledge bases), and assorted alpha-stage bugs in the Agent Builder canvas. A LinkedIn review by Shadrack Otieno noted that AgentKit "is definitely not what people [say is an] 'n8n alternative' or 'Zapier alternative'" because the three products are positioned for different audiences.
Longer-term concerns raised by reviewers include vendor lock-in (Agent Builder workflows execute only against OpenAI models, even though evals can grade other providers), the absence of self-hosting for most components (only ChatKit has a self-hosted path), and the question of how AgentKit will evolve once the Assistants API sunset on 26 August 2026 forces a wave of migrations. Klarna's reported success with a support agent that handles two-thirds of tickets has been cited as the leading positive case study, although it pre-dates AgentKit in some respects.
Inside the broader agent-framework community, AgentKit is generally treated as a complement to, rather than a replacement for, code-first frameworks. AgentKit is positioned as the easiest path for teams that have already chosen OpenAI as their model provider and that value a visual canvas, an embeddable chat surface, and a managed connector registry.