OpenAI o3-pro
Last reviewed
May 31, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 3,803 words
OpenAI o3-pro is a high-compute reasoning large language model released by OpenAI on June 10, 2025, as the professional, higher-reliability variant of the company's o3 reasoning model.[1][2] Like its predecessor o1 pro mode, o3-pro is a version of OpenAI's flagship reasoning model configured to "think longer" by allocating substantially more test-time compute to each query. The explicit design goal is more reliable answers on difficult problems in mathematics, science, programming, and other domains where human reviewers preferred the high-compute variant.[1][3][4] At launch, o3-pro became available immediately to ChatGPT Pro and Team subscribers, with ChatGPT Enterprise and Education tiers gaining access one week later. It was released on the same day in the OpenAI Responses API at US$20 per million input tokens and US$80 per million output tokens, a reduction of roughly 87% relative to the prior o1-pro API pricing.[1][2][5][6]
The launch of o3-pro coincided with a separate 80% price cut to the standard o3 model in the API, which brought baseline o3 down from US$10 / US$40 per million tokens to US$2 / US$8 per million tokens (input/output).[5][6][7] OpenAI described the o3 price reduction as a result of inference-stack optimization without any change to the underlying model.[5] Analysts widely characterized the combined announcements, a substantially cheaper baseline o3 plus a higher-compute o3-pro tier, as part of OpenAI's strategy to widen its commercial lead in the reasoning model category.[7]
In the consumer ChatGPT product, o3-pro replaced o1 pro mode in the model picker for Pro and Team users; the model remained the most capable reasoning option available through the ChatGPT interface until OpenAI rolled out GPT-5 in August 2025, at which point o3-pro chats in the consumer product were automatically migrated to a new "GPT-5 Pro" mode and the o3-pro name was removed from the ChatGPT model picker.[8][9] o3-pro remained accessible via the OpenAI API after the GPT-5 launch, in line with OpenAI's stated policy of not deprecating older models on the API side at that time.[8]
OpenAI's "pro" model line emerged from research into test-time compute scaling, in which a reasoning model's accuracy on difficult problems is increased by allowing it to generate longer internal chain-of-thought traces and, in the "pro" configurations, by sampling or aggregating multiple candidate solutions at inference time.[4][10] OpenAI's "o-series" reasoning models, beginning with the o1-preview release in September 2024 and continuing with o1, o3-mini, o3, and o4-mini, are trained with reinforcement learning to perform extended private reasoning before producing a final answer.[11][4]
The first "pro" variant, o1 pro mode, was launched on December 5, 2024 as the headline feature of the new US$200 / month ChatGPT Pro subscription tier introduced during OpenAI's "12 Days of OpenAI" event.[12][13] o1 pro mode was described by OpenAI as a version of o1 "that uses more compute to think harder and provide even better answers to the hardest problems," and OpenAI reported a pass rate of 86% on the AIME 2024 mathematics competition for o1 pro mode versus 78% for standard o1.[12][13] o1 pro mode was initially available only inside ChatGPT; an API endpoint for the o1-pro model was added in March 2025.[11]
OpenAI announced the o3 family at the end of December 2024 as the successor to o1, and released the standard o3 and o4-mini models for ChatGPT users on April 16, 2025.[11][14] After the public release of o3, OpenAI CEO Sam Altman confirmed in mid-April 2025 that an o3-pro variant was planned for the ChatGPT Pro tier "in a few weeks."[14] Between the o3 launch and the o3-pro release, o1 pro mode remained available in the ChatGPT Pro model picker.[1][11]
The o3-pro launch on June 10, 2025 was announced via OpenAI's developer-facing channels and the ChatGPT release notes, accompanied by tweets from CEO Sam Altman noting both the 80% price cut for standard o3 and the new o3-pro pricing.[5][15] Altman wrote on X: "we dropped the price of o3 by 80%!! excited to see what people will do with it now. think you'll also be happy with o3-pro pricing for the performance".[15]
OpenAI has not published an architecture-level paper specific to o3-pro, and the available system card documentation covers the underlying o3 and o4-mini models rather than the pro variant.[16] At a high level, OpenAI describes o3-pro as "a version of our most intelligent model, OpenAI o3, designed to think longer and provide the most reliable responses,"[1] which suggests that o3-pro shares the underlying weights or model family of o3 but runs under a configuration that consumes substantially more test-time compute per request.[4][17]
Like other o-series models, o3-pro is trained with reinforcement learning to perform private chain-of-thought reasoning before producing a user-facing answer.[11][17] OpenAI's standard explanation for the o-series describes a process in which the model produces a long internal reasoning trace, is rewarded during training on the correctness of its final answers, and develops behaviors such as self-correction, backtracking, and explicit problem decomposition. In o3-pro, this is paired with a higher effective compute budget; analyses of the model's behavior have suggested it samples or evaluates multiple candidate reasoning paths and selects the strongest, a strategy sometimes described as inference-time search or majority voting over chains of thought.[4][18]
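The majority-voting strategy mentioned above can be illustrated with a minimal sketch. This is a generic illustration of the technique, not OpenAI's disclosed implementation, which has not been published; `sample_answer` is a hypothetical stand-in for one independent reasoning run that returns a final answer.

```python
from collections import Counter
from typing import Callable


def majority_vote(sample_answer: Callable[[], str], n_samples: int = 8) -> str:
    """Draw n_samples independent answers and return the most common one.

    sample_answer is a hypothetical callable standing in for one full
    reasoning run; nothing here reflects how o3-pro is actually built.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers = [sample_answer() for _ in range(n_samples)]
    # most_common(1) yields [(answer, count)]; on a tie, the answer that
    # was seen first wins.
    return Counter(answers).most_common(1)[0][0]


# Usage with a fixed sequence of candidate answers in place of a real model.
candidates = iter(["42", "41", "42", "7", "42"])
best = majority_vote(lambda: next(candidates), n_samples=5)
```

Voting of this kind only helps when the correct answer is the single most likely outcome of a run, which is one reason extra samples do not guarantee higher accuracy.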
The "4/4 reliability" evaluation that OpenAI used to highlight o3-pro's improvements measures whether a model answers the same question correctly across four independent samples; on this metric o3-pro outperformed both standard o3 and o1 pro mode in OpenAI's internal evaluations, reflecting a focus on consistency rather than peak single-attempt accuracy.[1][3][18]
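The difference between the two metrics can be made concrete with a short sketch. It assumes each question has exactly four recorded attempts and that pass@1 is estimated as the mean single-attempt accuracy; the function names and the sample data are illustrative, not taken from OpenAI's evaluation code.

```python
from typing import List


def pass_at_1(results: List[List[bool]]) -> float:
    """Estimate single-attempt accuracy as the mean over all attempts."""
    total = sum(len(attempts) for attempts in results)
    correct = sum(sum(attempts) for attempts in results)
    return correct / total


def four_of_four(results: List[List[bool]]) -> float:
    """Fraction of questions answered correctly in all four attempts."""
    for attempts in results:
        if len(attempts) != 4:
            raise ValueError("each question needs exactly four attempts")
    return sum(all(attempts) for attempts in results) / len(results)


# Illustrative data: four questions, four attempts each (True = correct).
results = [
    [True, True, True, True],
    [True, True, False, True],   # one miss: counts toward pass@1, fails 4/4
    [False, False, False, False],
    [True, True, True, True],
]
single_attempt = pass_at_1(results)   # 11 of 16 attempts correct
all_four = four_of_four(results)      # 2 of 4 questions fully consistent
```

A question with a single miss still contributes three correct attempts to pass@1 but counts as a failure under 4/4, which is why the stricter score is never higher.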
In the OpenAI API, o3-pro is exposed exclusively through the Responses API, which supports multi-turn reasoning interactions and is described by OpenAI as the recommended interface for future-extensible features such as long-running tasks and structured tool use.[19][20] Streaming was not supported on o3-pro at launch. To avoid timeouts on reasoning runs that may take several minutes, OpenAI recommends running o3-pro requests in "background mode," in which the API starts the request asynchronously and the client polls for completion.[19][20]
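The background-mode pattern can be sketched as a small polling loop. The sketch assumes the official `openai` Python SDK, whose Responses API accepts `background=True` on `responses.create` and exposes `responses.retrieve`; the helper names, the polling interval, and the 30-minute cap are illustrative choices, not OpenAI recommendations.

```python
import time

# Statuses that mean the background request has not finished yet.
PENDING_STATUSES = {"queued", "in_progress"}


def wait_for_response(client, response_id, poll_interval=5.0,
                      max_wait=1800.0, sleep=time.sleep):
    """Poll client.responses.retrieve until the response is no longer pending."""
    waited = 0.0
    response = client.responses.retrieve(response_id)
    while response.status in PENDING_STATUSES:
        if waited >= max_wait:
            raise TimeoutError(f"response {response_id} still pending after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
        response = client.responses.retrieve(response_id)
    return response


def ask_o3_pro(prompt):
    """Start a background request and wait for it (needs openai + OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so the helper above has no dependency

    client = OpenAI()
    queued = client.responses.create(model="o3-pro", input=prompt, background=True)
    done = wait_for_response(client, queued.id)
    # A finished response may also be "failed", "cancelled", or "incomplete".
    return done.output_text if done.status == "completed" else None
```

Passing `sleep` as a parameter keeps the loop testable without real delays; in production the default `time.sleep` applies.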
Independent measurements of o3-pro's runtime characteristics by the Artificial Analysis benchmark service reported a median output rate of approximately 32.6 tokens per second, at the lower end of reasoning models in its price tier, and a time-to-first-token of 107.85 seconds, reflecting the long internal reasoning phase before any output is returned.[21]
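As a rough illustration of what these figures imply for end-to-end latency, the sketch below adds the reported time-to-first-token to the time needed to emit the visible output at the reported median rate. It assumes both figures apply unchanged to a given request, which real workloads will not match exactly; the function name is illustrative.

```python
# Figures reported by Artificial Analysis, as quoted above.
TTFT_SECONDS = 107.85
OUTPUT_TOKENS_PER_SECOND = 32.6


def estimate_response_seconds(output_tokens: int) -> float:
    """Back-of-envelope latency: time to first token plus generation time."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / OUTPUT_TOKENS_PER_SECOND


# A 1,000-token answer: about 108 s of waiting plus about 31 s of generation.
estimate = estimate_response_seconds(1000)
```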
On June 10, 2025, o3-pro was made available inside ChatGPT to Pro and Team subscribers; Enterprise and Education subscribers gained access one week later.
ChatGPT Plus subscribers (US$20 / month) and ChatGPT Free users were not given access to o3-pro at launch; Plus tier access was confined to standard o3, o4-mini, and o4-mini-high.[22][23] The exclusivity of o3-pro to the US$200 Pro tier and above was widely reported in the trade press as part of OpenAI's tier-segmentation strategy following the December 2024 introduction of ChatGPT Pro.[22][12]
Inside ChatGPT, o3-pro retained access to the toolset that the consumer product exposes to other o-series models, including web search, file analysis, Python execution (advanced data analysis), vision-based reasoning over uploaded images, and memory-based personalization.[1][24] At launch, however, several ChatGPT features were not supported by o3-pro: image generation (which routed users to other models), the Canvas workspace feature, and temporary chats. Temporary chats were disabled while OpenAI resolved what it described as a technical issue.[1][2][24]
The o3-pro API endpoint went live on June 10, 2025 alongside the ChatGPT release.[5][6] OpenAI published the following list pricing for the model:[5][6][21][25]
| Model | Input (USD / 1M tokens) | Output (USD / 1M tokens) |
|---|---|---|
| OpenAI o3-pro | 20.00 | 80.00 |
| OpenAI o3 (post–June 10, 2025) | 2.00 | 8.00 |
| OpenAI o1-pro (prior list price) | 150.00 | 600.00 |
The o3-pro list price represented an approximately 87% reduction relative to the o1-pro list price it succeeded.[6][7] Standard o3 saw an 80% reduction on the same day, which OpenAI described as the result of inference-stack optimization rather than a model change.[5]
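The arithmetic behind these figures can be checked with a short sketch using the list prices from the table above. The token counts in the usage example are arbitrary, and the code assumes all output tokens are billed at the output rate.

```python
# List prices in USD per 1M tokens (input, output), from the table above.
PRICES = {
    "o3-pro": (20.00, 80.00),
    "o3": (2.00, 8.00),
    "o1-pro": (150.00, 600.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def percent_reduction(old_price: float, new_price: float) -> float:
    """Percentage by which new_price undercuts old_price."""
    return 100 * (old_price - new_price) / old_price


o1_pro_to_o3_pro = percent_reduction(150.00, 20.00)  # about 86.7, i.e. roughly 87%
o3_cut = percent_reduction(10.00, 2.00)              # 80.0

# Arbitrary example request: 10,000 input tokens and 2,000 output tokens.
pro_cost = request_cost("o3-pro", 10_000, 2_000)
base_cost = request_cost("o3", 10_000, 2_000)
```

At these list prices the same request costs ten times as much on o3-pro as on standard o3, before accounting for any difference in the number of tokens each model generates.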
The API model supports a 200,000-token context window and a maximum output of approximately 100,000 tokens per request, accepting text and image inputs and producing text-only output.[25][26][27] OpenAI's API documentation lists support for function calling and structured outputs and notes that streaming is not supported. Third-party model cards mirroring OpenAI's documentation give a knowledge cutoff of May 31, 2024, although sources vary slightly on the exact date.[25][26][27]
Microsoft made o3-pro available through the Azure OpenAI Service (Azure AI Foundry) starting June 19, 2025, initially in the East US 2 and Sweden Central regions and supporting Global Standard deployment. Azure customers were required to request approval to use the model.[28] Azure list pricing matched OpenAI's direct API pricing of US$20 / US$80 per million input/output tokens, and o3-pro on Azure supported the same Responses API surface with text and image inputs, function calling, structured outputs, and integration with Azure's File Search tool.[28]
OpenAI positioned o3-pro as the most reliable model in the o3 family, recommending it for "challenging questions where reliability matters more than speed, and waiting a few minutes is worth the tradeoff."[1][3] The product positioning emphasized three distinct capability dimensions:
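Like baseline o3, o3-pro can use the full set of ChatGPT tools as part of its reasoning, including web search, file analysis, Python execution, vision-based reasoning over uploaded images, and memory-based personalization.[1][24]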
In the API, o3-pro supports function calling and structured outputs via the Responses API and integrates with file-search tools on cloud-deployment surfaces such as Azure AI Foundry.[19][28]
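A function-calling request can be sketched as follows. The sketch assumes the Responses API's function-tool format, in which the tool's `name`, `description`, and JSON Schema `parameters` sit at the top level of the tool object; the `get_weather` tool and its schema are hypothetical, and the code only builds the request arguments without sending them.

```python
import json


def build_function_tool(name: str, description: str, parameters: dict) -> dict:
    """Build a function-tool definition in the assumed Responses API shape."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": parameters,
        "strict": True,  # ask the API to enforce the schema on tool arguments
    }


# Hypothetical tool: the name and schema are illustrative only.
weather_tool = build_function_tool(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
        "additionalProperties": False,
    },
)

# Keyword arguments that could be passed to client.responses.create(...).
request_kwargs = {
    "model": "o3-pro",
    "input": "Do I need an umbrella in Paris today?",
    "tools": [weather_tool],
}
payload_json = json.dumps(request_kwargs)  # confirms the payload is serializable
```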
o3-pro accepts both text and image inputs in ChatGPT and through the API.[25][28] A third-party multimodal evaluation by Roboflow placed o3-pro joint-third on its Vision AI Checkup leaderboard at the time of its release, with strong performance on optical character recognition (e.g., reading barcodes and serial numbers from images), spatial reasoning, and visual question answering, and weaker performance on counting and precise measurement tasks.[29]
At launch, several features supported by other ChatGPT models were not available within o3-pro: image generation, the Canvas workspace, and temporary chats.[1][2][24]
OpenAI's published benchmark data for o3-pro at launch consisted primarily of internal "pass@1" and "4/4 reliability" comparisons on a small number of well-known reasoning benchmarks. Third-party reproductions and aggregations have largely confirmed the relative ordering of the numbers, although exact scores reported by different sources vary slightly. The following numbers were reported in OpenAI's announcement materials and in coverage by reasoning-model trackers.[4][3][18][30]
On the AIME 2024 mathematics competition benchmark, o3-pro was reported at 93% on pass@1 and 90% under the stricter 4/4 reliability evaluation.[18][30] Aggregator coverage placed o3-pro at approximately 94% on AIME 2024, ahead of Google's Gemini 2.5 Pro (~92% in OpenAI's comparison materials) on the same benchmark.[4][30][3] Standard o3, by way of comparison, was variously reported by third-party trackers at approximately 90–96.7% on AIME 2024 depending on configuration.[31]
On GPQA Diamond, a benchmark of graduate-level multiple-choice science questions, o3-pro scored 84% on pass@1 and 76% under 4/4 reliability.[18][30] On the same benchmark, third-party comparisons reported Claude Opus 4 at 83.3% and Gemini 2.5 Pro at approximately 84%, placing o3-pro near the top of the GPQA Diamond leaderboard among reasoning models available in June 2025.[4][30]
On Codeforces, o3-pro recorded a pass@1 Elo rating of 2,748 and a 4/4 reliability Elo of 2,301.[18][30] The pass@1 figure was reported by external analyses as corresponding to roughly the 159th-highest active rating on the platform.[4][30]
OpenAI's signature evaluation for o3-pro was the "4/4 reliability" framework, in which a model is judged correct on a question only if it independently produces a correct answer in four out of four sampling attempts. On this evaluation, OpenAI reported that o3-pro outperformed both standard o3 and o1 pro mode across all four benchmarks tested.[1][3][18]
OpenAI also reported the results of human evaluations comparing o3-pro and standard o3 across multiple categories. Reviewers preferred o3-pro to o3 in every category tested, with an aggregated win rate of approximately 64% in favor of o3-pro; OpenAI specifically called out science, education, programming, business, and writing help as domains where the preference for o3-pro was strongest.[1][4][32]
Third-party analyses of o3-pro on the ARC-AGI benchmark (which measures abstract pattern reasoning) suggested that o3-pro did not meaningfully outperform standard o3 on the ARC-AGI tasks despite consuming substantially more compute. Reviewers reported that the cost-per-task was approximately ten times higher than for standard o3 without a proportional accuracy gain, making o3-pro a poor cost/performance choice on this particular benchmark.[4]
The Artificial Analysis "Intelligence Index" placed o3-pro at 41, above the median (35) for reasoning models in its price tier at the time of measurement, while flagging that o3-pro was "particularly expensive when comparing to other models of similar price."[21]
Early reviews highlighted o3-pro's strength in long, context-heavy analytical tasks where reliability and depth of reasoning are valued over latency. A widely cited early review by Ben Hylak on the Latent Space newsletter, headlined "God is hungry for Context: First thoughts on o3 pro," argued that o3-pro's principal value showed up not in conversational use but in extended-report generation. After supplying o3-pro with comprehensive company background (meeting notes, prior plans, voice memos, and product context), the reviewer reported that o3-pro produced "specific and rooted" strategic plans that "actually changed how we are thinking about our future," and described the model as exhibiting better tool-selection and orchestration judgment than alternatives such as Claude Opus 4 and Gemini 2.5 Pro.[17]
OpenAI's own positioning emphasized that o3-pro was preferred over standard o3 in every tested category, with reviewers especially favoring it on science, education, programming, business, and writing help.[1][3] Industry analysts characterized o3-pro as a "game-changer" for startups and small-to-medium businesses seeking access to frontier reasoning capabilities at substantially lower cost than the previous o1-pro tier.[7]
Mixed feedback emerged in the weeks after launch, focused on three themes:[32][33]
A separate strand of criticism, from cost-sensitive developers, pointed out that on cost/performance metrics, particularly on ARC-AGI tasks, o3-pro's roughly 10× higher per-task cost relative to standard o3 was not justified by corresponding accuracy gains on those benchmarks.[4]
OpenAI and reviewers identified several application areas where o3-pro's combination of long reasoning, tool use, and reliability gains were most useful:[1][17][28]
Beyond the missing ChatGPT features (image generation, Canvas, temporary chat) and the long latencies described above, o3-pro has several documented limitations:
OpenAI rolled out GPT-5 in early August 2025. With the GPT-5 launch, OpenAI initially removed o3-pro (along with GPT-4o, o3, o4-mini, and several other models) from the ChatGPT consumer model picker; existing o3-pro chats inside ChatGPT were migrated to a new "GPT-5 Pro" mode, which OpenAI made available only on the Pro and Team tiers.[8][9] Following user backlash over the abrupt removal of older models from ChatGPT, OpenAI restored access to some of the prior generation in the consumer product for a subset of users, with CEO Sam Altman acknowledging that the GPT-5 rollout had been "more bumpy than we hoped for."[8]
On the API side, an OpenAI spokesperson told VentureBeat at the time of the GPT-5 launch that "in the API, we do not currently plan to deprecate older models."[8] As a result, o3-pro remained accessible to API developers after the GPT-5 launch, even though the o3-pro name no longer appeared in the consumer product's model picker.