Strawberry (OpenAI codename)

Updated Jul 16, 2026

Strawberry was the internal codename used at OpenAI for the research program that produced the o-series of reasoning models, most notably OpenAI o1. The project was the direct successor to an earlier internal effort known as Q* (pronounced "Q-star"), and the existence of the rebranded program became public on July 12, 2024, when Reuters reported on internal OpenAI documents describing a system aimed at large-scale reasoning, autonomous web browsing, and "deep research."^[1] Strawberry shipped under the production name "o1-preview" on September 12, 2024.^[2] As a milestone, the project is widely cited as the moment when OpenAI made test-time compute, rather than pure pretraining scale, the centerpiece of its frontier roadmap.^[3]

Infobox

Field	Value
Codename	Strawberry (formerly Q*)
Owner	OpenAI
First public report	2024-07-12 (Reuters)
Product launch	o1-preview, o1-mini on 2024-09-12
Full release	o1 (GA) and ChatGPT Pro on 2024-12-05
Successor models	o3 (announced 2024-12-20), o4-mini (2025-04-16)
Paradigm	Reinforcement learning on chain-of-thought traces; test-time compute
Sources	Reuters, Bloomberg, OpenAI launch posts

Background

From Q* to Strawberry

The first signal that OpenAI was building a model with novel reasoning capabilities arrived on November 22, 2023, days after the board of OpenAI had abruptly removed Sam Altman as CEO. Reuters reported that, prior to Altman's dismissal, several staff researchers had sent a letter to the board warning of a "powerful artificial intelligence discovery" associated with an internal project called Q*.^[4] According to people familiar with the matter, the model had solved certain mathematical problems at roughly the level of grade-school students, and the speed of its progress led some inside the company to interpret it as a step toward general intelligence.^[4] OpenAI's then-CTO, Mira Murati, confirmed the project's existence to staff in an internal memo and indicated that a letter about it had been sent to the board.^[4]

The Q* story circulated alongside, but did not by itself explain, Altman's removal. Reuters described the letter as "one factor among a longer list of grievances" cited by the board, and later reporting attributed the firing to broader concerns about candor and governance rather than the model alone.^[4]^[5] Altman returned as CEO on November 22, 2023, after employees signed an open letter demanding his reinstatement, and the board was reconstituted shortly afterward.^[5]

Q* itself did not vanish. It was renamed. By mid-2024, OpenAI staff and internal documentation referred to the same line of work as "Strawberry."^[1] Reporters and analysts later noted that the name was, in part, an inside joke about a familiar failure mode of large language models: when asked how many letter "r"s appear in the word "strawberry," earlier models such as GPT-4 typically miscounted because of how byte-pair tokenization split the word into pieces.^[6]^[7] The fact that the new reasoning system could in principle solve this kind of letter-counting task by spelling the word out in its private scratchpad became a small piece of community folklore around the launch.^[6]
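The letter-counting anecdote can be made concrete with a small sketch. The subword split below is a hypothetical segmentation chosen only for illustration, not the actual output of any GPT-4 tokenizer, and the helper name `count_letter_by_spelling` is likewise invented here. The point is only that a model receiving a few opaque token IDs has no direct view of individual characters, whereas spelling the word out one character at a time makes the count trivial.

```python
# Hypothetical subword split, chosen only for illustration; a real
# byte-pair tokenizer may segment the word differently.
hypothetical_tokens = ["str", "aw", "berry"]


def count_letter_by_spelling(word: str, letter: str) -> int:
    """Count a letter by walking the word one character at a time."""
    count = 0
    for ch in word:
        if ch == letter:
            count += 1
    return count


word = "".join(hypothetical_tokens)

# Three opaque pieces hide the characters inside them; spelling the
# word out, as a scratchpad might, restores a per-character view.
spelled_out = list(word)
r_count = count_letter_by_spelling(word, "r")

print(len(hypothetical_tokens), "tokens versus", len(spelled_out), "characters")
print("r count:", r_count)
```

Nothing here reflects how o1 is implemented; it only illustrates why a step-by-step spelling pass sidesteps the tokenization issue described above.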

A separate wiki entry on this site covers the Q* phase specifically; see Q* OpenAI.

The July 12, 2024 Reuters scoop

On July 12, 2024, Reuters reporter Anna Tong, with Katie Paul reporting from New York, published an exclusive describing a project "code-named Strawberry, according to a person familiar with the matter and internal documentation reviewed by Reuters."^[1] The article reported that Strawberry was the new internal name for what had been called Q*, and that the company had developed a "specialized way of post-training" its generative AI models, adapting base models after pretraining to perform multi-step reasoning, autonomous web browsing, and what the document referred to as "deep research."^[1]

Reuters described an internal OpenAI document, dated to May 2024, that laid out plans for using Strawberry-trained models to plan ahead, navigate the internet through "CUAs" (computer-using agents), and "perform 'deep research' on users' behalf."^[1] The wire service was careful about what it could and could not verify. It noted that it had reviewed a copy of the document but could not independently confirm the dates referenced inside it or the capabilities ascribed to the project.^[1]

The story landed one day after a separate Bloomberg report by Rachel Metz, published July 11, 2024, which said that OpenAI executives had told staff at an internal all-hands meeting that the company would track its progress toward AGI using a five-tier capability framework.^[8] Bloomberg also reported that, at that same meeting, OpenAI had demonstrated a research project that, executives said, gave a GPT-4-class model skills similar to human reasoning; Bloomberg could not confirm whether the demo was part of Strawberry, but the timing led most observers to assume the two were linked.^[8]

The "Levels of AGI" framework

The framework Bloomberg described slotted current chatbots into "Level 1" and placed Strawberry-style systems into "Level 2."^[8] As reported, the five levels were:

Level	Label	Description
1	Chatbots	Conversational AI capable of natural-language interaction, exemplified by then-current ChatGPT-class models^[8]
2	Reasoners	Systems able to solve problems at the level of a person with a doctorate-level education, without access to external tools^[8]
3	Agents	Systems that can take autonomous actions on a user's behalf over extended periods (days)^[8]
4	Innovators	Systems that can contribute to original inventions and discoveries^[8]
5	Organizations	Systems capable of doing the work of an entire organization^[8]

According to Bloomberg's sources, OpenAI executives told staff at the July 11, 2024 meeting that the company believed it was operating at Level 1 but was "on the cusp" of Level 2, which the company labeled "Reasoners."^[8] The framework was not published externally as a formal document; the Bloomberg article, and Axios coverage four days later, remain the canonical references for it.^[8]^[9]

From Strawberry to o1

OpenAI launched the production version of the Strawberry project on September 12, 2024, under the name o1, releasing two models simultaneously: o1-preview, aimed at general reasoning, and o1-mini, a smaller variant optimized for code and math tasks.^[2]^[10] Both were initially available to ChatGPT Plus and Team subscribers and via the OpenAI API for selected developers.^[2] OpenAI's launch blog post described the model as having "learn[ed] to refine its thinking process, try different strategies, and recognize its mistakes" through large-scale reinforcement learning.^[11]

The full "o1" (without the "-preview" suffix) shipped as part of OpenAI's "12 Days of OpenAI" event on December 5, 2024, alongside a new $200/month tier called ChatGPT Pro that gave subscribers access to a higher-compute variant labeled o1 pro mode.^[12]^[13] OpenAI published an o1 system card on the same date.^[14] Microsoft integrated o1 into its Copilot product in January 2025, and the o1-pro API was released to developers in March 2025 at $150 per million input tokens and $600 per million output tokens, the most expensive model OpenAI had offered at the time.^[10]

On December 20, 2024, the final day of the same shipping event, OpenAI announced o3 and o3-mini, the next reasoning generation, with a system card and benchmark results but no immediate public availability.^[15] o3-mini was released to all ChatGPT users (including the free tier) on January 31, 2025; the full OpenAI o3 and a successor lightweight model, o4-mini, shipped together on April 16, 2025, adding native tool use, web browsing, image input, and image generation to the reasoning pipeline.^[16]^[17] A higher-compute o3-pro variant followed on the ChatGPT Pro tier.

How Strawberry works (publicly disclosed details)

OpenAI has not released a paper or weights for o1, and the company's published material is deliberately limited. What is publicly disclosed comes from the September 12, 2024 launch posts, the December 5, 2024 system card, statements by OpenAI researchers, and reporting on the project.

Reinforcement learning on reasoning traces

OpenAI's launch blog states that o1 was "trained with reinforcement learning" to produce a private chain-of-thought before generating a final answer.^[11] OpenAI research scientist Noam Brown, who joined the company in mid-2023 and led parts of the reasoning effort, said in interviews that the team had discovered "strong test-time compute scaling laws," meaning that performance on reasoning benchmarks continued to improve as the model was allowed to think for longer at inference.^[18] OpenAI's own materials phrase the same observation more cautiously, noting "a correlation between accuracy and the logarithm of the amount of compute spent thinking before answering."^[10]
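Read literally, the quoted correlation suggests a roughly log-linear relationship over the range that was measured. As a sketch only, with $A$ for benchmark accuracy, $C$ for the compute spent thinking, and $a$, $b$ as fitted constants (all of these symbols are introduced here for illustration and do not appear in OpenAI's materials):

```latex
A(C) \approx a + b \log C, \qquad b > 0
\quad\Longrightarrow\quad
A(2C) - A(C) \approx b \log 2
```

Under this reading, each doubling of thinking compute buys about the same fixed increment in accuracy, so gains are steady but increasingly expensive. The relation can only hold over a limited range, since accuracy is bounded above by 100%.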

The training approach itself is closer to reinforcement learning over self-generated reasoning trajectories than to standard RLHF preference modeling. OpenAI has not described the algorithm in detail, but independent commentary has connected it to the 2022 Stanford "Self-Taught Reasoner" (STaR) line of work, in which a model is rewarded for producing chains of thought that reach verified correct answers.^[7] Reuters reported in July 2024 that internal OpenAI documents described a similar approach for Strawberry: fine-tuning base models on a curated "deep research" dataset and rewarding successful multi-step reasoning.^[1]
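The STaR-style idea mentioned above can be sketched in a few lines. This is a toy under stated assumptions, not OpenAI's undisclosed pipeline: `sample_rationale` is a hypothetical stand-in for a language model (it adds two numbers and slips 40% of the time), and the "verifier" is a simple comparison against a known answer. The loop samples several rationales per question and keeps only those whose final answer checks out; in STaR, the kept traces would then be used as fine-tuning data for the next round.

```python
import random


def sample_rationale(question, rng):
    """Hypothetical stand-in for a model: returns (chain of thought, answer).

    It adds two numbers but makes an off-by-one slip 40% of the time.
    """
    a, b = question
    answer = a + b + (1 if rng.random() < 0.4 else 0)
    return f"{a} plus {b} gives {answer}", answer


def collect_verified_traces(questions, gold_answers, samples_per_question=4, seed=0):
    """Keep only rationales whose final answer matches the known answer."""
    rng = random.Random(seed)
    kept = []
    for question, gold in zip(questions, gold_answers):
        for _ in range(samples_per_question):
            rationale, answer = sample_rationale(question, rng)
            if answer == gold:  # verifier passes: keep the trace
                kept.append((question, rationale))
    return kept


questions = [(2, 3), (10, 7), (4, 4)]
gold = [5, 17, 8]
traces = collect_verified_traces(questions, gold)

# In a STaR-style loop, `traces` would become training data.
print(len(traces), "verified traces kept out of", 4 * len(questions))
```

The design choice worth noting is that supervision comes only from the final answer: no human writes or grades the intermediate reasoning, which is what lets the approach scale to large numbers of self-generated trajectories.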

Test-time compute and hidden reasoning tokens

The most visible architectural choice is that o1 dedicates explicit, billable tokens to internal deliberation before producing a user-visible answer. OpenAI's API documentation describes these as "reasoning tokens," and bills for them at the same rate as output tokens even though they are not returned to the caller.^[10] The model thus has two budgets: the visible response and a hidden "thinking" budget that scales with the difficulty of the problem.
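The billing consequence of the two budgets can be worked through with simple arithmetic. The sketch below uses the o1-pro list prices quoted earlier in this article ($150 and $600 per million input and output tokens); the token counts are hypothetical values picked for illustration, and `request_cost_usd` is a name invented here rather than part of any OpenAI SDK.

```python
def request_cost_usd(input_tokens, reasoning_tokens, visible_output_tokens,
                     input_price_per_million, output_price_per_million):
    """Estimate the cost of one request when hidden reasoning tokens
    are billed at the output-token rate."""
    billed_output = reasoning_tokens + visible_output_tokens
    total = (input_tokens * input_price_per_million
             + billed_output * output_price_per_million)
    return total / 1_000_000


# Hypothetical request: a short prompt, a long hidden deliberation,
# and a brief visible answer.
cost = request_cost_usd(
    input_tokens=1_000,
    reasoning_tokens=5_000,
    visible_output_tokens=500,
    input_price_per_million=150.0,
    output_price_per_million=600.0,
)
visible_share = 500 / (5_000 + 500)

print(f"estimated cost: ${cost:.2f}")
print(f"visible share of billed output tokens: {visible_share:.0%}")
```

In this made-up example most of the bill comes from tokens the caller never sees, which is why the hidden budget matters for cost planning.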

OpenAI hid the raw chain-of-thought from end users for a combination of reasons that the company described in its launch post: it wanted the model to be able to "express its thoughts in unaltered form" without policy-trained smoothing, and it wanted to preserve the option of monitoring those thoughts for signs of deception or misalignment.^[11] The company also stated that it would consider penalizing or revoking access for users who attempted to extract the hidden trace via prompt injection.^[10]

This design directly implements test-time compute (also called inference-time scaling) as a first-class product surface, rather than as a research curiosity. The o3 generation extended the idea further: o3 exposes a controllable "reasoning effort" setting (low, medium, high) that adjusts how many reasoning tokens the model uses on a given query.^[15]

Connection to earlier reasoning research

The Strawberry/o1 approach has clear intellectual precursors in academic work that predates the o1 launch by one to three years. Chain-of-thought prompting was introduced by Google researchers in early 2022 as a prompting strategy that elicited step-by-step reasoning from large pretrained models. Tree of Thoughts, introduced in 2023, generalized this to branching search over reasoning steps. Reflexion, also published in 2023, added a self-critique loop in which an agent revises its own outputs after observing failure signals. o1's contribution is not the chain-of-thought structure itself but the use of large-scale reinforcement learning to make the model produce useful chains of thought reliably, without prompting tricks.^[11]^[18]

Reception and benchmark impact

On the AIME 2024 mathematics competition (see AIME), OpenAI reported that o1 solved an average of 74% of problems with a single sample per problem (pass@1) and 83% with consensus voting over 64 samples, compared to roughly 12% for GPT-4o.^[11] On GPQA Diamond, a benchmark of graduate-level science questions designed to be Google-proof, o1 scored at or near the level of human experts in the relevant fields.^[11] These figures were widely cited at launch and were a substantial part of the case OpenAI made that o1 was a qualitative step beyond the GPT-4 generation.

The o3 generation pushed the same numbers further. On December 20, 2024, OpenAI reported that o3 reached 96.7% on AIME 2024, 87.7% on GPQA Diamond, 71.7% on SWE-bench Verified, and 25.2% on Epoch AI's FrontierMath benchmark, where no other model at the time had cleared 2%.^[15] On ARC-AGI, o3 scored between 75.7% and 87.5% in its tuned configurations, comparable to the average human score on the same private set, though the high-compute run cost was reported in the hundreds of dollars per task.^[15]

OpenAI codename culture

"Strawberry" is one entry in a longer pattern of internal OpenAI codenames that have surfaced in press reports without ever being acknowledged in marketing copy.

Codename	Reported product	First public reference
Q*	Reasoning research that became Strawberry	Reuters, 2023-11-22^[4]
Strawberry	Reasoning project that became o1	Reuters, 2024-07-12^[1]
Orion	Internal name for GPT-4.5, reported as the last "pretraining-scaling" frontier model	TechCrunch, 2025-02-27^[19]

GPT-4.5, released on February 27, 2025, was reported by TechCrunch and Fortune to have been known internally as "Orion" and to be the last model OpenAI built primarily using the dense-pretraining-scale paradigm that produced GPT-3 and GPT-4.^[19]^[20] In the same coverage, two former employees told Fortune that Orion had originally been positioned as a potential GPT-5 but had not delivered the across-the-board capability jump that the company expected from a "5", and so was relabeled GPT-4.5 and superseded by reasoning-centric work.^[20] GPT-5 subsequently launched in August 2025, retiring GPT-4.5 from the consumer tiers.^[20]

The shift from "Orion" to "Strawberry" thus reads, in retrospect, as the inflection point at which the company's headline ambition moved from a single huge pretraining run to a smaller base model trained heavily on reasoning, a pattern that continued into GPT-5 Codex and the o3/o4 generations.

Significance

The Strawberry project is most often described, by both OpenAI staff and outside commentators, as the point at which the production AI ecosystem moved from pretraining-scaling to test-time-compute-scaling as the dominant performance lever.

The 2022 Chinchilla paper had reframed the scaling laws of the GPT-3 era by showing that, for a fixed compute budget, optimal pretraining required more tokens per parameter than the GPT-3-style runs had used (see Chinchilla scaling laws). That argument concerned training-time compute. Strawberry and its successors introduced a parallel axis: even with a fixed trained model, accuracy on hard reasoning tasks could be improved further by paying more compute at inference.^[3]^[18] OpenAI's own launch post pointed to "a correlation between accuracy and the logarithm of the amount of compute spent thinking before answering," and external commentators frequently described this as a new scaling law.^[11]^[3]

The connection to The Bitter Lesson, Richard Sutton's 2019 essay, is one that AI researchers including Noam Brown and Jim Fan have drawn explicitly. Sutton argued that, over the long arc of AI research, the methods that succeed are those that scale with compute, and that he saw two such methods: learning and search.^[21] Strawberry's design adds search (sampling many reasoning paths at inference, then selecting or refining among them) on top of large-scale learning, in a way that earlier post-GPT systems did not.^[18]^[21]
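The "sample many paths, then select" form of search can be illustrated with a majority-vote toy. This is a sketch under loud assumptions: `sample_answer` is a hypothetical stand-in for one sampled reasoning path (correct 40% of the time, with wrong answers spread across four values), and majority voting is just one simple selection rule. OpenAI has not disclosed how, or whether, o1 combines multiple paths internally.

```python
import random
from collections import Counter


def sample_answer(rng, correct=42, p_correct=0.4):
    """Hypothetical stand-in for one sampled reasoning path's final answer."""
    if rng.random() < p_correct:
        return correct
    return rng.choice([40, 41, 43, 44])  # wrong answers are spread out


def majority_vote(answers):
    """Select the most common final answer among the sampled paths."""
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the voted answer is the correct one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        answers = [sample_answer(rng) for _ in range(n_samples)]
        hits += majority_vote(answers) == 42
    return hits / trials


# More samples per question means more inference compute per question.
for n in (1, 8, 64):
    print(f"{n:>2} samples per question -> accuracy {accuracy(n):.3f}")
```

Because the wrong answers disagree with one another while the correct answer recurs, spending more inference compute on additional samples raises accuracy without changing the underlying sampler, which is the learning-plus-search combination the paragraph above describes.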

For the broader industry, the visible effect was that competing labs accelerated their own reasoning programs. By mid-2025, every major frontier lab had shipped or announced a "thinking" mode that explicitly allocated extra inference compute, and the design pattern of charging users for hidden reasoning tokens became standard. The Strawberry rebrand also coincided with renewed willingness, inside and outside OpenAI, to talk publicly about the company's AGI roadmap, with the Bloomberg "five levels" framework providing a vocabulary that quickly entered industry discussion.^[8]^[9]

Limitations and caveats

The public record about Strawberry rests almost entirely on three pieces of journalism from two outlets (Reuters in November 2023 and July 2024, Bloomberg in July 2024) and the production materials that followed.^[4]^[1]^[8] OpenAI has not published the internal documents, has not released a research paper describing the o1 training pipeline, and has explicitly hidden o1's chain of thought from users and developers.^[10] The connection between the November 2023 Q* leak and the July 2024 Strawberry leak is asserted by Reuters and by subsequent secondary coverage, but the precise relationship between the two research efforts has not been documented by OpenAI itself.^[1]^[4]

Several details that have circulated in popular reporting are not directly supported by primary sources. OpenAI has not, for example, published a confirmed reason for the choice of the name "Strawberry"; the connection to the "how many r's are in 'strawberry'" tokenization joke is reported and widely repeated, but it is presented in secondary sources as folklore rather than as an officially confirmed naming rationale.^[6]^[7]

It is also worth noting that the "five levels of AGI" framework was reported by Bloomberg and Axios as an internal taxonomy shared at a company meeting, not as an externally published OpenAI document.^[8]^[9] Other versions of the same idea, with different labels and a different number of stages, have appeared in OpenAI communications since, so the table above should be read as a snapshot of how the framework was reported in July 2024 rather than as a permanent classification.

See also

Q* OpenAI: The earlier codename for the same line of work, surfaced in November 2023.
OpenAI o1: The first production model to ship from the Strawberry project.
OpenAI o-series: The model family that grew out of Strawberry (o1, o3, o3-pro, o4-mini, and successors).
Chain-of-Thought: The prompting technique that o1 internalizes as a trained behavior.
Test-time compute and Inference-time scaling: The general paradigm that the Strawberry project popularized.
The Bitter Lesson: Sutton's essay arguing that scalable learning and search dominate over hand-crafted methods.
Chinchilla scaling laws and Scaling Laws: The pretraining-era scaling regime that Strawberry partly displaced.
Deep Research: The product surface that grew out of the internal "deep research" dataset described in the Reuters report.

References

1. Anna Tong and Katie Paul, "Exclusive: OpenAI working on new reasoning technology under code name 'Strawberry'", Reuters, 2024-07-12. https://www.reuters.com/technology/artificial-intelligence/openai-working-new-reasoning-technology-under-code-name-strawberry-2024-07-12/. Accessed 2026-05-20.
2. OpenAI, "Introducing OpenAI o1-preview", OpenAI, 2024-09-12. https://openai.com/index/introducing-openai-o1-preview/. Accessed 2026-05-20.
3. Nathan Lambert, "OpenAI's o3: the 2024 finale of AI", Interconnects, 2024-12-20. https://www.interconnects.ai/p/openais-o3-the-2024-finale-of-ai. Accessed 2026-05-20.
4. Anna Tong, Jeffrey Dastin and Krystal Hu, "OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say", CNBC/Reuters, 2023-11-22. https://www.cnbc.com/2023/11/22/sam-altmans-ouster-at-openai-precipitated-by-letter-to-board-about-ai-breakthrough-sources-tell-reuters.html. Accessed 2026-05-20.
5. Wikipedia contributors, "Removal of Sam Altman from OpenAI", Wikipedia, 2025-12-01. https://en.wikipedia.org/wiki/Removal_of_Sam_Altman_from_OpenAI. Accessed 2026-05-20.
6. M.G. Siegler, "'Strawberry' Fields of Research", Spyglass, 2024-07-15. https://spyglass.org/openai-strawberry-fields-forever/. Accessed 2026-05-20.
7. Singularity Hub staff, "OpenAI's Project Strawberry Said to Be Building AI That Reasons and Does 'Deep Research'", Singularity Hub, 2024-07-19. https://singularityhub.com/2024/07/19/openais-project-strawberry-is-said-to-be-building-ai-that-reasons-and-does-deep-research/. Accessed 2026-05-20.
8. Rachel Metz, "OpenAI Sets Levels to Track Progress Toward Superintelligent AI", Bloomberg, 2024-07-11. https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai. Accessed 2026-05-20.
9. Ina Fried, "OpenAI releases levels of artificial general intelligence (AGI)", Axios, 2024-07-15. https://www.axios.com/2024/07/15/openai-chatgpt-reasoning-ai-levels. Accessed 2026-05-20.
10. Wikipedia contributors, "OpenAI o1", Wikipedia, 2026-04-15. https://en.wikipedia.org/wiki/OpenAI_o1. Accessed 2026-05-20.
11. OpenAI, "Learning to reason with LLMs", OpenAI, 2024-09-12. https://openai.com/index/learning-to-reason-with-llms/. Accessed 2026-05-20.
12. Carl Franzen, "OpenAI launches full o1 model with image uploads and analysis, debuts ChatGPT Pro", VentureBeat, 2024-12-05. https://venturebeat.com/ai/openai-launches-full-o1-model-with-34-reduced-error-rate-debuts-chatgpt-pro/. Accessed 2026-05-20.
13. OpenAI, "Introducing ChatGPT Pro", OpenAI, 2024-12-05. https://openai.com/index/introducing-chatgpt-pro/. Accessed 2026-05-20.
14. OpenAI, "OpenAI o1 System Card", OpenAI, 2024-12-05. https://cdn.openai.com/o1-system-card-20241205.pdf. Accessed 2026-05-20.
15. Kyle Wiggers, "OpenAI announces new o3 models", TechCrunch, 2024-12-20. https://techcrunch.com/2024/12/20/openai-announces-new-o3-model/. Accessed 2026-05-20.
16. OpenAI, "OpenAI o3-mini", OpenAI, 2025-01-31. https://openai.com/index/openai-o3-mini/. Accessed 2026-05-20.
17. OpenAI, "Introducing OpenAI o3 and o4-mini", OpenAI, 2025-04-16. https://openai.com/index/introducing-o3-and-o4-mini/. Accessed 2026-05-20.
18. Sonya Huang and Pat Grady, "Training Data: Noam Brown and Team on Teaching LLMs to Reason", Sequoia Capital, 2024-10-01. https://www.sequoiacap.com/podcast/training-data-noam-brown/. Accessed 2026-05-20.
19. Maxwell Zeff, "OpenAI unveils GPT-4.5 'Orion,' its largest AI model yet", TechCrunch, 2025-02-27. https://techcrunch.com/2025/02/27/openai-unveils-gpt-4-5-orion-its-largest-ai-model-yet/. Accessed 2026-05-20.
20. Jeremy Kahn, "OpenAI launches long-awaited GPT-4.5, but 'Orion's' performance disappoints some", Fortune, 2025-02-27. https://fortune.com/2025/02/27/openai-gpt-4-5-orion-launch-sam-altman-benchmarks/. Accessed 2026-05-20.
21. Wikipedia contributors, "Bitter lesson", Wikipedia, 2026-02-10. https://en.wikipedia.org/wiki/Bitter_lesson. Accessed 2026-05-20.
