Q* OpenAI

Q* (pronounced Q-Star) is a reported OpenAI research project that the company has never officially detailed. It surfaced in news reports during the November 2023 leadership crisis, when Reuters reported that several staff researchers had warned the board about a powerful AI advance. According to that reporting, Q* had, given vast computing resources, solved certain grade-school-level math problems, a result some inside the company viewed as a possible step toward artificial general intelligence (AGI). OpenAI has never published a product or paper named Q*, so the specific math-solving and architecture claims should be treated as reported or rumored rather than confirmed. By mid-2024 reporters had tied the same internal research thread to the codename Strawberry, and on September 12, 2024 OpenAI publicly launched its first reasoning model line as o1-preview, which Reuters, Bloomberg and The Information (citing internal sources) described as the production descendant of that research.^[1]^[2]^[3]

The story of Q* matters less because of any single technical artifact and more because it marks the moment when reinforcement-learning-based reasoning, inference-time search, and process supervision became the dominant frontier in large language model research. The o1, o3, and o4-mini families that followed are consistent with the broad design philosophy that the Q* leak hinted at: spend more compute at inference time on a hidden chain of thought, and train the model with reinforcement learning on verifiable problems. Outside observers also expected a step-level reward signal rather than a single end-of-answer score. OpenAI has confirmed reinforcement learning over a hidden chain of thought for the o-series, but it has not confirmed the use of step-level rewards, and it has not confirmed any of this for Q* by name.^[4]^[5]

Quick facts

Item	Detail
Codename	Q* (pronounced Q-Star), reportedly succeeded by Strawberry
Organization	OpenAI
First public reporting	Reuters, November 22, 2023
Trigger event	Reported letter from staff researchers to the board, days before Sam Altman's November 17, 2023 firing
Reported capability	Solving certain grade-school level math problems with novel reasoning (reported, unverified)
Speculated technique	Hybrid of Q-learning, A* search, process reward models, and tree-style inference search (speculation, not confirmed)
Reported successor	o1-preview (Sep 12, 2024); full o1 (Dec 5, 2024); o3 (Apr 16, 2025)
Status	Never confirmed by OpenAI as a standalone product; widely reported as the research lineage behind the o-series

What is OpenAI Q*?

Q* is best understood not as a confirmed model but as a reported internal research effort whose name became public during OpenAI's November 2023 turmoil. The Reuters report described it as a project that some staff believed could be a breakthrough in the search for superintelligence, while cautioning that the news agency could not independently verify the capabilities the researchers claimed.^[1]^[2] OpenAI itself has never released documentation, benchmarks, or a paper under the name Q*, which is why the math-solving result and any description of its architecture remain in the category of reported or speculated rather than established fact.

What is documented is the lineage framing that grew up around the name. Reporters at Reuters, Bloomberg and The Information later connected the same internal reasoning research to the Strawberry codename and then to the publicly released o1 reasoning model, and English Wikipedia summarizes the consensus reading as: o1 "was formerly known within OpenAI as Q*, and later as Strawberry."^[3] OpenAI has neither confirmed nor denied that specific chain of codenames.

What was reported about Q* in November 2023?

On November 22, 2023, five days after Sam Altman was abruptly removed by the OpenAI board, Reuters published an exclusive citing two unnamed people familiar with the matter. The sources said that several staff researchers had sent the board a letter warning of a powerful AI discovery that they believed could threaten humanity, and that the letter, alongside concerns about the pace of commercialization, was one of the catalysts for the board's decision to remove Altman. The letter referenced an internal model named Q*.^[1] Fortune, summarizing the same Reuters reporting, wrote that "several staff researchers wrote a letter to the organization's board warning of a discovery that could potentially threaten the human race."^[6]

The Reuters story said that, given vast computing resources, Q* had been able to solve certain math problems. The performance level was modest in absolute terms, with Reuters describing the system as "performing math on the level of grade-school students," but the source said researchers were excited because the system appeared to be reasoning through problems rather than recalling answers, and because performance was scaling with compute.^[1]^[2] Crucially, Reuters stated that it could not independently verify the capabilities of Q* claimed by the researchers, a caveat that is often dropped when the story is retold.^[1]

Reuters also reported that, after it contacted OpenAI, chief technology officer Mira Murati acknowledged in an internal memo to employees the existence of the Q* project as well as the letter sent to the board, while not confirming the accuracy of the media reports about its capabilities.^[6] In a subsequent interview with The Verge, Altman addressed the topic with deliberate vagueness, calling the leak "unfortunate" and saying he had no particular comment, while reiterating that OpenAI's research progress had been consistently rapid.^[7]

Reported by Reuters / Fortune (sourced but unverified)	Speculation by outside commentators (not reported as fact)
A letter from staff researchers existed	The letter explicitly named the project as a path to AGI
Q* solved certain grade-school math problems	The math involved Olympiad-level proofs
Mira Murati acknowledged the project internally	OpenAI confirmed safety risks publicly
Capability reportedly scaled with compute	The model could rewrite its own code or self-improve

Why is it called Q*?

The name itself is widely read as a hint, though OpenAI never explained it. The asterisk in Q* evokes A*, the classic heuristic search algorithm introduced by Peter Hart, Nils Nilsson and Bertram Raphael in 1968 at SRI. A* searches a graph of possible states by combining the actual cost so far with a heuristic estimate of the remaining cost, and it is one of the foundational algorithms in any artificial intelligence curriculum.^[8]
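To make the "actual cost so far plus heuristic estimate" idea concrete, here is a minimal sketch of A* on a small grid. The grid size, wall layout, and function names (`a_star`, `grid_neighbors`, `manhattan`) are illustrative assumptions, not anything from the Q* reporting; the Manhattan distance is used as the heuristic because it never overestimates the remaining cost on a 4-connected grid.

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """Return (cost, path) for a cheapest path from start to goal, or None."""
    # Frontier entries are (f, g, node, path), ordered by f = g + h(node):
    # g is the actual cost so far, h the heuristic estimate of what remains.
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found later
        for nxt, step_cost in neighbors(node):
            new_g = g + step_cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(
                    frontier, (new_g + heuristic(nxt), new_g, nxt, path + [nxt])
                )
    return None

# Hypothetical 5x5 grid with two partial walls (illustrative data only).
SIZE = 5
WALLS = {(1, 0), (1, 1), (1, 2), (3, 2), (3, 3), (3, 4)}

def grid_neighbors(cell):
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < SIZE and 0 <= ny < SIZE and (nx, ny) not in WALLS:
            yield (nx, ny), 1  # every move costs 1

def manhattan(goal):
    return lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])

cost, path = a_star((0, 0), (4, 4), grid_neighbors, manhattan((4, 4)))
print(cost, path)
```

The heuristic only changes the order in which states are expanded; with an admissible heuristic the returned path is still a cheapest one.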

The Q comes from Q-learning, a value-based reinforcement learning method introduced by Christopher Watkins in his 1989 Cambridge PhD thesis. Q-learning estimates a function Q(state, action) that captures the expected long-term reward of taking a particular action in a particular state. It is the same family of techniques that produced DeepMind's Deep Q-Network (DQN), which in 2013 learned to play Atari games directly from pixels and surpassed a human expert on some of them, and it is conceptually related to the value networks used in AlphaGo.^[9]
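As an illustration of what "estimating Q(state, action)" means in practice, the following is a minimal tabular Q-learning sketch on a toy corridor. The environment, the constants `ALPHA` and `GAMMA`, and all function names are assumptions made for this example. The update it applies is the standard one: Q(s, a) moves a fraction ALPHA toward r + GAMMA * max over a' of Q(s', a').

```python
import random

N_STATES, GOAL = 5, 4          # hypothetical corridor with states 0..4
LEFT, RIGHT = 0, 1
ALPHA, GAMMA = 0.5, 0.9        # learning rate and discount factor (assumed values)

def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching GOAL."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so a purely random behavior policy works here.
            action = rng.choice((LEFT, RIGHT))
            nxt, reward, done = step(state, action)
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q

q_table = train()
# Greedy policy read off the learned table for each non-terminal state.
greedy = [max((LEFT, RIGHT), key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(greedy)
```

Even though the agent explores at random, the learned table ends up preferring RIGHT in every state, because the values it stores describe the best achievable future reward rather than the behavior that generated the data.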

Q* in classical reinforcement learning refers specifically to the optimal action-value function: the Q-function under the optimal policy. Combining that name with A*-style search produces a fairly obvious hint at the architecture researchers suspected: a learned value function that scores partial reasoning steps, paired with a search procedure that explores a tree of possible next steps, all wrapped around a large language model generator. This reading is inference, not an OpenAI statement.
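For readers who want the textbook definition, the optimal action-value function is usually characterized by the Bellman optimality equation. The notation below (state s, action a, reward r, next state s', discount factor gamma) is the conventional one from reinforcement learning texts and is an assumption here, since the Q* reporting contains no formulas.

```latex
Q^*(s, a) = \mathbb{E}\big[\, r + \gamma \max_{a'} Q^*(s', a') \;\big|\; s, a \,\big],
\qquad
\pi^*(s) = \arg\max_{a} Q^*(s, a)
```

The second expression is why the name is suggestive: once Q* is known, acting optimally reduces to picking the highest-scoring action in each state.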

What public research pointed at the same idea?

Three threads of academic work made the speculation about Q* feel grounded rather than fanciful.

Let's Verify Step by Step (Lightman et al., 2023)

In May 2023, OpenAI researchers Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever and Karl Cobbe published "Let's Verify Step by Step" on arXiv (paper id 2305.20050). The paper compared two ways to train reward models on the MATH benchmark: outcome supervision, which only rewards the final answer, and process supervision, which rewards each individual reasoning step. Process supervision substantially outperformed outcome supervision, with the best process reward model (PRM) solving 78% of problems in a representative subset of MATH. The team also released PRM800K, a dataset of 800,000 step-level human correctness labels.^[10]
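The difference between the two supervision styles can be sketched with a toy example. Everything below is illustrative: a rule-based arithmetic checker stands in for a learned reward model, and the step format `(a, op, b, claimed)` is a hypothetical encoding. Scoring a whole solution as the product of its per-step scores follows the convention described in the paper.

```python
import math
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def step_is_correct(step):
    """Stand-in for a learned PRM: checks one arithmetic step exactly."""
    a, op, b, claimed = step
    return OPS[op](a, b) == claimed

def outcome_reward(steps, expected_answer):
    # Outcome supervision: a single label based only on the final result.
    return 1.0 if steps[-1][3] == expected_answer else 0.0

def process_rewards(steps):
    # Process supervision: one label per reasoning step.
    return [1.0 if step_is_correct(s) else 0.0 for s in steps]

def solution_score(step_rewards):
    # A solution counts as sound only if every step holds.
    return math.prod(step_rewards)

# Toy problem: (3 + 4) * 2, expected answer 14.
sound = [(3, "+", 4, 7), (7, "*", 2, 14)]
# Reaches 14 through a wrong first step and an unjustified correction.
flawed = [(3, "+", 4, 8), (8, "-", 1, 7), (7, "*", 2, 14)]

print(outcome_reward(flawed, 14), process_rewards(flawed))
```

The flawed solution earns full outcome reward because its final answer is right, while the step-level labels expose the faulty first step.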

The paper is the closest publicly available analogue of what Q* was rumored to do. It shows that step-level reward signals, applied across long chains of reasoning, are a tractable way to judge mathematical solutions from language models. The paper trains reward models only and does not itself fine-tune the generator with reinforcement learning.

Self-Taught Reasoner (STaR)

In March 2022, Eric Zelikman, Yuhuai Wu, Jesse Mu and Noah Goodman of Stanford and Google Research published "STaR: Bootstrapping Reasoning with Reasoning" (arXiv 2203.14465). STaR uses a small number of seed examples with worked-out reasoning to generate rationales for a much larger set of problems, fine-tunes the model on the rationales that produced correct answers, and iterates. A key innovation, called rationalization, lets the model generate post-hoc rationales for problems it initially got wrong, given the correct answer. STaR achieved performance comparable to a fine-tuned model 30 times larger on CommonsenseQA and showed that a self-improving loop on reasoning was practically achievable.^[11]
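The loop described above can be sketched with a deliberately tiny stand-in for a language model. In this toy, the "model" is just a dictionary mapping operator symbols to the function it believes they denote, `generate` and `finetune` are hypothetical helpers, and fine-tuning is simulated by absorbing the kept rationales. It illustrates the control flow only (generate, filter by correctness, rationalize with the answer as a hint, fine-tune) and is not the paper's implementation.

```python
import operator

CANDIDATES = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def generate(model, question, hint=None):
    """Toy stand-in for sampling a rationale and answer from a model."""
    a, symbol, b = question
    if symbol in model:
        name = model[symbol]
        return (symbol, name), CANDIDATES[name](a, b)
    if hint is not None:
        # Rationalization: look for a rationale that reaches the known answer.
        for name, fn in CANDIDATES.items():
            if fn(a, b) == hint:
                return (symbol, name), hint
    return None, None  # the model has no idea

def finetune(model, kept):
    """Toy fine-tuning: absorb the rationales that led to correct answers."""
    updated = dict(model)
    for _question, (symbol, name), _answer in kept:
        updated[symbol] = name
    return updated

def star_iteration(model, problems):
    kept = []
    for question, answer in problems:
        rationale, predicted = generate(model, question)
        if predicted != answer:
            rationale, predicted = generate(model, question, hint=answer)
        if rationale is not None and predicted == answer:
            kept.append((question, rationale, answer))
    return finetune(model, kept)

model = {"+": "add"}  # seed knowledge: only addition
train_set = [((2, "+", 3), 5), ((7, "-", 4), 3), ((3, "*", 5), 15)]
model = star_iteration(model, train_set)
_, held_out = generate(model, (6, "*", 7))  # no hint needed after one iteration
print(model, held_out)
```

After one pass, the toy model solves a held-out multiplication it could not attempt before, which is the bootstrapping effect the paper describes at much larger scale.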

Tree of Thoughts

Also in May 2023, Princeton and Google DeepMind researchers published "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" (Yao et al., arXiv 2305.10601). Tree of Thoughts (ToT) lets a model expand multiple reasoning branches, evaluate intermediate states, and backtrack, in the spirit of classic AI search. On the Game of 24 task, ToT raised GPT-4 success from around 4% with chain-of-thought prompting to 74%.^[12]
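A minimal sketch of the expand, evaluate, and keep-the-best pattern on the Game of 24 follows. It is an illustration under stated assumptions rather than the paper's method: the thought generator enumerates arithmetic moves instead of prompting a model, and `evaluate` is a brute-force reachability check standing in for the language-model judgment of how promising a state is. The names `propose`, `evaluate`, `tree_of_thoughts` and the `breadth` parameter are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

TARGET = 24

def propose(state):
    """Thought generator: combine any two remaining numbers with one operation.
    A state is a tuple of (value, expression) pairs."""
    for i, j in combinations(range(len(state)), 2):
        (a, ea), (b, eb) = state[i], state[j]
        rest = tuple(state[k] for k in range(len(state)) if k not in (i, j))
        options = [(a + b, f"({ea}+{eb})"), (a * b, f"({ea}*{eb})"),
                   (a - b, f"({ea}-{eb})"), (b - a, f"({eb}-{ea})")]
        if b != 0:
            options.append((a / b, f"({ea}/{eb})"))
        if a != 0:
            options.append((b / a, f"({eb}/{ea})"))
        for value, expr in options:
            yield rest + ((value, expr),)

def evaluate(state):
    """State evaluator: 1.0 if TARGET is still reachable, else 0.0.
    Brute-force stand-in for a learned or prompted value estimate."""
    if len(state) == 1:
        return 1.0 if state[0][0] == TARGET else 0.0
    return max(evaluate(child) for child in propose(state))

def tree_of_thoughts(numbers, breadth=5):
    # Breadth-first search that keeps only the `breadth` best states per level.
    frontier = [tuple((Fraction(n), str(n)) for n in numbers)]
    while len(frontier[0]) > 1:
        candidates = [child for state in frontier for child in propose(state)]
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:breadth]
    best = frontier[0]
    return best[0][1] if best[0][0] == TARGET else None

expression = tree_of_thoughts([4, 9, 10, 13])
print(expression)
```

With a real model the evaluator would be noisy, which is why the breadth limit matters: it trades search cost against the risk of pruning the branch that leads to a solution.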

Taken together, these three lines of work describe almost exactly the recipe that outside observers guessed Q* was following: generate candidate reasoning steps, score each step with a learned verifier, search over branches, and use the resulting traces to fine-tune the generator with reinforcement learning. None of this was confirmed to be Q*, but it made the speculation technically plausible.

What did Yann LeCun and Noam Brown point to?

Meta's chief AI scientist Yann LeCun was one of the few senior researchers to publicly comment on Q* during the November 2023 storm. Writing on X (formerly Twitter), he urged people to ignore the wave of speculation and noted that essentially every top lab was working on combining language models with planning. He singled out OpenAI's hiring of Noam Brown as the most concrete evidence of where the work was heading.^[13]

Noam Brown joined OpenAI in mid-2023 after years at Carnegie Mellon and Meta's FAIR lab, where he co-built Libratus and Pluribus (which beat top professional poker players) and Cicero (which played Diplomacy at human level by combining a language model with strategic search). On joining OpenAI on July 6, 2023, Brown wrote that his work on reasoning and self-play in poker and Diplomacy had motivated him to "make these methods truly general," adding that if the research succeeded "we may one day see LLMs that are 1,000x better than GPT-4." The combination of his hire and the Lightman process supervision paper made it reasonable to assume OpenAI was working on something resembling AlphaGo-style search bolted onto a language model, though Brown was describing his research goals, not Q* specifically.^[13]^[14]

How does Q* relate to o1 and Strawberry?

Q* never shipped as a product under that name. The next time the same internal research thread surfaced in reporting was in mid-2024, under a different codename.

In July 2024, Reuters and Bloomberg reported that OpenAI was internally testing a model under the codename Strawberry. According to the Reuters report, an internal document described Strawberry as a project aimed at letting OpenAI's models plan ahead, navigate the internet autonomously, and perform what the company called "deep research." The Information added that Strawberry was a successor to Q* and was being trained with reinforcement learning to follow long chains of reasoning.^[3]^[15]

On September 12, 2024, OpenAI publicly released o1-preview and o1-mini to ChatGPT Plus and Team users. The accompanying blog post described a model trained with reinforcement learning to perform a long internal chain of thought before answering, and OpenAI's launch materials showed accuracy rising roughly linearly with the logarithm of inference-time compute. The full o1 model followed on December 5, 2024, alongside the launch of ChatGPT Pro at $200 per month.^[4] OpenAI did not use the name Q* in any of this documentation; the Q*-to-Strawberry-to-o1 chain is a media reconstruction built from internal sources, not an official OpenAI statement.

Model	Public release	Notable detail
o1-preview	September 12, 2024	First public OpenAI reasoning model; launch post cited 83% on an International Mathematics Olympiad qualifying exam vs 13% for GPT-4o
o1-mini	September 12, 2024	About 80% cheaper than o1-preview; tuned for STEM and code
o1 (full)	December 5, 2024	Released with ChatGPT Pro; image input
o1-pro	March 2025 (API)	$150 / $600 per million input/output tokens
o3-mini	January 31, 2025	Three reasoning effort levels
o3	April 16, 2025	Roughly 3x o1 accuracy on ARC-AGI; 71.7% on SWE-bench Verified
o4-mini	April 16, 2025	Multimodal reasoning, tool use, image generation
o3-pro	June 10, 2025	Extended deliberation tier for o3

The shared design pattern across this family, sometimes called inference-time scaling or test-time compute, is the public-facing answer to the question of what Q* really pointed at: a research direction, not a single confirmed model.^[5]

What is the speculative theory of how Q* worked?

When the leak hit, the AI research community produced a flurry of blog posts trying to reconstruct what Q* might be doing under the hood. None of these were confirmed by OpenAI, but they share a remarkably consistent picture and they line up well with what OpenAI later described publicly for o1.

At the heart of the speculation is the idea that a language model, instead of producing one linear answer, generates a tree of possible reasoning paths. Each branch is expanded into one or more next steps, and partial branches can be pruned or extended depending on how promising they look. A separately trained process reward model gives a score to each step rather than to the entire answer, exactly as Lightman et al. demonstrated on MATH in 2023. A PRM can be used in three closely related ways: as a verifier for best-of-N sampling, as a guide for tree search, and as a reward signal for reinforcement learning fine-tuning of the generator.
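The first of those uses, a PRM as a verifier for best-of-N sampling, can be sketched in a few lines. The sample data below is entirely made up for illustration: each candidate carries a final answer and hypothetical per-step correctness probabilities of the kind a PRM would output. The sketch contrasts PRM-based selection with plain majority voting over final answers.

```python
import math
from collections import Counter

# Hypothetical samples for one problem whose correct answer is 102.
samples = [
    {"answer": 112, "step_probs": [0.97, 0.35, 0.92]},
    {"answer": 112, "step_probs": [0.95, 0.40, 0.90]},
    {"answer": 102, "step_probs": [0.98, 0.96, 0.97]},
    {"answer": 112, "step_probs": [0.96, 0.30, 0.94]},
    {"answer": 102, "step_probs": [0.90, 0.93, 0.95]},
]

def prm_score(sample):
    # Probability that every step is correct, taken as the product of step scores.
    return math.prod(sample["step_probs"])

def majority_vote(candidates):
    # Baseline: the most common final answer wins, ignoring the reasoning.
    return Counter(c["answer"] for c in candidates).most_common(1)[0][0]

def best_of_n(candidates):
    # Verifier-based selection: return the answer of the highest-scoring solution.
    return max(candidates, key=prm_score)["answer"]

print(majority_vote(samples), best_of_n(samples))
```

In this made-up case the most common answer shares one low-confidence step, so majority voting picks 112 while the step-level verifier selects the minority answer 102.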

A tree search guided by a PRM produces a large number of reasoning traces labeled with step-level scores. Those traces can be filtered down to high-quality solutions and used to fine-tune the generator with offline reinforcement learning, similar in spirit to STaR but with a much richer reward signal. This loop, where the model produces its own training data and a verifier curates it, is sometimes called self-play for reasoning, and it is often compared to what AlphaGo did for the game of Go.

Many commentators, including Tim Lee at Understanding AI and Nathan Lambert in his Interconnects newsletter, framed Q* as an attempt to bring an AlphaGo-style architecture to a language model: a generator policy that proposes moves (reasoning steps), a value network that scores positions (the PRM), and a search procedure (something like Monte Carlo tree search or beam search) that combines the two. To be explicit: this is informed speculation by outside observers, not a description OpenAI has endorsed.^[16]

What has OpenAI confirmed about Q*?

OpenAI has never publicly described a product or paper named "Q*." The company has, however, said quite a lot about the o-series that is consistent with the rumored Q* architecture, and almost nothing that contradicts it.^[4]^[5]

Claim	Confirmed by OpenAI
A model called Q* exists	Only indirectly: per Reuters, an internal memo from Mira Murati acknowledged the project and the letter; no public confirmation
Q* solved grade-school math problems	No public confirmation; Reuters could not independently verify it
The o-series uses RL over a hidden chain of thought	Yes; stated in the o1 release post and system card
Inference-time compute scaling is a deliberate goal	Yes; o1 documentation describes log-linear accuracy scaling with thinking time
Process reward models or tree search are used	Not confirmed by name; OpenAI keeps the specific algorithm proprietary
Q* is the same project as Strawberry / o1	Not confirmed officially; reported by Reuters, Bloomberg and The Information using internal sources

How did Q* connect to the Altman firing and the safety debate?

The Q* leak landed in the middle of one of the most chaotic weeks in OpenAI's history. Sam Altman was removed by the board on November 17, 2023, with the board citing a loss of confidence in his leadership. Within hours, Greg Brockman resigned in protest, hundreds of OpenAI employees signed a letter threatening to leave for Microsoft unless Altman was reinstated, and on November 21, 2023, Altman returned with a new initial board. The Reuters story on Q* appeared the following day.^[1]^[6]

The framing of the staff researchers' letter, that the new capability could threaten humanity, became one of the public anchors for the safety side of OpenAI's internal debate. Whether Q*'s actual capabilities deserved that framing is contested. Gary Marcus, in a Substack post titled "About that OpenAI 'breakthrough,'" argued that the entire episode had been overhyped and that solving grade-school math problems is a long way from any plausible AGI threshold. MIT Technology Review's Will Douglas Heaven made a similar point, quoting Wenda Li of Edinburgh and Katie Collins of Cambridge to argue that current systems still lack the architecture needed to reason about mathematics in any robust way.^[17]^[18]

Why does Q* still matter?

In hindsight, Q* is best read as the public's first glimpse of a research bet that paid off. The reported lineage starts with process supervision in May 2023, continues through the November 2023 leak and the Strawberry codename in mid-2024, and arrives at o1 in September 2024 and o3 in April 2025. Every model in that public chain has the same basic shape: a language model that has been trained with reinforcement learning to think for a long time before answering and is rewarded for getting the final answer right. Whether each reasoning step is also scored by a verifier, as the process supervision work would suggest, is something OpenAI has not confirmed.^[4]^[5]^[10]

The practical impact has been concrete. The o-series substantially raised the bar on math benchmarks (OpenAI's o1 launch post cited 83% on an International Mathematics Olympiad qualifying exam vs 13% for GPT-4o), on competitive programming (o3 reached an Elo of 2727 on Codeforces vs 1891 for o1), and on agentic software tasks (o3 scored 71.7% on SWE-bench Verified vs 48.9% for o1). The same techniques have spread industry-wide, with Anthropic, Google DeepMind, DeepSeek, Alibaba and others all releasing reasoning-tuned variants of their flagship models within a year of o1.^[5]

Whether or not Q* ever existed as a discrete model, the name has become a kind of shorthand for the moment when the AI field stopped trying to scale only the size of transformer models and started seriously scaling how long they think.

ELI5: what is Q* in simple terms?

Imagine a very smart student who, instead of blurting out the first answer that comes to mind, quietly works through a problem step by step on scratch paper, tries a few different approaches, checks each step, and only then writes down the final answer. Q* was the rumored OpenAI project about teaching a chatbot to do exactly that, using a separate "grader" that scores each step of the thinking and a search that tries many lines of reasoning. People got excited (and a little scared) in November 2023 because reporters said it could solve simple math problems on its own and might be a step toward much smarter AI. OpenAI never explained Q* directly, but the same idea later became the real, public reasoning models o1 and o3.

References

1. Tong, Anna; Dastin, Jeffrey; Hu, Krystal. "Exclusive: OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say." Reuters, November 22, 2023. Republished by CNBC at https://www.cnbc.com/2023/11/22/sam-altmans-ouster-at-openai-precipitated-by-letter-to-board-about-ai-breakthrough-sources-tell-reuters.html
2. Yahoo Finance / Reuters. "Exclusive-OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say." November 22, 2023. https://finance.yahoo.com/news/exclusive-sam-altmans-ouster-openai-224405716.html
3. Tong, Anna; Hu, Krystal. "Exclusive: OpenAI working on new reasoning technology under code name 'Strawberry'." Reuters, July 12, 2024.
4. OpenAI. "Introducing OpenAI o1-preview." September 12, 2024. https://openai.com/index/introducing-openai-o1-preview/
5. OpenAI. "Introducing OpenAI o3 and o4-mini." April 16, 2025. https://openai.com/index/introducing-o3-and-o4-mini/
6. Roth, Emma; Field, Hayden. "OpenAI staff reportedly warned the board about an AI breakthrough that could threaten humanity before Sam Altman was ousted." Fortune / Reuters, November 23, 2023. https://fortune.com/2023/11/23/openai-sam-altman-board-directors-agi-breakthrough-reuters/
7. Patel, Nilay. "Sam Altman on OpenAI and AI: 'I have no particular comment on that unfortunate leak'." The Verge, November 2023.
8. Hart, Peter E.; Nilsson, Nils J.; Raphael, Bertram. "A Formal Basis for the Heuristic Determination of Minimum Cost Paths." IEEE Transactions on Systems Science and Cybernetics, 1968.
9. Watkins, Christopher J. C. H. "Learning from Delayed Rewards." PhD thesis, King's College, Cambridge, 1989; Mnih et al. "Playing Atari with Deep Reinforcement Learning." arXiv:1312.5602, 2013.
10. Lightman, Hunter et al. "Let's Verify Step by Step." arXiv:2305.20050, May 2023. https://arxiv.org/abs/2305.20050
11. Zelikman, Eric; Wu, Yuhuai; Mu, Jesse; Goodman, Noah D. "STaR: Bootstrapping Reasoning with Reasoning." arXiv:2203.14465, March 2022. https://arxiv.org/abs/2203.14465
12. Yao, Shunyu et al. "Tree of Thoughts: Deliberate Problem Solving with Large Language Models." arXiv:2305.10601, May 2023.
13. LeCun, Yann. Public posts on X (formerly Twitter) regarding Q*, November 2023.
14. Brown, Noam. "When I joined OpenAI, I wrote about how my experience researching reasoning in AI for poker and Diplomacy, and seeing the difference thinking made, motivated me to help bring the paradigm to LLMs." X (formerly Twitter), July 6, 2023 and September 12, 2024. https://x.com/polynoamial/status/1834281212787236945
15. Bloomberg / The Information. "OpenAI's 'Strawberry' project aims at human-like reasoning." July 2024.
16. Lee, Timothy B. "The real research behind the wild rumors about OpenAI's Q* project." Understanding AI, December 2023. https://www.understandingai.org/p/how-to-think-about-the-openai-q-rumors
17. Marcus, Gary. "About that OpenAI 'breakthrough'." Substack, November 2023. https://garymarcus.substack.com/p/about-that-openai-breakthrough
18. Heaven, Will Douglas. "Unpacking the hype around OpenAI's rumored new Q* model." MIT Technology Review, November 27, 2023. https://www.technologyreview.com/2023/11/27/1083886/unpacking-the-hype-around-openais-rumored-new-q-model/
