Jerry Tworek
Last reviewed
Jun 7, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,519 words
Jaroslaw Tworek, known professionally as Jerry Tworek, is a Polish artificial intelligence researcher best known for leading the development of OpenAI's o1 and o3 reasoning models and its Codex code-generation systems. Often described in the technology press as the "father of reasoning models," he spent roughly seven years at OpenAI, rising to vice president of research before leaving in early 2026 to found his own laboratory, Core Automation. His work on inference-time reasoning and program synthesis helped define a major shift in how large language models are trained and used.
Tworek was born and raised in Poland. He studied at the University of Warsaw, where he followed an individually designed curriculum and earned a master's degree in mathematics with a focus on mathematical applications in finance.[3][4] During his university years he interned at Google in Silicon Valley.[3] He generally uses the anglicized nickname "Jerry" rather than his given name, Jaroslaw.[4]
Before entering AI research, Tworek spent about five years working in quantitative finance. He developed algorithmic strategies for trading futures contracts on global markets, applying optimization and signal-extraction techniques, and worked at a hedge fund where he eventually became head of its research and development department.[1][3] He has spoken critically of that period, describing a culture driven by negative feedback that he found difficult.[3] Tworek has said he decided to move into machine learning after seeing DeepMind's 2015 work using reinforcement learning to play Atari games, recalling that he "realized there was a spark of intelligence in that."[3] He applied to OpenAI through the company's public website and joined in 2019.[3]
Tworek arrived at OpenAI four years after the company was founded. (Some news outlets that covered his 2026 departure incorrectly described him as a co-founder; he was not part of the founding team.) He was promoted to vice president of research in 2022 and remained one of the company's most influential technical leaders until his exit in 2026.[2]
He began at OpenAI as a research scientist working on reinforcement learning and robotics.[3] He was one of the authors of "Solving Rubik's Cube with a Robot Hand" (2019), in which OpenAI trained neural networks to control a human-like robotic hand using reinforcement learning and a method called Automatic Domain Randomization.[5][9] The project ran on Rapid, the same distributed training infrastructure OpenAI had built for OpenAI Five, its Dota 2 game-playing system, and the model architecture drew on that earlier work.[9] Tworek presented related research at the NeurIPS 2019 deep reinforcement learning workshop.[4]
Tworek went on to take a lead research role in OpenAI's program-synthesis efforts, the work of teaching language models to write computer code.[4] He was a co-author of "Evaluating Large Language Models Trained on Code" (2021), the paper that introduced Codex, the model that powered GitHub Copilot.[2][5] He also co-authored related papers, including "Efficient Training of Language Models to Fill in the Middle" (2022) and "Text and Code Embeddings by Contrastive Pre-Training" (2022), as well as "Training Verifiers to Solve Math Word Problems" (2021), which introduced the widely used GSM8K math benchmark.[5]
Tworek was among the contributors to GPT-4, listed as an author on the 2023 GPT-4 Technical Report, and worked on the model's post-training.[2][5] By the mid-2020s his publications had accumulated more than 50,000 citations, according to his Google Scholar profile, reflecting how central his work had become to modern language-model research.[5]
Tworek is most closely associated with OpenAI's reasoning models. He led the effort that produced o1, released in 2024 as the first OpenAI model designed to "think" before answering by spending extra computation at inference time, an approach commonly called inference-time scaling.[1][2] Rather than relying only on larger pretraining runs, o1 was trained with reinforcement learning to generate long internal chains of reasoning, and it set records on demanding reasoning benchmarks soon after release.[1] He carried this line of work into the successor o3 series, which reached graduate-level performance on science and mathematics problems.[1][2] The technology press nicknamed Tworek the "father of reasoning models" for this contribution, sometimes pairing it with a similar label for his role in code generation.[1]
He remained central as the reasoning approach was folded into OpenAI's flagship products, including the 2025 release of GPT-5 and the rollout of ChatGPT agents, and he became a frequent public explainer of how the models reason.[1] In a 2025 interview he described how GPT-5 deliberates internally, likening the reasoning models to the way humans pause and think before responding.[1]
In early January 2026, after roughly seven years at the company, Tworek announced on the social platform X that he was leaving OpenAI.[1][2] He wrote that he was "leaving to try and explore types of research that are hard to do at OpenAI," called the decision emotionally difficult, and described OpenAI as a company with a special place in human history.[2][10] Commentators connected his exit to OpenAI's growing emphasis on commercialization and product metrics, and noted that it followed other high-profile departures from the company's original technical leadership.[10]
In interviews after leaving, Tworek offered a candid assessment of the field. He said that scaling reinforcement learning had proved less transformative than he once expected, and argued that reaching artificial general intelligence would also require new neural architectures beyond the transformer and a shift toward continual learning, since people learn without a separate training phase.[7] He estimated that AGI could arrive "by 2029," and praised rivals' progress, noting that "what Anthropic has achieved with coding models and coding agents is remarkable."[7] He also suggested that Google had narrowed OpenAI's lead largely because OpenAI had stumbled and moved too slowly despite its head start.[7]
Within weeks of leaving OpenAI, Tworek launched a new AI laboratory, Core Automation, with the stated goal of building "the most automated AI lab in the world," beginning by automating its own research operations.[6] Rather than training ever-larger models on more data, the lab aims to develop new learning algorithms that extend beyond pretraining and reinforcement learning, together with architectures designed to scale better than transformers.[6] Reporting indicated that its flagship project, codenamed Ceres, targets a roughly hundredfold reduction in training data through continual learning, and that the startup was seeking to raise between $500 million and $1 billion shortly after incorporation.[8] Core Automation moved quickly to recruit researchers from rivals including Anthropic and Google DeepMind, and analysts grouped it with other labs founded by OpenAI alumni, such as Thinking Machines Lab and Safe Superintelligence, that share the conviction that further progress requires fundamentally different methods.[6][8]
The following table lists representative papers Tworek co-authored at OpenAI, with the contribution each is best known for.[5]
| Year | Publication | Best known for |
|---|---|---|
| 2019 | Solving Rubik's Cube with a Robot Hand | Robotic dexterity via reinforcement learning and Automatic Domain Randomization |
| 2021 | Evaluating Large Language Models Trained on Code | Introduced Codex, the basis for GitHub Copilot |
| 2021 | Training Verifiers to Solve Math Word Problems | Introduced the GSM8K math benchmark |
| 2022 | Efficient Training of Language Models to Fill in the Middle | Fill-in-the-middle training for code models |
| 2022 | Text and Code Embeddings by Contrastive Pre-Training | OpenAI text and code embedding models |
| 2023 | GPT-4 Technical Report | Documented GPT-4; Tworek worked on its post-training |
Tworek's career illustrates a broader transition in artificial intelligence during the early 2020s, from pure scaling of pretraining toward reinforcement learning, code generation, and explicit reasoning at inference time. The reasoning paradigm he helped establish with o1 and o3 reshaped how leading labs build and evaluate frontier models, and his move to found Core Automation placed him among a group of senior researchers betting that the next breakthroughs will come from new learning algorithms rather than larger models alone.[1][6]