Łukasz Kaiser
Last reviewed
Jun 5, 2026
Sources
25 citations
Review status
Source-backed
Revision
v2 · 2,041 words
Łukasz Kaiser is a Polish computer scientist known for his work on deep learning and natural language processing. He is one of the eight co-authors of the 2017 paper "Attention Is All You Need," which introduced the Transformer architecture, and a co-author of the TensorFlow machine learning system. Trained originally as a logician, Kaiser held an academic research post at France's Centre National de la Recherche Scientifique (CNRS) before spending nearly eight years at Google Brain. Since 2021 he has been a senior researcher at OpenAI, where he was a foundational contributor to the company's first reasoning models, the o1 series.[1][2][3]
Kaiser's career spans two distinct phases. In the first, as a mathematician and logician, he worked on automata theory, logic, and games, completing a doctorate on automatic structures and taking a tenured research position with the CNRS in Paris.[4][5] In the second, beginning in 2013 at Google Brain, he became a machine learning researcher and a prolific contributor to systems and models that shaped modern AI, including TensorFlow, the Tensor2Tensor library, and the Transformer.[1][6] After moving to OpenAI in 2021, he contributed to large language models such as GPT-4 and to the development of reasoning-focused models.[6][7]
Among the eight co-authors of "Attention Is All You Need," Kaiser is often noted as the only one who did not subsequently leave to found or co-found a startup, choosing instead to remain a full-time research scientist.[13][14] He joined OpenAI before the public launch of ChatGPT and has since concentrated on reasoning models, which he regards as the most significant shift in the field since the Transformer.[13][15]
His Google Scholar profile lists him with hundreds of thousands of citations, the bulk of them attributable to "Attention Is All You Need," and gives his stated research interests as machine learning and logic in computer science.[3]
Kaiser earned dual master's degrees in computer science and mathematics from the University of Wrocław in Poland.[6][8] He then moved to Germany, completing his Ph.D. at RWTH Aachen University in 2008 with a dissertation titled "Logic and Games on Automatic Structures," supervised by Erich Grädel within the university's mathematical foundations of computer science group.[4][8][14] The thesis treats the evaluation of logical formulas as a game between two opponents and develops this game-theoretic view of first-order logic over infinite, automatically presentable structures, a topic that sits within algorithmic model theory.[4][14] An expanded version of the dissertation was later published by Springer as the monograph "Logic and Games on Automatic Structures: Playing with Quantifiers and Decompositions."[16] According to accounts of his career, the work was recognized with the E.W. Beth Dissertation Prize in 2009, an award given for outstanding doctoral dissertations in the fields of logic, language, and information.[8][14]
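The game-theoretic view of evaluation described above can be illustrated on a small example. The sketch below is a hypothetical illustration, not code from the thesis. It implements the classical evaluation (model-checking) game for first-order logic on a *finite* structure. In this game a Verifier moves at disjunctions and existential quantifiers, a Falsifier moves at conjunctions and universal quantifiers, and negation swaps their roles. The Verifier has a winning strategy exactly when the formula is true. The tuple encoding of formulas and the name `verifier_wins` are assumptions made for illustration. Because the code enumerates the whole universe, this brute-force approach works only for finite structures. The automatic structures studied in the thesis are typically infinite and are handled through finite-automata constructions instead.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical encoding (illustration only): a finite structure is a universe
# plus named relations, each given as a set of tuples over the universe.
Structure = Tuple[FrozenSet[int], Dict[str, FrozenSet[tuple]]]
# Formulas are nested tuples: ("rel", R, x1, ...), ("not", f), ("and", f, g, ...),
# ("or", f, g, ...), ("exists", x, f), ("forall", x, f).
Formula = tuple


def verifier_wins(struct: Structure, phi: Formula,
                  env: Optional[Dict[str, int]] = None) -> bool:
    """Return True iff Verifier has a winning strategy in the evaluation
    game for phi under the variable assignment env (finite structures only)."""
    universe, relations = struct
    env = {} if env is None else env
    op = phi[0]
    if op == "rel":
        # Atomic position: the play ends, and Verifier wins iff the atom holds.
        _, name, *args = phi
        return tuple(env[v] for v in args) in relations[name]
    if op == "not":
        # Negation swaps the players' roles; since finite games are determined,
        # Verifier wins here exactly when she would lose the game for phi[1].
        return not verifier_wins(struct, phi[1], env)
    if op == "or":
        # Verifier picks a disjunct, so one winning choice suffices.
        return any(verifier_wins(struct, sub, env) for sub in phi[1:])
    if op == "and":
        # Falsifier picks a conjunct, so Verifier must win against every choice.
        return all(verifier_wins(struct, sub, env) for sub in phi[1:])
    if op == "exists":
        # Verifier chooses a witness element from the universe.
        _, var, body = phi
        return any(verifier_wins(struct, body, {**env, var: a}) for a in universe)
    if op == "forall":
        # Falsifier chooses a challenge element from the universe.
        _, var, body = phi
        return all(verifier_wins(struct, body, {**env, var: a}) for a in universe)
    raise ValueError(f"unknown connective: {op!r}")
```

Consider the directed 3-cycle with edges 0→1→2→0. There, `verifier_wins` holds for ("forall", "x", ("exists", "y", ("rel", "E", "x", "y"))): Verifier answers each challenge x with its successor. It fails for the formula with the quantifiers swapped, since no vertex has an edge to every vertex.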
Following his doctorate, Kaiser took a permanent research scientist position (chargé de recherche) with the CNRS, based at the Laboratoire d'Informatique Algorithmique: Fondements et Applications (LIAFA) at what was then Paris Diderot University.[6][9] He has described this as a stable, tenured post with academic freedom, which he gave up in order to join Google.[14] During this period his research continued to center on logic, automata, and games rather than on machine learning.[4][9]
In 2013 Kaiser joined Google and became a research scientist on the Google Brain team in Mountain View, California, where his work shifted toward fundamental problems in deep learning and natural language processing.[6][14] Over roughly eight years there he co-authored the TensorFlow system and helped build several widely used research libraries and neural models for machine translation, parsing, and other algorithmic and generative tasks.[1][6] He has said that the rapid pace of the field meant constantly changing direction, summarizing the experience by noting that in deep learning "every two years, you have to do something completely different."[14]
Among his Google Brain projects:
Kaiser was also a co-creator and maintainer of Trax, an end-to-end deep learning library used within Google Brain that runs on the JAX and TensorFlow NumPy backends and takes its name from JAX.[6][20] Alongside his research, he co-created the Natural Language Processing Specialization on Coursera, offered by DeepLearning.AI, which he taught with Younes Bensouda Mourri.[21]
Kaiser is one of eight co-authors of "Attention Is All You Need," first posted to arXiv in June 2017 and presented at the Conference on Neural Information Processing Systems (NIPS, since renamed NeurIPS) in December 2017.[1][22] The paper proposed the Transformer, a sequence model that dispenses with recurrence and convolutions in favor of self-attention, and it became the foundation for most subsequent large language models. The eight authors are listed in random order and are credited as having contributed equally.[1]
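Self-attention computes each position's output as a weighted average of value vectors. The weights come from a softmax over scaled query-key dot products, which the paper writes as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The following is a minimal single-head sketch in plain Python. It assumes no masking, no multi-head splitting, and projection matrices supplied by the caller. The helper names are hypothetical, and practical implementations use tensor libraries rather than nested lists.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive (n x m) @ (m x p) product on nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum before exponentiating for numerical stability.
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    d_k = len(k[0])
    # Score every query against every key, scaled by 1/sqrt(d_k).
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a convex combination of the value rows.
    return matmul(weights, v)


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    # In self-attention, queries, keys, and values are all projections of the same input.
    return scaled_dot_product_attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
```

Take identity projections and two identical input tokens. Every score is then equal, each attention weight is 0.5, and the output reproduces the input. Because every position attends to every other position in one step, there is no recurrence to unroll. This is the property that lets the Transformer process a sequence in parallel.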
The paper's author-contributions footnote describes Kaiser's specific role on the project:
"Lukasz and Aidan spent countless long days designing various parts of and implementing tensor2tensor, replacing our earlier codebase, greatly improving results and massively accelerating our research."[1]
The paper lists Kaiser's affiliation as Google Brain, and its footnotes note that co-authors with other affiliations did the work while at Google Brain or Google Research.[1] The paper is now among the most cited in modern machine learning, and it accounts for the large majority of Kaiser's citation count.[3]
Kaiser left Google in 2021 and joined OpenAI, where he is described as a senior researcher and member of technical staff.[6][2] At OpenAI he contributed to the company's large language models and to a new line of models trained to "think" before answering by carrying out an internal chain of reasoning.[6][7]
Soon after arriving he co-authored two papers that prefigured the company's later reasoning work. "Evaluating Large Language Models Trained on Code" (2021) introduced Codex, a GPT model fine-tuned on public code that became the basis for GitHub Copilot, together with the HumanEval benchmark for program synthesis.[23] "Training Verifiers to Solve Math Word Problems" (2021) introduced the GSM8K dataset of grade-school math problems and showed that training a verifier to rank many sampled solutions improves accuracy, an early demonstration of trading extra computation at test time for better answers.[24] He is also among the listed authors of the GPT-4 Technical Report (2023), which is his second most-cited work.[3][25]
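The verifier approach described above separates generation from selection. Many candidate solutions are sampled, each is scored by a separately trained verifier, and the highest-ranked one is returned. The sketch below shows only that selection loop, under stated assumptions. The names `best_of_n`, `toy_generator`, and `toy_verifier` are hypothetical. The generator stands in for sampling from a language model. The verifier here is an exact checker on a toy addition problem, whereas the verifier in the paper is a learned model that only estimates whether a solution is correct.

```python
import random
from typing import Callable, Tuple


def best_of_n(problem: str,
              generate: Callable[[str, random.Random], str],
              verify: Callable[[str, str], float],
              n: int = 100,
              seed: int = 0) -> Tuple[str, float]:
    """Sample n candidate solutions and return the one the verifier scores highest."""
    rng = random.Random(seed)
    candidates = [generate(problem, rng) for _ in range(n)]
    scored = [(verify(problem, c), c) for c in candidates]
    # Ties go to the earliest-sampled candidate, since max() returns the first maximum.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_generator(problem: str, rng: random.Random) -> str:
    # Hypothetical stand-in for sampling a model at nonzero temperature:
    # returns the true sum of "a + b", perturbed by a small random error.
    a, b = (int(t) for t in problem.split("+"))
    return str(a + b + rng.choice([-2, -1, 0, 1, 2]))


def toy_verifier(problem: str, candidate: str) -> float:
    # Hypothetical stand-in for a trained verifier: scores 1.0 for the exact
    # answer and less the further off it is; unparseable answers score 0.0.
    a, b = (int(t) for t in problem.split("+"))
    try:
        value = int(candidate)
    except ValueError:
        return 0.0
    return 1.0 / (1.0 + abs(value - (a + b)))
```

For example, `best_of_n("17 + 25", toy_generator, toy_verifier, n=16)` returns the top-scoring of 16 sampled answers. Raising `n` gives the verifier more candidates to choose from, which is the sense in which extra test-time computation can buy accuracy. With a learned verifier, the gain depends on how well its scores track actual correctness.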
He is listed among the foundational contributors to o1, OpenAI's first reasoning model, released in September 2024, and is described as a research lead for the o1 series.[5][15] He is also one of the OpenAI authors of "Competitive Programming with Large Reasoning Models" (2025). The paper reports that scaling general-purpose reinforcement learning, rather than relying on domain-specific techniques, drives state-of-the-art performance in reasoning domains such as competitive programming. Among its results, the o3 model achieved a gold-medal-level score on the 2024 International Olympiad in Informatics.[7] These systems belong to a broader class of reasoning models that spend additional computation at inference time to improve accuracy on hard problems.
In public talks and interviews around the o1 launch, Kaiser characterized reasoning models as a new paradigm distinct from the pure scaling of earlier large language models, likening the shift to a move from a conversation generator to a genuine thinker.[2][15] He has argued that progress is far from stalling. He predicts no near-term "AI winter," notes that reasoning models can require substantially less training data than earlier approaches, and identifies the availability of compute, particularly GPUs and energy, as the central bottleneck.[15]
| Year | Work | Note |
|---|---|---|
| 2015 | Neural GPUs Learn Algorithms | Parallel, computationally universal architecture for learning algorithms; with Ilya Sutskever[10] |
| 2017 | Depthwise Separable Convolutions for Neural Machine Translation | Introduced SliceNet; Kaiser lead author[17] |
| 2017 | One Model To Learn Them All | MultiModel multi-task network; released alongside Tensor2Tensor[11] |
| 2017 | Attention Is All You Need | Introduced the Transformer; one of eight co-authors[1] |
| 2018 | Universal Transformers | Depth-recurrent self-attention model[18] |
| 2020 | Reformer: The Efficient Transformer | Memory- and compute-efficient Transformer variant[12] |
| 2021 | Rethinking Attention with Performers | Linear-time attention approximation (FAVOR+)[19] |
| 2021 | Evaluating Large Language Models Trained on Code | Introduced Codex and HumanEval; OpenAI[23] |
| 2021 | Training Verifiers to Solve Math Word Problems | Introduced GSM8K and verifier-based selection; OpenAI[24] |
| 2025 | Competitive Programming with Large Reasoning Models | OpenAI reasoning-models paper; listed among authors[7] |
Kaiser's doctoral dissertation received the E.W. Beth Dissertation Prize in 2009, awarded annually for outstanding theses on logic, language, and information.[8][14] His machine learning research is among the most cited in the field: his Google Scholar profile records on the order of 400,000 total citations and an h-index in the seventies, driven chiefly by "Attention Is All You Need" along with the TensorFlow system papers and the GPT-4 Technical Report.[3] He is regularly grouped with the other authors of the Transformer paper as one of the researchers who launched the modern era of large language models.[13][14]
As of 2026, Kaiser is a researcher at OpenAI, where he works on large language models and reasoning systems.[6][2] His Google Scholar profile lists his affiliation as "OpenAI & CNRS," reflecting both his industry role and his earlier academic appointment.[3]