# Łukasz Kaiser

> Source: https://aiwiki.ai/wiki/lukasz_kaiser
> Updated: 2026-06-28
> Categories: Natural Language Processing, People
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Łukasz Kaiser** is a Polish computer scientist and researcher at [OpenAI](/wiki/openai) who is one of the eight co-authors of the 2017 paper "Attention Is All You Need," the work that introduced the [Transformer](/wiki/transformers) architecture underpinning modern large language models. He is also a co-author of the [TensorFlow](/wiki/tensorflow) machine learning system and a foundational contributor to OpenAI's reasoning models, the [o1](/wiki/o1) and [o3](/wiki/o3) series. Trained originally as a logician, Kaiser held a tenured research post at France's Centre National de la Recherche Scientifique (CNRS) and spent nearly eight years at [Google Brain](/wiki/google_brain) before joining OpenAI in 2021.[1][2][3][26]

## Who is Łukasz Kaiser?

Kaiser's career spans two distinct phases. In the first, as a mathematician and logician, he worked on automata theory, logic, and games, completing a doctorate on automatic structures and taking a tenured research position with the CNRS in Paris.[4][5] In the second, beginning in 2013 at Google Brain, he became a machine learning researcher and a prolific contributor to systems and models that shaped modern AI, including TensorFlow, the Tensor2Tensor library, and the Transformer.[1][6] After moving to OpenAI in 2021, he contributed to large language models such as [GPT-4](/wiki/gpt-4) and to the development of reasoning-focused models.[6][7] In his own speaker biography he summarizes this work, stating that he "has co-invented reasoning models (o1, o3), ChatGPT and GPT-4 and GPT-5, and co-authored Transformers and the TensorFlow system."[27]

Among the eight co-authors of "Attention Is All You Need," Kaiser is often noted as the only one who did not subsequently leave to found or co-found a startup, choosing instead to remain a full-time research scientist.[13][14] He joined OpenAI before the public launch of [ChatGPT](/wiki/chatgpt) and has since concentrated on reasoning models, which he regards as the most significant shift in the field since the Transformer.[13][15]

His Google Scholar profile records hundreds of thousands of citations, most of them to "Attention Is All You Need," and lists his research interests as machine learning and logic in computer science.[3]

## What is his educational and academic background?

Kaiser earned dual master's degrees in computer science and mathematics from the University of Wrocław in Poland.[6][8] He then moved to Germany, completing his Ph.D. at RWTH Aachen University in 2008 with a dissertation titled "Logic and Games on Automatic Structures," supervised by Erich Grädel within the university's mathematical foundations of computer science group.[4][8][14] The thesis treats the evaluation of logical formulas as a game between two opponents and develops this game-theoretic view of first-order logic over infinite, automatically presentable structures, a topic that sits within algorithmic model theory.[4][14] An expanded version of the dissertation was later published by Springer as the monograph "Logic and Games on Automatic Structures: Playing with Quantifiers and Decompositions."[16] According to accounts of his career, the work was recognized with the E.W. Beth Dissertation Prize in 2009, an award given for outstanding doctoral dissertations in the fields of logic, language, and information.[8][14]

Following his doctorate, Kaiser took a permanent research scientist position (chargé de recherche) with the CNRS, based at the Laboratoire d'Informatique Algorithmique: Fondements et Applications (LIAFA) at what was then Paris Diderot University.[6][9] He has described this as a stable, tenured post with academic freedom, which he gave up in order to join Google.[14] During this period his research continued to center on logic, automata, and games rather than on machine learning.[4][9]

## What did he build at Google Brain?

In 2013 Kaiser joined Google and became a research scientist on the Google Brain team in Mountain View, California, where his work shifted toward fundamental problems in [deep learning](/wiki/deep_learning) and natural language processing.[6][14] Over roughly eight years there he co-authored the TensorFlow system and helped build several widely used research libraries and neural models for machine translation, parsing, and other algorithmic and generative tasks.[1][6] He has said that the rapid pace of the field meant constantly changing direction, summarizing the experience by noting that in deep learning "every two years, you have to do something completely different."[14]

Among his Google Brain projects:

- **Neural GPU.** With Ilya Sutskever, Kaiser introduced the Neural GPU in "Neural GPUs Learn Algorithms" (2015), a highly parallel, computationally universal architecture based on convolutional gated recurrent units, designed to learn algorithmic tasks more easily than the harder-to-train Neural Turing Machine.[10]
- **SliceNet.** As lead author of "Depthwise Separable Convolutions for Neural Machine Translation" (2017), he introduced SliceNet, an architecture using depthwise separable convolutions to reduce the parameter count and computation of convolutional translation models while improving results, drawing on ideas from Xception and ByteNet.[17]
- **MultiModel and Tensor2Tensor.** In "One Model To Learn Them All" (2017), Kaiser and colleagues presented MultiModel, a single network combining convolutions, [attention](/wiki/attention) layers, and sparsely gated mixtures of experts that could be trained jointly across image classification, captioning, speech recognition, translation, and parsing.[11] This research was accompanied by the open-source Tensor2Tensor library, a collection of models and datasets built on TensorFlow that Kaiser led and that included the reference implementation of the Transformer; he announced it in a Google Research blog post in June 2017.[6][11][13]
- **Universal Transformers.** He co-authored "Universal Transformers" (2018), a model that is recurrent in depth while using self-attention, combining the parallelizability of self-attentive feed-forward models with an inductive bias suited to algorithmic and language tasks.[18]
- **Reformer.** He later co-authored "Reformer: The Efficient Transformer" (2020), which reduced the memory and compute costs of Transformers using techniques such as locality-sensitive hashing attention.[12]
- **Performers.** He was among the authors of "Rethinking Attention with Performers" (2021), which approximates softmax attention in linear rather than quadratic time and space using a method the authors call FAVOR+.[19]
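
The parameter saving that the SliceNet entry above attributes to depthwise separable convolutions can be illustrated with simple arithmetic. The sketch below is not code from the paper: the function names and the layer sizes (kernel width 3, 512 channels) are assumptions chosen for illustration, and biases are ignored.

```python
def standard_conv1d_params(kernel_width: int, channels_in: int, channels_out: int) -> int:
    """Weights in a regular 1D convolution: each output channel
    reads every input channel at every kernel position."""
    return kernel_width * channels_in * channels_out


def separable_conv1d_params(kernel_width: int, channels_in: int, channels_out: int) -> int:
    """Weights in a depthwise separable 1D convolution (biases ignored)."""
    depthwise = kernel_width * channels_in  # one spatial filter per input channel
    pointwise = channels_in * channels_out  # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative sizes only (assumed, not taken from the SliceNet paper).
standard = standard_conv1d_params(3, 512, 512)
separable = separable_conv1d_params(3, 512, 512)
print(standard, separable, round(standard / separable, 2))
```

For these assumed sizes the regular layer has 786,432 weights and the separable one 263,680, roughly a threefold reduction. The gap widens as the kernel gets wider, because the kernel width multiplies only the small depthwise term.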

Kaiser was also a co-creator and maintainer of Trax, an end-to-end deep learning library used within Google Brain that runs on the [JAX](/wiki/jax) and TensorFlow NumPy backends and takes its name from JAX.[6][20] Alongside his research, he co-created the Natural Language Processing Specialization on Coursera, offered by DeepLearning.AI, which he taught with Younes Bensouda Mourri.[21]

## What is his role in the "Attention Is All You Need" paper?

Kaiser is one of eight co-authors of "Attention Is All You Need," first posted to arXiv in June 2017 and presented at the NeurIPS conference in December 2017.[1][22] The paper proposed the Transformer, a sequence model that dispenses with recurrence and convolutions in favor of self-attention, and it became the foundation for most subsequent large language models. The eight authors are listed in random order and are credited as having contributed equally.[1] The full author list is Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Łukasz Kaiser, and Illia Polosukhin.[1][22]

The paper's author-contributions footnote describes Kaiser's specific role on the project:

> "Lukasz and Aidan spent countless long days designing various parts of and implementing tensor2tensor, replacing our earlier codebase, greatly improving results and massively accelerating our research."[1]

The same footnote states that the work was performed while the authors were at Google Brain.[1] The Transformer paper is now among the most cited in modern machine learning, and it accounts for the large majority of Kaiser's citation count.[3]
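
For readers unfamiliar with the mechanism, the self-attention operation at the core of the paper, softmax(QKᵀ/√d_k)V, can be sketched in a few lines. This is a minimal single-head illustration in plain Python, not the Tensor2Tensor implementation; it omits the learned projections, multiple heads, and masking, and the function names are chosen here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    queries, keys and values are lists of equal-length float vectors;
    keys and values must contain the same number of rows.
    """
    d_k = len(keys[0])
    value_dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(value_dim)]
        )
    return outputs


# A zero query scores every key equally, so the output is the mean of the values.
print(scaled_dot_product_attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0], [4.0]]))
```

Because each query is compared with every key independently, all positions can be processed in parallel, which is the property that lets the architecture drop recurrence.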

## What does he work on at OpenAI?

Kaiser left Google in 2021 and joined OpenAI, where he is described as a researcher and member of technical staff.[2][6][26] At OpenAI he contributed to the company's large language models and to a new line of models trained to "think" before answering by carrying out an internal chain of reasoning.[6][7]

Soon after arriving he co-authored two papers that prefigured the company's later reasoning work. "Evaluating Large Language Models Trained on Code" (2021) introduced Codex, a GPT model fine-tuned on public code that became the basis for [GitHub Copilot](/wiki/github_copilot), together with the HumanEval benchmark for program synthesis.[23] "Training Verifiers to Solve Math Word Problems" (2021) introduced the GSM8K dataset of grade-school math problems and showed that training a verifier to rank many sampled solutions improves accuracy, an early demonstration of trading extra computation at test time for better answers.[24] He is also among the listed authors of the GPT-4 Technical Report (2023), which is his second most-cited work.[3][25]
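
The verifier idea described above, sampling many candidate solutions and keeping the one a verifier ranks highest, can be sketched as a best-of-n selection. The example is a toy under stated assumptions: `select_best` and `toy_verifier` are hypothetical names, and the verifier here is a hand-written arithmetic check standing in for the trained model used in the paper.

```python
from typing import Callable, Sequence


def select_best(candidates: Sequence[str], verifier: Callable[[str], float]) -> str:
    """Return the candidate the verifier scores highest (first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=verifier)


def toy_verifier(candidate: str) -> float:
    """Hypothetical stand-in for a learned verifier.

    Scores a string of the form 'a + b = c' as 1.0 when the sum is
    correct and 0.0 otherwise. A real verifier would be a trained model
    that estimates the probability that a solution is correct.
    """
    left, right = candidate.split("=")
    a, b = left.split("+")
    return 1.0 if int(a) + int(b) == int(right) else 0.0


# Pretend these were sampled from a language model for the same problem.
samples = ["3 + 4 = 8", "3 + 4 = 7", "3 + 4 = 12"]
best = select_best(samples, toy_verifier)
print(best)
```

Drawing more samples costs more computation at test time but raises the chance that at least one candidate is correct, which is the trade-off the paper explores.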

He is listed among the foundational contributors to OpenAI's o1, the company's first reasoning model, released in September 2024, and is described as a research lead for the o1 series.[5][15] He subsequently appears as one of the OpenAI authors of "Competitive Programming with Large Reasoning Models" (2025). The paper reports that scaling general-purpose reinforcement learning, rather than relying on domain-specific techniques, drives state-of-the-art performance in reasoning domains such as competitive programming, including a gold-medal result for the [o3](/wiki/o3) model on the 2024 International Olympiad in Informatics.[7] These reasoning systems are part of a broader class of [reasoning models](/wiki/reasoning_models) that allocate additional computation at inference time to improve accuracy on hard problems.

In public talks and interviews around the o1 launch, Kaiser has characterized reasoning models as a new paradigm distinct from the pure scaling of earlier [large language models](/wiki/large_language_model), likening the shift to going from a conversation generator to a genuine thinker.[2][15] He has argued that progress is far from stalling, predicting no near-term "AI winter," noting that reasoning models can require substantially less training data than earlier approaches, and identifying the availability of compute, particularly [GPUs](/wiki/gpu) and energy, as the central bottleneck.[15]

## What are his most notable works?

| Year | Work | Note |
|------|------|------|
| 2015 | Neural GPUs Learn Algorithms | Parallel, computationally universal architecture for learning algorithms; with Ilya Sutskever[10] |
| 2017 | Depthwise Separable Convolutions for Neural Machine Translation | Introduced SliceNet; Kaiser lead author[17] |
| 2017 | One Model To Learn Them All | MultiModel multi-task network; released alongside Tensor2Tensor[11] |
| 2017 | Attention Is All You Need | Introduced the Transformer; one of eight co-authors[1] |
| 2018 | Universal Transformers | Depth-recurrent self-attention model[18] |
| 2020 | Reformer: The Efficient Transformer | Memory- and compute-efficient Transformer variant[12] |
| 2021 | Rethinking Attention with Performers | Linear-time attention approximation (FAVOR+)[19] |
| 2021 | Evaluating Large Language Models Trained on Code | Introduced Codex and HumanEval; OpenAI[23] |
| 2021 | Training Verifiers to Solve Math Word Problems | Introduced GSM8K and verifier-based selection; OpenAI[24] |
| 2025 | Competitive Programming with Large Reasoning Models | OpenAI reasoning-models paper; listed among authors[7] |

## What recognition has he received?

Kaiser's doctoral dissertation received the E.W. Beth Dissertation Prize in 2009, awarded annually for outstanding theses on logic, language, and information.[8][14] His machine learning research is among the most cited in the field: his Google Scholar profile records on the order of 400,000 total citations and an h-index in the seventies, driven chiefly by "Attention Is All You Need" along with the TensorFlow system papers and the GPT-4 Technical Report.[3] He is regularly grouped with the other authors of the Transformer paper as one of the researchers who launched the modern era of large language models.[13][14]

## What is his current role?

As of 2026, Kaiser is a researcher at OpenAI, where he works on large language models and reasoning systems.[2][6][26] Speaker biographies variously describe his title as researcher, senior research scientist, and member of technical staff.[2][26][27] His Google Scholar profile lists his affiliation as "OpenAI & CNRS," reflecting both his industry role and his earlier academic appointment.[3] He continues to speak publicly on reasoning models and test-time compute, including as a 2026 panelist at the TEDAI conference in Vienna.[26]

## References

1. [Vaswani et al., "Attention Is All You Need," NeurIPS 2017 (arXiv:1706.03762)](https://arxiv.org/abs/1706.03762)
2. [Pathway, "The Future of Large Language Models by Lukasz Kaiser and Jan Chorowski" (2024)](https://pathway.com/news/pathway-meetup-2024)
3. [Łukasz Kaiser, Google Scholar profile](https://scholar.google.com/citations?user=JWmiQR0AAAAJ&hl=en)
4. [Łukasz Kaiser, "Logic and Games on Automatic Structures," publications, RWTH Aachen (Mathematical Foundations of Computer Science)](https://logic.rwth-aachen.de/Publications/kaiser.html.en)
5. [OpenAI, "OpenAI o1 Contributions"](https://openai.com/openai-o1-contributions/)
6. [Łukasz Kaiser, speaker biography, Warsaw.AI](https://warsaw.ai/speaker/lukasz-kaiser/)
7. [OpenAI, "Competitive Programming with Large Reasoning Models" (arXiv:2502.06807)](https://arxiv.org/abs/2502.06807)
8. [36Kr, "From Transformer to GPT-5: First-Principles Thinking About Large Models" (profile of Lukasz Kaiser)](https://eu.36kr.com/en/p/3477639604803976)
9. [Lukasz Kaiser, CNRS / LIAFA research profile, ResearchGate](https://www.researchgate.net/profile/Lukasz-Kaiser)
10. [Kaiser and Sutskever, "Neural GPUs Learn Algorithms" (arXiv:1511.08228), Google Research](https://research.google/pubs/pub45139/)
11. [Kaiser et al., "One Model To Learn Them All" (arXiv:1706.05137)](https://arxiv.org/abs/1706.05137)
12. [Kitaev, Kaiser, Levskaya, "Reformer: The Efficient Transformer" (arXiv:2001.04451)](https://arxiv.org/abs/2001.04451)
13. [Coursera, "Łukasz Kaiser, Instructor"](https://www.coursera.org/instructor/lukaszkaiser)
14. [Transfer Multisort Elektronik, "Łukasz Kaiser: from logic to revolution in artificial intelligence"](https://www.tme.com/us/en-us/news/library-articles/inventors/page/73732/lukasz-kaiser-from-logic-to-revolution-in-artificial-intelligence/)
15. [36Kr, "Transformer Author's Weighty Prediction: No Winter for AI, Inference Revolution to Detonate Trillion-Dollar Market"](https://eu.36kr.com/en/p/3552808301886344)
16. [Łukasz Kaiser, "Logic and Games on Automatic Structures: Playing with Quantifiers and Decompositions," Springer (2011)](https://link.springer.com/book/10.1007/978-3-642-22807-0)
17. [Kaiser, Gomez, Chollet, "Depthwise Separable Convolutions for Neural Machine Translation" (arXiv:1706.03059)](https://arxiv.org/abs/1706.03059)
18. [Dehghani, Gouws, Vinyals, Uszkoreit, Kaiser, "Universal Transformers" (arXiv:1807.03819)](https://arxiv.org/abs/1807.03819)
19. [Choromanski et al., "Rethinking Attention with Performers," ICLR 2021 (arXiv:2009.14794)](https://arxiv.org/abs/2009.14794)
20. [google/trax, "Trax: Deep Learning with Clear Code and Speed," GitHub](https://github.com/google/trax)
21. [DeepLearning.AI, "Natural Language Processing Specialization," Coursera](https://www.coursera.org/specializations/natural-language-processing)
22. [Wikipedia, "Attention Is All You Need"](https://en.wikipedia.org/wiki/Attention_Is_All_You_Need)
23. [Chen et al., "Evaluating Large Language Models Trained on Code" (arXiv:2107.03374)](https://arxiv.org/abs/2107.03374)
24. [Cobbe et al., "Training Verifiers to Solve Math Word Problems" (arXiv:2110.14168)](https://arxiv.org/abs/2110.14168)
25. [OpenAI, "GPT-4 Technical Report" (arXiv:2303.08774)](https://arxiv.org/abs/2303.08774)
26. [Lukasz Kaiser, panelist biography, TEDAI Vienna 2026](https://tedai-vienna.ted.com/panelists-2025/lukasz-kaiser)
27. [Lukasz Kaiser, speaker biography, Machine Learning Summit 2025](https://ml-summit.org/speaker/626?uid=c1047&lang=en)