Łukasz Kaiser

Updated Jun 28, 2026

Łukasz Kaiser is a Polish computer scientist and researcher at OpenAI who is one of the eight co-authors of the 2017 paper "Attention Is All You Need," the work that introduced the Transformer architecture underpinning modern large language models. He is also a co-author of the TensorFlow machine learning system and a foundational contributor to OpenAI's reasoning models, the o1 and o3 series. Trained originally as a logician, Kaiser held a tenured research post at France's Centre National de la Recherche Scientifique (CNRS) and spent nearly eight years at Google Brain before joining OpenAI in 2021.^[1]^[2]^[3]^[26]

Who is Łukasz Kaiser?

Kaiser's career spans two distinct phases. In the first, as a mathematician and logician, he worked on automata theory, logic, and games, completing a doctorate on automatic structures and taking a tenured research position with the CNRS in Paris.^[4]^[5] In the second, beginning in 2013 at Google Brain, he became a machine learning researcher and a prolific contributor to systems and models that shaped modern AI, including TensorFlow, the Tensor2Tensor library, and the Transformer.^[1]^[6] After moving to OpenAI in 2021, he contributed to large language models such as GPT-4 and to the development of reasoning-focused models.^[6]^[7] In his own speaker biography he summarizes this work, stating that he "has co-invented reasoning models (o1, o3), ChatGPT and GPT-4 and GPT-5, and co-authored Transformers and the TensorFlow system."^[27]

Among the eight co-authors of "Attention Is All You Need," Kaiser is often noted as the only one who did not subsequently leave to found or co-found a startup, choosing instead to remain a full-time research scientist.^[13]^[14] He joined OpenAI before the public launch of ChatGPT and has since concentrated on reasoning models, which he regards as the most significant shift in the field since the Transformer.^[13]^[15]

His Google Scholar profile lists him with hundreds of thousands of citations, the bulk of them attributable to "Attention Is All You Need," and gives his stated research interests as machine learning and logic in computer science.^[3]

What is his educational and academic background?

Kaiser earned dual master's degrees in computer science and mathematics from the University of Wrocław in Poland.^[6]^[8] He then moved to Germany, completing his Ph.D. at RWTH Aachen University in 2008 with a dissertation titled "Logic and Games on Automatic Structures," supervised by Erich Grädel within the university's mathematical foundations of computer science group.^[4]^[8]^[14] The thesis treats the evaluation of logical formulas as a game between two opponents and develops this game-theoretic view of first-order logic over infinite, automatically presentable structures, a topic that sits within algorithmic model theory.^[4]^[14] An expanded version of the dissertation was later published by Springer as the monograph "Logic and Games on Automatic Structures: Playing with Quantifiers and Decompositions."^[16] According to accounts of his career, the work was recognized with the E.W. Beth Dissertation Prize in 2009, an award given for outstanding doctoral dissertations in the fields of logic, language, and information.^[8]^[14]

Following his doctorate, Kaiser took a permanent research scientist position (chargé de recherche) with the CNRS, based at the Laboratoire d'Informatique Algorithmique: Fondements et Applications (LIAFA) at what was then Paris Diderot University.^[6]^[9] He has described this as a stable, tenured post with academic freedom, which he gave up in order to join Google.^[14] During this period his research continued to center on logic, automata, and games rather than on machine learning.^[4]^[9]

What did he build at Google Brain?

In 2013 Kaiser joined Google and became a research scientist on the Google Brain team in Mountain View, California, where his work shifted toward fundamental problems in deep learning and natural language processing.^[6]^[14] Over roughly eight years there he co-authored the TensorFlow system and helped build several widely used research libraries and neural models for machine translation, parsing, and other algorithmic and generative tasks.^[1]^[6] He has said that the rapid pace of the field meant constantly changing direction, summarizing the experience by noting that in deep learning "every two years, you have to do something completely different."^[14]

Among his Google Brain projects:

Neural GPU. With Ilya Sutskever, Kaiser introduced the Neural GPU in "Neural GPUs Learn Algorithms" (2015), a highly parallel, computationally universal architecture based on convolutional gated recurrent units, designed to learn algorithmic tasks more easily than the harder-to-train Neural Turing Machine.^[10]
SliceNet. As lead author of "Depthwise Separable Convolutions for Neural Machine Translation" (2017), he introduced SliceNet, an architecture using depthwise separable convolutions to reduce the parameter count and computation of convolutional translation models while improving results, drawing on ideas from Xception and ByteNet.^[17]
MultiModel and Tensor2Tensor. In "One Model To Learn Them All" (2017), Kaiser and colleagues presented MultiModel, a single network combining convolutions, attention layers, and sparsely gated mixtures of experts that could be trained jointly across image classification, captioning, speech recognition, translation, and parsing.^[11] This research was accompanied by the open-source Tensor2Tensor library, a collection of models and datasets built on TensorFlow that Kaiser led and that included the reference implementation of the Transformer; he announced it in a Google Research blog post in June 2017.^[6]^[11]^[13]
Universal Transformers. He co-authored "Universal Transformers" (2018), a model that is recurrent in depth while using self-attention, combining the parallelizability of self-attentive feed-forward models with an inductive bias suited to algorithmic and language tasks.^[18]
Reformer. He later co-authored "Reformer: The Efficient Transformer" (2020), which reduced the memory and compute costs of Transformers using techniques such as locality-sensitive hashing attention.^[12]
Performers. He was among the authors of "Rethinking Attention with Performers" (2021), which approximates softmax attention in linear rather than quadratic time and space using a method the authors call FAVOR+.^[19]
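The parameter saving that motivated SliceNet can be shown with the textbook count for a one-dimensional convolution, ignoring biases. A standard convolution with kernel width k that maps c_in channels to c_out channels has k·c_in·c_out weights. A depthwise separable convolution has k·c_in depthwise weights plus c_in·c_out pointwise weights. The Python sketch below is illustrative only. Its sizes (k = 3, 512 channels) are assumptions, not values taken from the SliceNet paper, and it leaves out SliceNet's further architectural refinements.

```python
def standard_conv_params(kernel_width: int, c_in: int, c_out: int) -> int:
    """Weights in a standard 1-D convolution (bias terms ignored)."""
    return kernel_width * c_in * c_out


def separable_conv_params(kernel_width: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable 1-D convolution (bias terms ignored).

    Depthwise step: one kernel of width `kernel_width` per input channel.
    Pointwise step: a width-1 convolution mixing c_in channels into c_out.
    """
    depthwise = kernel_width * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative sizes only; not the configuration used in SliceNet.
    k, channels = 3, 512
    std = standard_conv_params(k, channels, channels)
    sep = separable_conv_params(k, channels, channels)
    print(f"standard:  {std:,}")
    print(f"separable: {sep:,}")
    print(f"ratio:     {std / sep:.2f}x")
```

With these sizes the separable layer needs 263,680 weights instead of 786,432, roughly a threefold reduction. When input and output widths are equal, the ratio is k·c/(k + c), so the saving grows with kernel width.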

Kaiser was also a co-creator and maintainer of Trax, an end-to-end deep learning library used within Google Brain that runs on the JAX and TensorFlow NumPy backends and takes its name from JAX.^[6]^[20] Alongside his research, he co-created the Natural Language Processing Specialization on Coursera, offered by DeepLearning.AI, which he taught with Younes Bensouda Mourri.^[21]

What is his role in the "Attention Is All You Need" paper?

Kaiser is one of eight co-authors of "Attention Is All You Need," first posted to arXiv in June 2017 and presented at the NeurIPS conference in December 2017.^[1]^[22] The paper proposed the Transformer, a sequence model that dispenses with recurrence and convolutions in favor of self-attention, and it became the foundation for most subsequent large language models. The eight authors are listed in random order and are credited as having contributed equally.^[1] The full author list is Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Łukasz Kaiser, and Illia Polosukhin.^[1]^[22]
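The paper's core operation, scaled dot-product attention, computes softmax(Q·K^T / sqrt(d_k))·V. Each query is compared with every key, and the scaled scores are normalized with a softmax. The resulting weights then combine the value vectors into a weighted sum. The pure-Python sketch below implements that formula for a single head. It omits masking, learned projections and the multi-head arrangement of the full Transformer, and its function names are illustrative rather than taken from any library.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V, without masking."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return output


if __name__ == "__main__":
    Q = [[1.0, 0.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    print(scaled_dot_product_attention(Q, K, V))
```

In this example the query matches the first key more closely, so the output leans toward the first value row. Each query is processed independently of the others. This independence is what lets attention run in parallel across sequence positions, whereas a recurrent network must step through them in order.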

The paper's author-contributions footnote describes Kaiser's specific role on the project:

"Lukasz and Aidan spent countless long days designing various parts of and implementing tensor2tensor, replacing our earlier codebase, greatly improving results and massively accelerating our research."^[1]

The paper gives Kaiser's affiliation as Google Brain, and its footnotes mark co-authors who performed the work while at Google Brain or Google Research.^[1] The paper is now among the most cited in modern machine learning, and it accounts for the large majority of Kaiser's citation count.^[3]

What does he work on at OpenAI?

Kaiser left Google in 2021 and joined OpenAI, where he is described as a researcher and member of technical staff.^[6]^[2]^[26] At OpenAI he contributed to the company's large language models and to a new line of models trained to "think" before answering by carrying out an internal chain of reasoning.^[6]^[7]

Soon after arriving he co-authored two papers that prefigured the company's later reasoning work. "Evaluating Large Language Models Trained on Code" (2021) introduced Codex, a GPT model fine-tuned on public code that became the basis for GitHub Copilot, together with the HumanEval benchmark for program synthesis.^[23] "Training Verifiers to Solve Math Word Problems" (2021) introduced the GSM8K dataset of grade-school math problems and showed that training a verifier to rank many sampled solutions improves accuracy, an early demonstration of trading extra computation at test time for better answers.^[24] He is also among the listed authors of the GPT-4 Technical Report (2023), which is his second most-cited work.^[3]^[25]
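The test-time idea in the GSM8K paper can be sketched as best-of-N selection. The method draws several candidate solutions, scores each with a verifier and keeps the highest-scoring one. In the minimal Python sketch below, the noisy generator and the scoring function are hypothetical stand-ins. The paper trains a language-model verifier to estimate how likely a solution is to be correct. Here, by contrast, a toy scorer simply compares the final answer with a known value.

```python
import random
from typing import Callable, List


def sample_candidates(
    generate: Callable[[random.Random], str], n: int, seed: int = 0
) -> List[str]:
    """Draw n candidate answers from a (hypothetical) stochastic generator."""
    rng = random.Random(seed)
    return [generate(rng) for _ in range(n)]


def select_with_verifier(candidates: List[str], score: Callable[[str], float]) -> str:
    """Return the candidate the verifier scores highest (best-of-N)."""
    return max(candidates, key=score)


# --- Toy stand-ins, for illustration only ---------------------------------
def noisy_adder(rng: random.Random) -> str:
    """Hypothetical generator: answers 17 + 25 but is sometimes off by 1 or 2."""
    return str(17 + 25 + rng.choice([-2, -1, 0, 0, 1]))


def toy_verifier(answer: str) -> float:
    """Hypothetical verifier; a real one is a trained model, not an oracle."""
    return 1.0 if answer == "42" else 0.0


if __name__ == "__main__":
    candidates = sample_candidates(noisy_adder, n=16)
    print(candidates)
    print("selected:", select_with_verifier(candidates, toy_verifier))
```

Drawing more samples (a larger n) raises the chance that at least one candidate is correct. This is the sense in which extra computation at test time buys accuracy, and the method is only as good as the verifier's ranking of the candidates.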

He is listed among the foundational contributors to OpenAI's o1, the company's first reasoning model, released in September 2024, and is described as a research lead for the o1 series.^[5]^[15] He subsequently appears as one of the OpenAI authors of "Competitive Programming with Large Reasoning Models" (2025). The paper reports that scaling general-purpose reinforcement learning, rather than relying on domain-specific techniques, drives state-of-the-art performance in reasoning domains such as competitive programming. As evidence, it reports that the o3 model reached a gold-medal-level score on the 2024 International Olympiad in Informatics problems without hand-crafted domain-specific strategies.^[7] These systems belong to a broader class of reasoning models that spend additional computation at inference time to improve accuracy on hard problems.

In public talks and interviews around the o1 launch, Kaiser has characterized reasoning models as a new paradigm distinct from the pure scaling of earlier large language models, likening the shift to going from a conversation generator to a genuine thinker.^[2]^[15] He has argued that progress is far from stalling, predicting no near-term "AI winter," noting that reasoning models can require substantially less training data than earlier approaches, and identifying the availability of compute, particularly GPUs and energy, as the central bottleneck.^[15]

What are his most notable works?

Year	Work	Note
2015	Neural GPUs Learn Algorithms	Parallel, computationally universal architecture for learning algorithms; with Ilya Sutskever^[10]
2017	Depthwise Separable Convolutions for Neural Machine Translation	Introduced SliceNet; Kaiser lead author^[17]
2017	One Model To Learn Them All	MultiModel multi-task network; released alongside Tensor2Tensor^[11]
2017	Attention Is All You Need	Introduced the Transformer; one of eight co-authors^[1]
2018	Universal Transformers	Depth-recurrent self-attention model^[18]
2020	Reformer: The Efficient Transformer	Memory- and compute-efficient Transformer variant^[12]
2021	Rethinking Attention with Performers	Linear-time attention approximation (FAVOR+)^[19]
2021	Evaluating Large Language Models Trained on Code	Introduced Codex and HumanEval; OpenAI^[23]
2021	Training Verifiers to Solve Math Word Problems	Introduced GSM8K and verifier-based selection; OpenAI^[24]
2025	Competitive Programming with Large Reasoning Models	OpenAI reasoning-models paper; listed among authors^[7]

What recognition has he received?

Kaiser's doctoral dissertation received the E.W. Beth Dissertation Prize in 2009, awarded annually for outstanding theses on logic, language, and information.^[8]^[14] His machine learning research is among the most cited in the field: his Google Scholar profile records on the order of 400,000 total citations and an h-index in the seventies, driven chiefly by "Attention Is All You Need" along with the TensorFlow system papers and the GPT-4 Technical Report.^[3] He is regularly grouped with the other authors of the Transformer paper as one of the researchers who launched the modern era of large language models.^[13]^[14]

What is his current role?

As of 2026, Kaiser is a researcher at OpenAI, where he works on large language models and reasoning systems.^[6]^[2]^[26] Speaker biographies variously describe his title as researcher, senior research scientist, and member of technical staff.^[26]^[27]^[2] His Google Scholar profile lists his affiliation as "OpenAI & CNRS," reflecting both his industry role and his earlier academic appointment.^[3] He continues to speak publicly on reasoning models and test-time compute, including as a 2026 panelist at the TEDAI conference in Vienna.^[26]

References
