Richard S. Sutton

Updated Jun 28, 2026

Richard Stuart Sutton (born 1957 or 1958) is an American-born Canadian computer scientist regarded as one of the founders of modern computational reinforcement learning, and a co-recipient, with Andrew G. Barto, of the 2024 ACM A.M. Turing Award "for developing the conceptual and algorithmic foundations of reinforcement learning."^[1]^[2] He introduced temporal-difference (TD) learning, co-authored the field's standard textbook "Reinforcement Learning: An Introduction" (1998; 2nd ed. 2018), and wrote the influential 2019 essay "The Bitter Lesson."^[3]^[5]^[6]

Sutton is a professor of computing science at the University of Alberta, a Fellow and Chief Scientific Advisor at the Alberta Machine Intelligence Institute (Amii), and a research scientist at John Carmack's artificial-general-intelligence startup Keen Technologies.^[3]^[12] His research career, which began with a Stanford undergraduate thesis on trial-and-error learning, has spanned more than four decades and shaped almost every major idea in present-day reinforcement learning. He introduced TD learning in 1988, co-developed the actor-critic family of algorithms, formulated the Dyna architecture that unifies model-free learning with planning, helped found policy-gradient theory with function approximation, and (with Doina Precup and Satinder Singh) developed the options framework for temporal abstraction.^[3]^[4] His textbook with Barto has been cited well over 80,000 times.^[3]^[5]

Beyond his technical work, Sutton is widely known for "The Bitter Lesson," a short 2019 essay that argued general methods which scale with compute reliably out-perform approaches built on hand-crafted human knowledge, an argument that has become a touchstone for debates over scaling, deep learning, and the future of AI research.^[6]^[7] Since 2003 he has built one of the world's largest reinforcement-learning research groups at the University of Alberta in Edmonton, and from 2017 to 2023 he led Google DeepMind's first non-UK research office in the same city.^[8]^[9] In September 2023 he announced a new partnership with John Carmack aimed at building a working artificial general intelligence prototype "by 2030."^[10]

Key facts


Born	1957 or 1958, Toledo, Ohio, U.S.^[3]
Citizenship	Canadian (naturalized 2015); previously American^[3]
Education	BA Psychology, Stanford University (1978); MS Computer Science, University of Massachusetts Amherst (1980); PhD Computer Science, University of Massachusetts Amherst (1984)^[3]^[11]
Doctoral advisor	Andrew G. Barto^[1]^[3]
Doctoral thesis	"Temporal Credit Assignment in Reinforcement Learning" (1984)^[3]
Known for	Temporal-difference learning; actor-critic methods; Dyna architecture; options framework; policy-gradient methods; "Reinforcement Learning: An Introduction"; "The Bitter Lesson"^[1]^[3]^[6]
Doctoral students (selected)	David Silver, Doina Precup^[3]
Current positions	Professor of Computing Science, University of Alberta (since 2003); Fellow, Chief Scientific Advisor & Canada CIFAR AI Chair, Amii; Research Scientist, Keen Technologies (since 2023)^[3]^[12]
Major award	2024 ACM A.M. Turing Award, shared with Andrew G. Barto^[1]^[2]

Who is Richard Sutton?

Richard S. Sutton is a computer scientist who, more than any other single individual, defined the modern field of reinforcement learning: the branch of machine learning in which an agent learns by trial and error from rewards rather than from labeled examples. In its 2024 Turing Award announcement, the Association for Computing Machinery (ACM) credited Sutton and Andrew Barto with having "introduced the main ideas, constructed the mathematical foundations, and developed important algorithms for reinforcement learning" over a continuous period of more than four decades.^[1]^[11] By 2025 his published work had been cited more than 140,000 times in Google Scholar.^[3]^[12]

Early life and education

Richard S. Sutton was born in 1957 (sources also give 1958) in Toledo, Ohio, in the United States. He later took Canadian citizenship in 2015 after moving permanently to Edmonton, Alberta, and renounced his U.S. citizenship in 2017, making him a Canadian citizen rather than an American at the time he received the Turing Award.^[3]

Sutton studied at Stanford University, where he received a Bachelor of Arts in psychology in 1978. The Stanford program shaped a lifelong interest in animal learning and the psychological roots of intelligent behavior; his earliest published work, completed before he had a graduate degree in computer science, applied ideas from operant conditioning to artificial learning systems. He has long pointed to psychology, and particularly to the literature on classical and operant conditioning of animals, as a primary inspiration for the reinforcement-learning paradigm, in which an agent learns by interacting with an environment and observing rewards.^[3]^[11]

He then moved to the University of Massachusetts Amherst, where he was supervised by Andrew G. Barto, then a young computer-science professor with a similar interest in adaptive systems. Sutton was Barto's first doctoral student. Sutton earned his Master of Science in computer and information science in 1980 and a Ph.D. in 1984. His doctoral thesis, "Temporal Credit Assignment in Reinforcement Learning," was the first systematic treatment of the credit-assignment problem that lies at the heart of reinforcement learning: how an agent should attribute long-delayed rewards to the specific decisions that caused them. The thesis introduced what would later become known as temporal-difference (TD) learning and laid out an early version of the actor-critic architecture, in which two adaptive modules (an "actor" that selects actions and a "critic" that evaluates them) co-evolve.^[3]^[11]

The collaboration between Barto and Sutton that began in graduate school continued throughout both of their careers, producing dozens of joint papers and the eventual textbook that defined the field.^[1]^[11]

What did Richard Sutton do early in his career?

After a brief postdoctoral period at UMass Amherst in 1984, Sutton joined GTE Laboratories in Waltham, Massachusetts, in 1985 as a principal member of technical staff. He remained there until 1994, working on connectionist learning, control, and predictive learning systems.^[3] It was during the GTE years that he published "Learning to predict by the methods of temporal differences" (Machine Learning, 1988), the paper that formally established TD learning as a class of algorithms and proved its first convergence results. The paper is among the most influential in the history of machine learning; nearly every modern value-based RL algorithm (Q-learning, SARSA, deep Q-networks, the value heads of AlphaZero and successor systems) traces its lineage back to it.^[4]^[5]^[11]

Also at GTE, Sutton developed the Dyna architecture (published in 1990-1991), which became one of the first concrete proposals for integrating learning, planning, and acting within a single agent.^[3]^[4] During this period he also began the extended series of collaborations with Barto that would eventually become "Reinforcement Learning: An Introduction." The two had circulated draft chapters of the textbook within the research community for several years before its formal publication.^[5]

Sutton then returned to the University of Massachusetts Amherst from 1995 to 1998 as a senior research scientist in Barto's group, before moving to industry again in 1998. From 1998 to 2002 he was a principal technical staff member at AT&T Labs' Shannon Laboratory in Florham Park, New Jersey, working alongside other notable AI and statistics researchers in the laboratory's "Artificial Intelligence Department." The Shannon Laboratory was at the time one of the most concentrated centers of machine-learning research in industry, and Sutton's tenure there overlapped with the heyday of statistical learning at Bell Labs and AT&T.^[3]

During the AT&T years, Sutton and Barto completed and published the first edition of their textbook, "Reinforcement Learning: An Introduction" (MIT Press, 1998), which would go on to define the field. He also co-authored, in 1999 and 2000, two of his most-cited papers: the options-framework paper in Artificial Intelligence and the policy-gradient theorem paper at NeurIPS.^[3]^[5]

What is Richard Sutton's role at the University of Alberta?

In 2003 Sutton accepted a professorship in the Department of Computing Science at the University of Alberta in Edmonton, Canada. He has been there ever since and has built the university into a global center for reinforcement-learning research.^[8]^[12]

Sutton founded and continues to lead the Reinforcement Learning and Artificial Intelligence (RLAI) Lab at U of A. By the mid-2020s the lab had grown to roughly ten principal investigators and more than one hundred researchers, making it one of the largest reinforcement-learning groups in the world. Edmonton's reputation as a hub for RL, distinct for example from Mila in Montreal or the Vector Institute in Toronto, is largely the product of Sutton's lab, in combination with the broader work of Michael Bowling's computer-games and poker group, the contributions of Patrick Pilarski in rehabilitation robotics, and the long-running computer-checkers program Chinook of Jonathan Schaeffer.^[12]^[13]

At U of A, Sutton initially held the AT&T-Alberta Research Chair, named in part for his former employer. He was later named a Canada CIFAR AI Chair under the Pan-Canadian AI Strategy.^[12] He is also a Fellow and the Chief Scientific Advisor of the Alberta Machine Intelligence Institute (Amii), an Edmonton-based AI institute that grew out of the Alberta Ingenuity Centre for Machine Learning (AICML), founded in 2002 jointly by the University of Alberta and the Government of Alberta. Amii was rebranded under its current name in 2017 when it was designated one of three national AI institutes under the CIFAR Pan-Canadian AI Strategy (alongside Mila in Quebec and the Vector Institute in Ontario).^[14]^[12]

Through both Amii and his own lab, Sutton has supervised or mentored close to sixty graduate students and postdoctoral researchers. Two of the most prominent are David Silver (later the principal architect of AlphaGo and AlphaZero at Google DeepMind) and Doina Precup, who became a co-creator of the options framework, head of DeepMind's Montreal office, and a leading figure in hierarchical RL. Other prominent former students and postdoctoral collaborators include Michael Bowling (now also a professor at U of A and a key figure behind the DeepStack poker system), Csaba Szepesvari, Adam White, and Martha White.^[3]^[12]^[13]

Sutton's work at the University of Alberta has also fed directly into the AI-and-robotics community in Edmonton. His group's research has been applied in areas including assistive robotics (with Patrick Pilarski), continual learning, and life-long learning, fields that emphasize agents which keep learning after deployment rather than being trained once and frozen.^[12]

When was Richard Sutton at DeepMind?

On July 5, 2017, Google's DeepMind announced its first international research office outside the United Kingdom, located in Edmonton and known as DeepMind Alberta. The announcement explicitly described the new lab as a deep partnership with the University of Alberta. Sutton, described in DeepMind's announcement as "the pioneer of reinforcement learning - and DeepMind's first ever advisor from back in 2010," co-founded and led the lab together with U of A colleagues Michael Bowling and Patrick Pilarski, all of whom retained their university professorships. He held the title of Distinguished Research Scientist at DeepMind.^[9]

DeepMind had been working closely with researchers in Edmonton for years before formally opening the office. Sutton had served as an advisor to the company since approximately 2010, shortly after DeepMind was founded by Demis Hassabis, Shane Legg, and Mustafa Suleyman in London. DeepMind committed to long-term funding for AI programs at the University of Alberta as part of the 2017 arrangement, and the lab grew to include more than a dozen researchers, several of whom were co-authors of the influential DeepStack poker paper that the Edmonton group had published earlier.^[9]

The Edmonton office was thus, by design, a tightly coupled academic-industrial partnership: the same researchers held both DeepMind and University of Alberta affiliations, and the lab's work fed directly into both DeepMind's broader research agenda and U of A's PhD programs. Sutton continued his teaching and graduate-student supervision throughout this period.^[9]^[12]

In January 2023, after Google reorganized DeepMind as part of broader cost cuts at Alphabet and consolidated the company's research labs, the Edmonton office was closed. Sutton chose to remain in Alberta and continue his research at the University of Alberta and Amii. His DeepMind affiliation ended in 2023, though his collaborations with DeepMind researchers, most notably David Silver, continued (see "Welcome to the Era of Experience," 2025).^[3]^[15]^[16]

Why did Richard Sutton join Keen Technologies?

On September 25, 2023, programmer John Carmack, best known as the co-creator of the Doom and Quake game engines and a co-founder of id Software, and Sutton publicly announced a new partnership aimed at accelerating the development of artificial general intelligence (AGI). Sutton joined Carmack's startup Keen Technologies, a small Texas-based AGI research company that Carmack founded in 2022 and funded in part out of his own resources. Keen had raised an initial $20 million round in August 2022 from investors including Nat Friedman, Daniel Gross, Patrick Collison, Tobi Lutke, Jim Keller, Sequoia Capital, and Capital Factory. Sutton joined as a research scientist while retaining his positions at the University of Alberta and Amii.^[10]^[15]^[18]

In the joint announcement, Sutton said: "I am excited to partner with John and the rest of the team at Keen Technologies. John is a powerful intellect and one of the world's greatest system engineers." Carmack, for his part, framed the collaboration as a deliberate alternative to the dominant approaches of the largest AI labs: "The AI space is awash in capital, compute, and data, but it is still dominated by fashions that may yet hinder important breakthroughs." The two stated that they were focused on developing "a genuine AI prototype by 2030," including establishing and documenting what they called "AGI signs of life."^[10]^[18]

The pairing was widely noted by AI researchers and the technology press because it joined one of reinforcement learning's most influential theorists with one of the games industry's most celebrated low-level programmers. Keen's approach, as described publicly by Carmack and Sutton, has emphasized small focused teams, on-device learning, and reinforcement-learning-style "experience" rather than the very large pretraining runs typical of the major AI labs, a perspective that lines up with both Sutton's "Bitter Lesson" and his more recent emphasis on continual, on-the-job learning.^[10]^[16]

What are Richard Sutton's major research contributions?

Sutton's body of research is unusually broad, but a small number of contributions stand out for their lasting impact on machine learning and AI research.

What is temporal-difference learning?

Sutton introduced temporal-difference learning (TD learning) in his 1984 doctoral thesis and formalized it in the 1988 paper "Learning to predict by the methods of temporal differences," published in the journal Machine Learning. TD learning combines ideas from Monte Carlo methods and dynamic programming: it allows an agent to update its predictions about the long-run future at every time step, by comparing successive predictions ("bootstrapping"), rather than waiting for a final outcome. The 1988 paper proved convergence under certain conditions and demonstrated TD's effectiveness on prediction tasks such as random-walk Markov chains.^[4]^[1]
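The bootstrapping idea can be illustrated with a minimal tabular TD(0) sketch on a small random walk. This is an illustration only, not code from the 1988 paper: the function name, the five-state chain, the reward of 1 at the right end, and the step size are all assumptions chosen for brevity.

```python
import random


def td0_random_walk(num_states=5, episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values of a symmetric random walk with tabular TD(0).

    Hypothetical setup: states 1..num_states are non-terminal, states 0 and
    num_states + 1 are terminal, and only stepping off the right end pays 1.
    """
    rng = random.Random(seed)
    values = [0.0] * (num_states + 2)       # terminal values stay at 0
    for s in range(1, num_states + 1):
        values[s] = 0.5                     # neutral initial prediction
    for _ in range(episodes):
        state = (num_states + 1) // 2       # start in the middle
        while 0 < state <= num_states:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == num_states + 1 else 0.0
            # TD error: bootstrapped one-step target minus current prediction.
            td_error = reward + gamma * values[next_state] - values[state]
            # Update immediately, without waiting for the episode to end.
            values[state] += alpha * td_error
            state = next_state
    return values[1:num_states + 1]


if __name__ == "__main__":
    print(td0_random_walk())
```

For this five-state chain the true values are 1/6, 2/6, ..., 5/6, so the printed estimates should land near those numbers, with some noise that depends on the step size.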

TD learning underpins almost all modern value-based RL algorithms, including Q-learning (Watkins, 1989, building directly on Sutton's TD work), SARSA (named after the state-action-reward-state-action tuple it updates), and the deep-RL methods used in systems such as DeepMind's deep Q-network, AlphaGo, AlphaZero, and AlphaProof.^[2]^[11]

A 1995 result by Sutton, Barto, and colleagues drew a striking link between TD learning and the firing patterns of dopamine neurons in the brain, helping launch the modern neuroscience of reward prediction and influence the broader field of computational neuroscience. This convergence of evidence between behavior, neuroscience, and computer-science models is often cited as one of the success stories of computational cognitive science.^[11]

What are actor-critic methods?

Building on his doctoral work with Barto, Sutton was a co-developer of the actor-critic family of reinforcement-learning algorithms, in which a "critic" learns a value function and an "actor" learns a policy guided by the critic's TD error. Modern descendants of actor-critic, including A3C, DDPG, SAC, and PPO, are workhorses of deep reinforcement learning and underpin many state-of-the-art results in robotics, game-playing, and large-model fine-tuning.^[1]^[3]
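A minimal one-step actor-critic sketch shows how the critic's TD error drives both updates. Everything concrete here is an assumption made for illustration: the four-state corridor, the reward of 1 at the right end, the softmax action preferences, and the step sizes are not taken from any specific paper.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_corridor(episodes=3000, alpha=0.1, beta=0.2, gamma=0.9, seed=0):
    """Tabular one-step actor-critic on a hypothetical 4-state corridor.

    Actions: 0 = left (bumping into the wall at state 0), 1 = right.
    Reaching state 3 ends the episode with reward 1.
    """
    rng = random.Random(seed)
    n_states, goal = 4, 3
    values = [0.0] * n_states                       # critic: state values
    prefs = [[0.0, 0.0] for _ in range(n_states)]   # actor: action preferences
    for _ in range(episodes):
        state = 0
        for _ in range(200):                        # safety cap on episode length
            probs = softmax(prefs[state])
            action = 0 if rng.random() < probs[0] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0
            target = reward if done else reward + gamma * values[next_state]
            td_error = target - values[state]
            # Critic update: move the value estimate toward the TD target.
            values[state] += alpha * td_error
            # Actor update: gradient of log softmax, scaled by the TD error.
            for b in range(2):
                grad = (1.0 if b == action else 0.0) - probs[b]
                prefs[state][b] += beta * td_error * grad
            state = next_state
            if done:
                break
    prob_right = [softmax(p)[1] for p in prefs[:goal]]
    return prob_right, values


if __name__ == "__main__":
    print(actor_critic_corridor())
```

In this sketch the same scalar TD error serves as the critic's learning signal and as the actor's reinforcement signal, which is the defining feature of the family.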

What is the Dyna architecture?

In 1991, Sutton proposed the Dyna architecture, a framework that explicitly integrates learning from real experience, learning of a world model, and planning by simulated experience using that model, all within a single learning agent. Dyna is widely regarded as one of the earliest and most influential model-based reinforcement-learning architectures, and its ideas remain central to current model-based and world-model approaches in deep RL.^[3]^[4]
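The three ingredients (direct learning, model learning, and planning from simulated experience) can be sketched in a few lines in the style of tabular Dyna-Q. This is a hedged illustration: the deterministic corridor environment, the dictionary model, and all parameter values are assumptions, not details from Sutton's papers.

```python
import random


def dyna_q(episodes=200, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    """Tabular Dyna-Q-style sketch on a hypothetical deterministic corridor.

    States 0..3, actions 0 = left and 1 = right, reward 1 on reaching state 3.
    """
    rng = random.Random(seed)
    n_states, goal, actions = 4, 3, (0, 1)

    def step(state, action):
        next_state = max(0, state - 1) if action == 0 else state + 1
        return (1.0 if next_state == goal else 0.0), next_state

    q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}  # learned model: (state, action) -> (reward, next_state)

    def q_update(s, a, r, s2):
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                best = max(q[state])
                action = rng.choice([a for a in actions if q[state][a] == best])
            reward, next_state = step(state, action)
            q_update(state, action, reward, next_state)    # learn from real experience
            model[(state, action)] = (reward, next_state)  # learn the world model
            for _ in range(planning_steps):                # plan with simulated experience
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                q_update(ps, pa, pr, ps2)
            state = next_state
    return q


if __name__ == "__main__":
    for s, row in enumerate(dyna_q()):
        print(s, row)
```

Real and simulated transitions go through the same update rule, so planning is simply extra learning on experience replayed from the model.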

What is the options framework?

Together with Doina Precup and Satinder Singh, Sutton co-developed the options framework for temporal abstraction in reinforcement learning, published as "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning" in the journal Artificial Intelligence in 1999. The options framework treats sub-tasks as temporally extended "options" within a semi-Markov decision process, providing one of the foundational formalisms for hierarchical reinforcement learning.^[3]
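In the 1999 paper an option consists of an initiation set, an internal policy, and a termination condition. The sketch below encodes that triple and shows how executing an option collapses many primitive steps into a single temporally extended transition. The class name, the `run_option` helper, and the corridor environment are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Option:
    """An option: where it may start, how it acts, and when it stops."""
    initiation_set: FrozenSet[int]
    policy: Callable[[int], int]
    termination_prob: Callable[[int], float]


def run_option(option, state, step, gamma=0.9, rng=None, max_steps=1000):
    """Execute an option to termination.

    Returns (discounted reward, final state, duration), i.e. the quantities
    that describe one semi-MDP transition.
    """
    if state not in option.initiation_set:
        raise ValueError("option cannot be initiated in this state")
    rng = rng or random.Random(0)
    total, discount, duration = 0.0, 1.0, 0
    while duration < max_steps:
        reward, state = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        duration += 1
        # Stop with the option's termination probability in the new state.
        if rng.random() < option.termination_prob(state):
            break
    return total, state, duration


def corridor_step(state, action):
    """Hypothetical corridor: states 0..3, action 1 moves right, reward 1 at state 3."""
    next_state = min(3, state + 1) if action == 1 else max(0, state - 1)
    return (1.0 if next_state == 3 else 0.0), next_state


# A "walk to the right end" option, available in every non-goal state.
go_right = Option(
    initiation_set=frozenset({0, 1, 2}),
    policy=lambda s: 1,
    termination_prob=lambda s: 1.0 if s == 3 else 0.0,
)

if __name__ == "__main__":
    print(run_option(go_right, 0, corridor_step))
```

A higher-level learner can then treat `go_right` as if it were a single action whose outcome is the triple returned by `run_option`.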

What are policy gradient methods?

Sutton and his collaborators (David McAllester, Satinder Singh, and Yishay Mansour) authored "Policy Gradient Methods for Reinforcement Learning with Function Approximation" (Advances in Neural Information Processing Systems, 2000), which formally established the policy-gradient theorem for differentiable function approximators. The theorem provides a clean expression for the gradient of an agent's expected return with respect to the parameters of its policy, even when both the policy and the value function are represented by neural networks or other parameterized models. The result is the theoretical foundation for policy gradient algorithms such as REINFORCE, A2C/A3C, TRPO, and PPO that drive much of modern deep RL, including, most prominently, the reinforcement-learning-from-human-feedback (RLHF) and reinforcement-learning-from-AI-feedback (RLAIF) pipelines used to train and align large language models.^[3]
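In the form given in Sutton and Barto's textbook, the theorem says that the gradient of the expected return is proportional to the sum over states s, weighted by how often the policy visits them, of the sum over actions a of q(s, a) times the gradient of pi(a | s, theta). The sketch below checks the simplest special case numerically: a one-state problem with a softmax policy and fixed action values, where the state weighting disappears. The function names and the example numbers are assumptions made for illustration.

```python
import math


def softmax(theta):
    """Softmax policy over actions, parameterized by one preference per action."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def expected_return(theta, q_values):
    """J(theta) = sum_a pi(a) * q(a) in a one-state problem."""
    return sum(p * q for p, q in zip(softmax(theta), q_values))


def policy_gradient(theta, q_values):
    """Gradient from the theorem: dJ/dtheta_b = sum_a q(a) * dpi(a)/dtheta_b."""
    probs = softmax(theta)
    n = len(theta)
    grad = []
    for b in range(n):
        # Softmax derivative: dpi(a)/dtheta_b = pi(a) * (1[a == b] - pi(b)).
        grad.append(sum(
            q_values[a] * probs[a] * ((1.0 if a == b else 0.0) - probs[b])
            for a in range(n)
        ))
    return grad


def finite_difference_gradient(theta, q_values, eps=1e-6):
    """Central-difference estimate of the same gradient, used as a check."""
    grad = []
    for b in range(len(theta)):
        up, down = list(theta), list(theta)
        up[b] += eps
        down[b] -= eps
        grad.append((expected_return(up, q_values)
                     - expected_return(down, q_values)) / (2 * eps))
    return grad


if __name__ == "__main__":
    theta, q_values = [0.2, -0.5, 1.0], [1.0, 0.0, 2.0]
    print(policy_gradient(theta, q_values))
    print(finite_difference_gradient(theta, q_values))
```

The two printed vectors should agree to several decimal places. Sample-based algorithms in this family estimate the same quantity from trajectories rather than computing the sum exactly.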

Other contributions

Sutton has also done influential work on gradient and emphatic TD methods (giving the first sound off-policy TD algorithms with linear function approximation), on the Horde architecture for parallel general-value-function learning (in which a single agent learns a large number of value-like predictions in parallel), and on options discovery and intra-option learning. More recently he has focused on continual and "on-the-job" learning: the problem of building agents that keep improving from new experience, without catastrophic forgetting and without a fresh round of expensive offline training, and on the related problem of "plasticity loss" in deep networks.^[12]

He has been a vocal critic of the view that pure scaling of supervised pretraining will be sufficient for general intelligence. In interviews and talks following his 2024 Turing Award, Sutton repeatedly argued that current large language models lack on-the-job learning and that fundamentally new architectures are required if AI systems are to keep learning after deployment. In 2025 he gave the keynote "The Era of Experience and The Age of Design" at the Upper Bound 2025 AI conference in Edmonton, in which he laid out the same case.^[16]

What is "Reinforcement Learning: An Introduction"?

Sutton and Barto's textbook "Reinforcement Learning: An Introduction" is the standard reference and teaching text for the field. The first edition was published by MIT Press in 1998 and the second, substantially expanded edition appeared in 2018. The two editions are commonly referred to among RL researchers simply as "Sutton and Barto" or "the RL bible."^[3]^[5]

The book systematically presents the formalisms of Markov decision processes, dynamic programming, Monte Carlo methods, TD learning, n-step methods, eligibility traces, policy-gradient methods, function approximation, and integrated planning. The second edition adds substantial new material on policy-gradient methods (reflecting the explosion of deep-RL work in the 2010s), off-policy methods, average-reward formulations, and case studies from games and robotics. It also includes a final part on the relationship between RL and topics such as psychology and neuroscience, reflecting Sutton's long-standing interest in those connections.^[5]

The textbook has been adopted in graduate AI and machine learning courses worldwide. As of 2025 it had been cited more than 80,000 times in Google Scholar, making it one of the most-cited works in machine learning. The free draft PDF of the second edition, which Sutton hosts on his personal website incompleteideas.net, has been downloaded by hundreds of thousands of students and practitioners.^[3]^[5]

What is The Bitter Lesson?

In March 2019 Sutton published a short essay on his personal website, incompleteideas.net, titled "The Bitter Lesson." The full essay is around 1,100 words.^[6] Its opening line states the thesis directly:

"The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin." - Richard S. Sutton, "The Bitter Lesson" (2019)^[6]

The essay argues that approaches that bake in human knowledge, such as opening-book chess heuristics, hand-engineered vision features, and symbolic linguistic structure, repeatedly lose, in the long run, to simple methods (search and learning) that scale with available compute as Moore's law continues to compound.^[6]^[7]

Sutton illustrated the argument with examples from computer chess (where Deep Blue's brute-force search beat carefully engineered "human-style" chess programs), computer Go (where AlphaGo and AlphaGo Zero out-scaled all prior knowledge-based programs), speech recognition (where statistical and then neural approaches displaced hand-built phonetic systems), and computer vision (where convolutional networks supplanted hand-crafted features).^[6]^[7] The essay characterizes the conclusion as "bitter" because it is less anthropocentric than researchers had hoped, and it has remained influential through the rise of large-scale deep learning and large language models, frequently cited both in defense of scaling and as the target of subsequent critiques.^[7]

Did Richard Sutton win the Turing Award?

Yes. On March 5, 2025, the Association for Computing Machinery (ACM) announced that Andrew G. Barto and Richard S. Sutton would receive the 2024 ACM A.M. Turing Award "for developing the conceptual and algorithmic foundations of reinforcement learning."^[1]^[2] The Turing Award is widely described as the "Nobel Prize of Computing" and carries a US$1 million prize, with financial support provided by Google.^[1]^[17]

In its citation, ACM credited Barto and Sutton with having introduced the main ideas, constructed the mathematical foundations, and developed important algorithms for reinforcement learning over a series of papers beginning in the 1980s. The official ACM citation specifically highlights their textbook "Reinforcement Learning: An Introduction" (1998; 2nd ed. 2018) as the work that introduced these ideas to generations of researchers. The award page noted Sutton's positions as Professor of Computer Science at the University of Alberta, Research Scientist at Keen Technologies, and Fellow at the Alberta Machine Intelligence Institute (Amii).^[1]^[2]

The 2024 award marked the first time the Turing Award was given primarily for reinforcement learning, recognizing a body of work that, by the time of the announcement, had become foundational to a wide range of modern AI systems, from DeepMind's game-playing programs (AlphaGo, AlphaZero, AlphaProof) and robotic-control systems, to the reinforcement-learning-from-human-feedback (RLHF) procedures used to align large language models such as ChatGPT and Claude.^[17]^[11]

Reaction to the award from the AI research community was broadly enthusiastic. Coverage by ACM, the U.S. National Science Foundation, the University of Alberta, Amii, the Heidelberg Laureate Foundation, and major technology outlets emphasized both the long arc of Barto and Sutton's work and its present-day applications. The University of Alberta described it as the university's first Turing Award and a milestone for Canadian AI research.^[11]^[17]^[13]

In post-award interviews, Sutton stressed that the award was as much about reinforcement learning as a field as it was about him and Barto personally, and used the platform to argue for continued investment in fundamental, "experience-based" research, as opposed to ever-larger pre-training runs alone. His remarks were consistent with the broader argument of "The Bitter Lesson," that simple, scalable, search-and-learning-based methods are the durable winners, but added a new emphasis on continual learning from interaction with the world.^[13]^[16]

What are Richard Sutton's views on AI?

Sutton has been an outspoken voice on the future direction of AI research. His public position, repeated in essays, talks, and post-Turing-Award interviews, can be summarized in three parts.

First, he argues, following "The Bitter Lesson," that general methods based on search and learning, with compute as the main resource, are the durable winners over hand-engineered domain knowledge. He has repeatedly applied this argument to debates over symbolic AI, neuro-symbolic methods, and aggressive use of human feedback in language-model training.^[6]^[7]

Second, he has argued that contemporary large language models, while impressive, lack continual learning from interactive experience; they are essentially trained once and frozen, with at best brief fine-tuning. In the 2025 essay "Welcome to the Era of Experience," co-authored with David Silver and circulated as part of an upcoming MIT Press book "Designing an Intelligence," Sutton and Silver argued that the next leap in AI capability will come from agents that learn predominantly from their own experience interacting with the world, rather than from human-generated text. They cited DeepMind's AlphaProof, which used reinforcement learning to generate millions of new mathematical proofs and reached the level of an IMO silver medalist, as an early instance of this "era of experience."^[16]

Third, he is publicly skeptical of doomer-style framings of AI risk, while supporting careful long-term safety research. He has emphasized the importance of building AI systems whose objectives are shaped through interaction with the world rather than imposed from above, and has framed AGI as a scientific goal as well as an engineering one.^[10]^[16]

These views are central to the agenda of Keen Technologies, which Sutton and John Carmack describe as aimed at "real-time, on-device, on-the-job learning."^[10]

Awards and honors

2024 ACM A.M. Turing Award (announced March 5, 2025), shared with Andrew G. Barto.^[1]^[2]
Fellow of the Royal Society (FRS), London, 2021.^[3]
Fellow of the Royal Society of Canada (FRSC), 2016.^[3]
Outstanding Achievement in Research Award, University of Massachusetts Amherst, 2013.^[3]
President's Award, International Neural Network Society, 2003.^[3]
Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), 2001.^[3]
Canada CIFAR AI Chair.^[12]

Selected publications

Sutton, R. S. (1988). "Learning to predict by the methods of temporal differences." Machine Learning 3(1): 9-44.^[4]
Sutton, R. S. (1991). "Dyna, an integrated architecture for learning, planning, and reacting." ACM SIGART Bulletin 2(4): 160-163.^[3]
Sutton, R. S. & Barto, A. G. (1998). Reinforcement Learning: An Introduction (1st ed.). MIT Press.^[5]
Sutton, R. S., Precup, D., & Singh, S. (1999). "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial Intelligence 112: 181-211.^[3]
Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). "Policy gradient methods for reinforcement learning with function approximation." Advances in Neural Information Processing Systems 12.^[3]
Sutton, R. S. & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.^[5]
Sutton, R. S. (2019). "The Bitter Lesson." Online essay, incompleteideas.net.^[6]
Silver, D. & Sutton, R. S. (2025). "Welcome to the Era of Experience." Google DeepMind / MIT Press preprint.^[16]

References

1. "Andrew Barto and Richard Sutton are the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning." Association for Computing Machinery. https://awards.acm.org/about/2024-turing
2. "Dr. Richard Sutton, 2024 ACM A.M. Turing Award recipient." ACM Award Recipients. https://awards.acm.org/award-recipients/sutton_0160594
3. "Richard S. Sutton." Wikipedia. https://en.wikipedia.org/wiki/Richard_S._Sutton
4. Sutton, R. S. (1988). "Learning to predict by the methods of temporal differences." Machine Learning 3(1): 9-44.
5. Sutton, R. S. & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. https://mitpress.mit.edu/9780262039246/reinforcement-learning/
6. Sutton, R. S. (March 13, 2019). "The Bitter Lesson." incompleteideas.net. http://www.incompleteideas.net/IncIdeas/BitterLesson.html
7. "Bitter lesson." Wikipedia. https://en.wikipedia.org/wiki/Bitter_lesson
8. "Rich Sutton, PhD, Faculty Directory." University of Alberta. https://apps.ualberta.ca/directory/person/rsutton
9. "DeepMind expands to Canada with new research office in Edmonton, Alberta." Google DeepMind blog, July 5, 2017. https://deepmind.google/blog/deepmind-expands-to-canada-with-new-research-office-in-edmonton-alberta/
10. "John Carmack and Rich Sutton partner to accelerate development of Artificial General Intelligence." GlobeNewswire, September 25, 2023. https://www.globenewswire.com/news-release/2023/09/25/2749057/0/en/John-Carmack-and-Rich-Sutton-partner-to-accelerate-development-of-Artificial-General-Intelligence.html
11. "AI pioneers Andrew Barto and Richard Sutton win 2024 Turing Award for groundbreaking contributions to reinforcement learning." U.S. National Science Foundation, March 2025. https://www.nsf.gov/news/ai-pioneers-andrew-barto-richard-sutton-win-2024-turing
12. "Richard S. Sutton." Alberta Machine Intelligence Institute (Amii). https://www.amii.ca/about/our-people/richard-s-sutton/
13. "Rich Sutton, A.M. Turing Award Winner: Understanding Intelligence." Amii. https://www.amii.ca/updates-insights/rich-sutton-turing
14. "Alberta Machine Intelligence Institute." Wikipedia. https://en.wikipedia.org/wiki/Alberta_Machine_Intelligence_Institute
15. "Rich Sutton joins John Carmack's Keen Technologies." Hacker News discussion, September 25, 2023. https://news.ycombinator.com/item?id=37651548
16. Silver, D. & Sutton, R. S. (April 2025). "Welcome to the Era of Experience." Google DeepMind. https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf
17. "Andrew Barto and Richard Sutton win 2024 Turing Award." AIhub, March 6, 2025. https://aihub.org/2025/03/06/andrew-barto-and-richard-sutton-win-2024-turing-award/
18. "John Carmack's Keen Technologies Partners To Speed Development of Artificial General Intelligence." Dallas Innovates, September 25, 2023. https://dallasinnovates.com/john-carmacks-keen-technologies-partners-to-accelerate-development-of-artificial-general-intelligence/
