Andrew Barto

Updated Jun 28, 2026

Andrew Barto is an American computer scientist and one of the founders of modern reinforcement learning, the branch of machine learning in which an agent learns by trial and error from rewards. He is Professor Emeritus of Information and Computer Sciences at the University of Massachusetts Amherst, where he co-directed the Autonomous Learning Laboratory. With his former doctoral student Richard Sutton he built much of the conceptual and mathematical core of the field, including temporal-difference learning and the actor-critic architecture, and co-wrote the standard textbook "Reinforcement Learning: An Introduction" (1998; 2nd ed. 2018). On March 5, 2025, the Association for Computing Machinery named Barto and Sutton recipients of the 2024 ACM A.M. Turing Award, computing's highest honor, "for developing the conceptual and algorithmic foundations of reinforcement learning." ^[1]^[2]

Who is Andrew Barto?

Andrew Gehret Barto was born in 1948. He attended the University of Michigan, where he first enrolled in naval architecture and engineering before switching fields. After reading work by the cyberneticists Michael Arbib, Warren McCulloch, and Walter Pitts, he grew interested in using mathematics and computers to model the brain. He received a B.S. with distinction in mathematics from Michigan in 1970. ^[3]^[4]

Barto stayed at Michigan for graduate study in computer science. He completed a Ph.D. in 1975 with a dissertation on cellular automata as models of natural systems, supervised by Bernard Zeigler. ^[3]^[4] His early training in mathematical modeling of biological and physical systems shaped the direction of his later research, which kept returning to the question of how learning works in both natural and artificial agents.

Where did Andrew Barto work?

Barto joined what is now the Manning College of Information and Computer Sciences at the University of Massachusetts Amherst in 1977 as a postdoctoral research associate. He was promoted to associate professor in 1982 and to full professor in 1991. He served as department chair from 2007 to 2011 and retired in 2012, taking emeritus status; the ACM describes his current title as Professor Emeritus of Information and Computer Sciences at the University of Massachusetts, Amherst. ^[2]^[3]^[4]

At UMass he co-founded and co-directed the Autonomous Learning Laboratory, which was earlier known as the Adaptive Network Laboratory. The lab became a base for research on trial-and-error learning in agents and produced a long line of doctoral graduates who went on to shape reinforcement learning and adjacent areas. ^[4]^[5] Barto's research program treated learning as a problem an agent solves by interacting with its environment over time, and it consistently drew on ideas from psychology, control theory, and neuroscience as well as computer science.

What is Andrew Barto known for?

Andrew Barto is known above all for laying the foundations of reinforcement learning together with Richard Sutton, including temporal-difference learning, the actor-critic architecture, and the discipline's standard textbook. The ACM credits the pair with introducing the main ideas, building the mathematical foundations, and developing important algorithms of the field across a series of papers that began in the 1980s. ^[1]^[2] The sections below describe the specific contributions.

Reinforcement learning as a framework

Reinforcement learning studies how an agent can learn to act by trying actions and observing rewards, rather than by being shown correct answers. Beginning in the late 1970s and through the 1980s, Barto and Sutton drew together strands from animal learning theory, optimal control, and artificial intelligence into a single computational framework. They formulated the problem in terms of an agent interacting with an environment modeled as a Markov decision process, where the dynamics and the rewards can be unknown to the agent. This formulation let the same family of algorithms apply to a wide range of decision problems. ^[1]^[2]^[6]
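
The agent-environment formulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state MDP, its probabilities and rewards, and the names `TRANSITIONS`, `step`, `run_episode`, and `random_policy` are all invented here and do not come from Barto and Sutton's papers. The point it shows is that the agent only ever sees states and rewards returned by `step`, never the transition table itself.

```python
import random

# Hypothetical two-state MDP, invented for illustration only.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "work": [(0.7, "high", 1.0), (0.3, "low", 0.0)],
    },
    "high": {
        "wait": [(1.0, "high", 0.5)],
        "work": [(0.6, "high", 2.0), (0.4, "low", 0.0)],
    },
}


def step(state, action, rng):
    """Environment side: sample (next_state, reward) for one action."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def random_policy(state, rng):
    """Agent side: a placeholder policy that ignores the state."""
    return rng.choice(["wait", "work"])


def run_episode(policy, steps, seed=0):
    """Run the interaction loop and return the total reward collected."""
    rng = random.Random(seed)
    state, total = "low", 0.0
    for _ in range(steps):
        action = policy(state, rng)                # agent chooses
        state, reward = step(state, action, rng)   # environment responds
        total += reward
    return total


if __name__ == "__main__":
    print(run_episode(random_policy, steps=100))
```

A learning algorithm would replace `random_policy` with something that improves from the observed rewards; the interaction loop itself stays the same, which is one way to read the claim that a single family of algorithms applies across many decision problems.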

Neuron-like adaptive elements and the cart-pole problem

In a 1983 paper with Sutton and Charles Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," Barto introduced a system of two neuron-like units that learns to balance a pole hinged to a movable cart by pushing the cart left or right. One unit, the associative search element, learns which action to take in a given situation; the other, the adaptive critic element, learns to predict future reward and supplies a better internal evaluation signal than the raw reward alone. ^[7] This cart-pole task became a standard benchmark, and the two-unit design was an early version of what is now called the actor-critic architecture, in which a separate actor selects actions and a critic estimates value.
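
The division of labor between a unit that chooses actions and a unit that evaluates them can be illustrated with a small tabular sketch. This is not the 1983 system: the corridor task, the softmax action selection, the step sizes, and every name (`train`, `prefs`, `values`) are assumptions chosen for brevity, and the cart-pole dynamics are left out entirely. What the sketch keeps is the core idea that the critic's prediction error, rather than the raw reward, drives the actor's learning.

```python
import math
import random

# Assumed toy task (not cart-pole): a corridor of states 0..4.
# States 0 and 4 are terminal; entering state 4 pays reward 1.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 = left, index 1 = right


def softmax_probs(prefs):
    """Turn action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=500, alpha_actor=0.1, alpha_critic=0.1, gamma=0.95, seed=0):
    rng = random.Random(seed)
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: action preferences
    values = [0.0] * N_STATES                      # critic: state-value estimates
    for _ in range(episodes):
        s = N_STATES // 2
        while s not in (0, N_STATES - 1):
            probs = softmax_probs(prefs[s])
            a = rng.choices((0, 1), weights=probs, k=1)[0]
            s_next = s + ACTIONS[a]
            reward = 1.0 if s_next == N_STATES - 1 else 0.0
            terminal = s_next in (0, N_STATES - 1)
            # Critic: error between the new one-step estimate and the old one.
            target = reward + (0.0 if terminal else gamma * values[s_next])
            td_error = target - values[s]
            values[s] += alpha_critic * td_error
            # Actor: shift preferences toward the chosen action if the
            # critic's error is positive, away from it if negative.
            for b in (0, 1):
                grad = (1.0 if b == a else 0.0) - probs[b]
                prefs[s][b] += alpha_actor * td_error * grad
            s = s_next
    return prefs, values


if __name__ == "__main__":
    prefs, values = train()
    print("P(right) per state:", [round(softmax_probs(p)[1], 2) for p in prefs])
    print("critic values:", [round(v, 2) for v in values])
```

After training, one would expect the actor to prefer moving right in the interior states and the critic's values to rise toward the rewarding end, though the exact numbers depend on the seed and step sizes.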

Temporal-difference learning

Temporal-difference learning is a method for estimating how good a state is by using the difference between successive predictions, so an agent can learn from each step without waiting for a final outcome. Barto and Sutton developed and analyzed these methods through the 1980s, and their 1981 work showed that temporal-difference ideas could account for learning effects that older models from animal psychology, such as the Rescorla-Wagner model, did not capture. ^[6]^[8] Temporal-difference learning sits at the center of many later algorithms: value-based methods use it directly, and actor-critic forms of policy gradient methods use it to train the critic. Barto and Sutton helped to formalize several of these techniques. ^[1]
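
The phrase "difference between successive predictions" can be made concrete with a minimal one-step (TD(0)) prediction sketch. It uses a five-state random walk of the kind used as an example in the Sutton and Barto textbook; the function name `td0_random_walk`, the step size, and the episode count are assumptions made here for illustration. Each update moves the value of the current state toward the reward plus the value of the next state, without waiting for the episode to finish.

```python
import random


def td0_random_walk(episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values for a random walk with one-step TD updates.

    Positions 1..5 are ordinary states; 0 and 6 are terminal.
    Entering position 6 pays reward 1, every other step pays 0.
    """
    rng = random.Random(seed)
    values = [0.0] + [0.5] * 5 + [0.0]  # terminal values stay at 0
    for _ in range(episodes):
        s = 3  # start in the middle
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            reward = 1.0 if s_next == 6 else 0.0
            # TD error: new one-step prediction minus the old prediction.
            td_error = reward + gamma * values[s_next] - values[s]
            values[s] += alpha * td_error
            s = s_next
    return values[1:6]


if __name__ == "__main__":
    estimates = td0_random_walk()
    true_values = [i / 6 for i in range(1, 6)]
    for est, true in zip(estimates, true_values):
        print(f"estimate {est:.3f}   true {true:.3f}")
```

For this task the true values are 1/6 through 5/6, so the printed estimates should land near those numbers, with some noise because the step size is held constant.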

The textbook

In 1998 the MIT Press published "Reinforcement Learning: An Introduction" by Sutton and Barto. The book gathered the field's main concepts and algorithms into a single teaching text, and a substantially revised second edition appeared in 2018. By the time of the Turing Award, the ACM reported that the textbook had been cited more than 75,000 times and remained the standard reference in the field, widely used in graduate courses. ^[1]^[2]

Links to neuroscience

A recurring theme in Barto's work is the connection between reinforcement learning and the brain. In the 1990s, work building on temporal-difference learning helped explain the firing patterns of dopamine neurons, which appeared to signal a reward-prediction error much like the error term used in temporal-difference algorithms. ^[8] This correspondence between an engineering idea and a biological mechanism became influential in computational neuroscience, and Barto continued to study reward systems in the brain and biologically plausible learning rules after his retirement. ^[4]^[5]

Did Andrew Barto win the Turing Award?

Yes. On March 5, 2025, the ACM announced that Barto and Sutton would receive the 2024 ACM A.M. Turing Award, often described as the Nobel Prize of computing, "for developing the conceptual and algorithmic foundations of reinforcement learning." The award carries a $1 million prize, with financial support provided by Google, Inc. ^[1]^[2] At the time of the award Sutton was a professor at the University of Alberta, a research scientist at Keen Technologies, and a Fellow of the Alberta Machine Intelligence Institute (Amii). ^[2]

The announcement noted that reinforcement learning grew far more capable in the 2010s once it was combined with deep learning, and it pointed to the use of these techniques in systems such as the Go program AlphaGo and in the fine-tuning of large language models through reinforcement learning from human feedback. ^[1]^[2] ACM President Yannis Ioannidis said that "Barto and Sutton's work demonstrates the immense potential of applying a multidisciplinary approach to longstanding challenges in our field," and Google's Jeff Dean said that "reinforcement learning, as pioneered by Barto and Sutton, directly answers Turing's challenge" of building machines that learn. ^[1]^[2]

What has Barto said about AI safety?

In interviews after the announcement, Barto used the occasion to caution against releasing powerful AI systems without adequate safeguards. Speaking to the Financial Times, he warned that releasing AI models "without safeguards is not good engineering practice," comparing the practice to building a bridge and testing it by having people walk across it. ^[9]^[10] He also questioned business models that, in his view, give companies an incentive to deploy quickly rather than carefully. ^[9]

What other awards has Andrew Barto received?

Barto's earlier honors include the IEEE Neural Networks Society Pioneer Award in 2004, the IJCAI Award for Research Excellence in 2017, and a UMass Neurosciences Lifetime Achievement Award in 2019. He is a Fellow of the American Association for the Advancement of Science and a Fellow of the IEEE. ^[3]^[4]

Who were Andrew Barto's students?

Barto's best-known doctoral student is Richard Sutton, who earned his Ph.D. at UMass Amherst in 1984 and became his long-term collaborator and co-author. ^[3]^[11] Other students from the Autonomous Learning Laboratory, including Amy McGovern and Satinder Singh, went on to research careers in machine learning and artificial intelligence. ^[4] Through these graduates, the textbook, and the algorithms he helped develop, Barto's ideas spread across robotics, game playing, recommendation systems, and the training of modern AI models. ^[1]^[2]

The reach of reinforcement learning has prompted debate within the field about how far the approach can go. Sutton's later essay "The Bitter Lesson" argued that general methods which scale with computation tend to win out over hand-built knowledge, a view rooted in the kind of learning-from-experience research the two pursued together. Barto's own writing has stayed closer to the foundations, emphasizing the ties between machine learning and the study of behavior and the brain. ^[4]^[5]

Facts

Field	Detail
Full name	Andrew Gehret Barto
Born	1948
Nationality	American
Fields	Computer science, reinforcement learning, computational neuroscience
Education	University of Michigan (B.S. mathematics, 1970; Ph.D. computer science, 1975)
Doctoral advisor	Bernard Zeigler
Institution	University of Massachusetts Amherst (1977 to 2012; Professor Emeritus of Information and Computer Sciences)
Known for	Reinforcement learning, temporal-difference learning, actor-critic architecture, "Reinforcement Learning: An Introduction"
Notable student	Richard Sutton
Major award	2024 ACM A.M. Turing Award (announced March 5, 2025), shared with Richard Sutton; $1 million prize funded by Google
Other honors	IEEE Neural Networks Society Pioneer Award (2004); IJCAI Award for Research Excellence (2017); UMass Neurosciences Lifetime Achievement Award (2019); AAAS Fellow; IEEE Fellow

References

1. Association for Computing Machinery. "Andrew Barto and Richard Sutton are the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning." ACM Awards, 2025. https://awards.acm.org/about/2024-turing
2. ACM. "ACM A.M. Turing Award Honors Two Researchers Who Led the Development of Cornerstone AI Technology." PR Newswire, March 5, 2025. https://www.prnewswire.com/news-releases/acm-am-turing-award-honors-two-researchers-who-led-the-development-of-cornerstone-ai-technology-302392341.html
3. Wikipedia. "Andrew Barto." Wikipedia, accessed 2026. https://en.wikipedia.org/wiki/Andrew_Barto
4. Manning College of Information and Computer Sciences. "Andrew G. Barto." UMass Amherst faculty directory, accessed 2026. https://www.cics.umass.edu/about/directory/andrew-g-barto
5. Manning College of Information and Computer Sciences. "Andrew Barto Named Co-recipient of 'Nobel Prize of Computing' for Foundational Work on AI Technology." UMass Amherst, March 2025. https://www.cics.umass.edu/news/barto-2024-acm-turing-award
6. National Science Foundation. "AI pioneers Andrew Barto and Richard Sutton win 2024 Turing Award for groundbreaking contributions to reinforcement learning." NSF News, 2025. https://www.nsf.gov/news/ai-pioneers-andrew-barto-richard-sutton-win-2024-turing
7. Barto, A. G., Sutton, R. S., and Anderson, C. W. "Neuronlike adaptive elements that can solve difficult learning control problems." IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-13, no. 5, pp. 834-846, 1983. https://ieeexplore.ieee.org/document/6313077/
8. Sutton, R. S., and Barto, A. G. "Reinforcement Learning: An Introduction." 2nd ed., MIT Press, 2018. http://incompleteideas.net/book/the-book-2nd.html
9. Financial Times via heise online. "Turing Award winners warn against current AI developments." heise online, March 2025. https://www.heise.de/en/news/Turing-Award-winners-warn-against-current-AI-developments-10306495.html
10. "Turing Award Winners Sound Alarm on Hasty AI Deployment." Slashdot, March 5, 2025. https://slashdot.org/story/25/03/05/1330242/turing-award-winners-sound-alarm-on-hasty-ai-deployment
11. Wikipedia. "Richard S. Sutton." Wikipedia, accessed 2026. https://en.wikipedia.org/wiki/Richard_S._Sutton
