Andrew Barto
Last reviewed
May 31, 2026
Sources
11 citations
Review status
Source-backed
Revision
v1 · 1,725 words
Andrew Barto is an American computer scientist known as one of the founders of modern reinforcement learning. For most of his career he was a professor of computer science at the University of Massachusetts Amherst, where he co-directed the Autonomous Learning Laboratory and is now professor emeritus. With his doctoral student Richard Sutton, he built much of the conceptual and mathematical core of the field, including temporal-difference learning and the actor-critic architecture, and the two co-wrote the standard textbook "Reinforcement Learning: An Introduction." In March 2025 the Association for Computing Machinery named Barto and Sutton recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning. [1][2]
Andrew Gehret Barto was born in 1948. He attended the University of Michigan, where he first enrolled in naval architecture and engineering before switching fields. After reading work by the cyberneticists Michael Arbib, Warren McCulloch, and Walter Pitts, he grew interested in using mathematics and computers to model the brain. He received a B.S. with distinction in mathematics from Michigan in 1970. [3][4]
Barto stayed at Michigan for graduate study in computer science. He completed a Ph.D. in 1975 with a dissertation on cellular automata as models of natural systems, supervised by Bernard Zeigler. [3][4] His early training in mathematical modeling of biological and physical systems shaped the direction of his later research, which kept returning to the question of how learning works in both natural and artificial agents.
Barto joined what is now the Manning College of Information and Computer Sciences at the University of Massachusetts Amherst in 1977 as a postdoctoral research associate. He was promoted to associate professor in 1982 and to full professor in 1991. He served as department chair from 2007 to 2011 and retired in 2012, taking emeritus status. [3][4]
At UMass he co-founded and co-directed the Autonomous Learning Laboratory, which was earlier known as the Adaptive Network Laboratory. The lab became a base for research on trial-and-error learning in agents and produced a long line of doctoral graduates who went on to shape reinforcement learning and adjacent areas. [4][5] Barto's research program treated learning as a problem an agent solves by interacting with its environment over time, and it consistently drew on ideas from psychology, control theory, and neuroscience as well as computer science.
Reinforcement learning studies how an agent can learn to act by trying actions and observing rewards, rather than by being shown correct answers. Beginning in the late 1970s and through the 1980s, Barto and Sutton drew together strands from animal learning theory, optimal control, and artificial intelligence into a single computational framework. They formulated the problem in terms of an agent interacting with an environment modeled as a Markov decision process, where the dynamics and the rewards can be unknown to the agent. This formulation let the same family of algorithms apply to a wide range of decision problems. [1][2][6]
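The agent-environment formulation described above can be illustrated with a minimal sketch. The two-state Markov decision process below is a hypothetical toy example, not one taken from Barto and Sutton's papers; the state names, action names, probabilities, and rewards are all assumptions made for illustration. The point it shows is that the agent only sees states and rewards, while the transition table stays on the environment's side.

```python
import random

# Hypothetical two-state MDP (illustrative values only).
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "work": [(0.7, "high", 1.0), (0.3, "low", 0.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "work": [(0.6, "high", 2.0), (0.4, "low", 0.0)],
    },
}


def step(state, action, rng):
    """Environment side: sample a next state and reward.

    The agent never reads TRANSITIONS directly; it only sees what this returns.
    """
    draw = rng.random()
    cumulative = 0.0
    for prob, next_state, reward in TRANSITIONS[state][action]:
        cumulative += prob
        if draw < cumulative:
            return next_state, reward
    return next_state, reward  # guard against floating-point rounding


def random_policy(state, rng):
    """Agent side: pick an action without any knowledge of the dynamics."""
    return rng.choice(("wait", "work"))


def run_episode(policy, steps=20, gamma=0.9, seed=0):
    """Run the interaction loop and return the discounted sum of rewards."""
    rng = random.Random(seed)
    state, discounted_return = "low", 0.0
    for t in range(steps):
        action = policy(state, rng)
        state, reward = step(state, action, rng)
        discounted_return += (gamma ** t) * reward
    return discounted_return


if __name__ == "__main__":
    print("random policy return:", run_episode(random_policy))
```

Any function with the same `(state, rng)` signature can be passed as the policy, so a learning agent could be swapped in without changing the environment code.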
In a 1983 paper with Sutton and Charles Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," Barto introduced a system of two neuron-like units that learns to balance a pole hinged to a movable cart by pushing the cart left or right. One unit, the associative search element, learns which action to take in a given situation; the other, the adaptive critic element, learns to predict future reward and supplies a better internal evaluation signal than the raw reward alone. [7] This cart-pole task became a standard benchmark, and the two-unit design was an early version of what is now called the actor-critic architecture, in which a separate actor selects actions and a critic estimates value.
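The division of labor between actor and critic can be sketched in a few lines. The example below is not the 1983 two-unit system and does not simulate a cart and pole; it is a simplified tabular actor-critic on a hypothetical five-state corridor, with a softmax actor and a one-step temporal-difference critic. The task, the function name `train_actor_critic`, and all step sizes are assumptions chosen for illustration.

```python
import math
import random


def train_actor_critic(episodes=2000, alpha_actor=0.1, alpha_critic=0.1,
                       gamma=0.9, n_states=5, max_steps=200, seed=0):
    """Tabular actor-critic on a corridor; reaching the right end pays 1."""
    rng = random.Random(seed)
    goal = n_states - 1
    prefs = [[0.0, 0.0] for _ in range(n_states)]  # actor: [left, right]
    values = [0.0] * n_states                      # critic: state values

    def policy(state):
        # Softmax over the two action preferences (shifted for stability).
        left, right = prefs[state]
        shift = max(left, right)
        e_left, e_right = math.exp(left - shift), math.exp(right - shift)
        total = e_left + e_right
        return [e_left / total, e_right / total]

    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            probs = policy(state)
            action = 1 if rng.random() < probs[1] else 0
            # Moving left from state 0 leaves the agent where it is.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Critic: the TD error serves as the internal evaluation signal.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha_critic * td_error

            # Actor: shift preferences toward actions the critic rated well.
            for a in (0, 1):
                indicator = 1.0 if a == action else 0.0
                prefs[state][a] += alpha_actor * td_error * (indicator - probs[a])

            state = next_state
            if state == goal:
                break

    right_probs = [policy(s)[1] for s in range(goal)]
    return right_probs, values


if __name__ == "__main__":
    right_probs, values = train_actor_critic()
    print("P(right) per state:", [round(p, 3) for p in right_probs])
```

Note that the actor never sees the raw reward directly. It is updated only through the critic's error signal, which is the sense in which the critic supplies an internal evaluation.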
Temporal-difference learning is a method for estimating how good a state is by using the difference between successive predictions, so an agent can learn from each step without waiting for a final outcome. Barto and Sutton developed and analyzed these methods through the 1980s, and their 1981 work showed that temporal-difference ideas could account for learning effects that older models from animal psychology, such as the Rescorla-Wagner model, did not capture. [6][8] Temporal-difference learning sits at the center of many later algorithms, including value-based and policy-gradient methods, several of which Barto and Sutton helped to formalize. [1]
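The idea of learning from the difference between successive predictions can be made concrete with a short sketch of one-step temporal-difference prediction, often written TD(0). The example uses a five-state random walk of the kind commonly used to teach the method: the walker starts in the middle, steps left or right with equal probability, and receives a reward of 1 only when it exits on the right. The function name, step size, and episode count are assumptions for illustration.

```python
import random


def td0_random_walk(episodes=20000, alpha=0.01, gamma=1.0, seed=0):
    """Estimate state values for a five-state random walk with TD(0)."""
    rng = random.Random(seed)
    n = 5
    # Indices 0 and n + 1 are terminal states whose value stays at 0.
    values = [0.0] * (n + 2)
    for _ in range(episodes):
        state = (n + 1) // 2  # start in the middle state
        while 0 < state < n + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == n + 1 else 0.0
            # TD error: the new one-step prediction minus the old one.
            td_error = reward + gamma * values[next_state] - values[state]
            # Update immediately, without waiting for the episode to end.
            values[state] += alpha * td_error
            state = next_state
    return values[1:n + 1]


if __name__ == "__main__":
    estimates = td0_random_walk()
    print([round(v, 3) for v in estimates])
```

For this walk the true values are 1/6, 2/6, 3/6, 4/6, and 5/6 from left to right, so the estimates should settle near those numbers, with some residual noise because the step size is held constant.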
In 1998 the MIT Press published "Reinforcement Learning: An Introduction" by Sutton and Barto. The book gathered the field's main concepts and algorithms into a single teaching text and became its standard reference. A substantially revised second edition appeared in 2018. By the time of the Turing Award the book had been cited tens of thousands of times and was widely used in graduate courses. [1][2]
A recurring theme in Barto's work is the connection between reinforcement learning and the brain. In the 1990s, work building on temporal-difference learning helped explain the firing patterns of dopamine neurons, which appeared to signal a reward-prediction error much like the error term used in temporal-difference algorithms. [8] This correspondence between an engineering idea and a biological mechanism became influential in computational neuroscience, and Barto continued to study reward systems in the brain and biologically plausible learning rules after his retirement. [4][5]
On March 5, 2025, the ACM announced that Barto and Sutton would receive the 2024 ACM A.M. Turing Award, often described as the Nobel Prize of computing, for developing the conceptual and algorithmic foundations of reinforcement learning. The award carries a $1 million prize funded by Google. [1][2] The ACM citation credits the pair with introducing the main ideas, building the mathematical foundations, and developing important algorithms of the field across a series of papers that began in the 1980s. [1]
The announcement noted that reinforcement learning grew far more capable in the 2010s once it was combined with deep learning, and it pointed to the use of these techniques in systems such as the Go program AlphaGo and in the fine-tuning of large language models through reinforcement learning from human feedback. [1][2] ACM President Yannis Ioannidis said the work shows the value of a multidisciplinary approach to longstanding problems in computing, and Google's Jeff Dean tied the contribution to Alan Turing's own challenge of building machines that learn. [1]
In interviews after the announcement, Barto used the occasion to caution against releasing powerful AI systems without adequate safeguards. He said that putting software in front of millions of people without testing for harm is not good engineering practice, and he compared it to building a bridge and checking its strength by having people walk across it. [9][10] He also questioned business models that, in his view, give companies an incentive to deploy quickly rather than carefully. [9]
Barto's earlier honors include the IEEE Neural Networks Society Pioneer Award in 2004, the IJCAI Award for Research Excellence in 2017, and a UMass Neurosciences Lifetime Achievement Award in 2019. He is a Fellow of the American Association for the Advancement of Science and a Fellow of the IEEE. [3][4]
Barto's best-known doctoral student is Richard Sutton, who earned his Ph.D. at UMass Amherst in 1984 and became his long-term collaborator and co-author. [3][11] Other students from the Autonomous Learning Laboratory, including Amy McGovern and Satinder Singh, went on to research careers in machine learning and artificial intelligence. [4] Through these graduates, the textbook, and the algorithms he helped develop, Barto's ideas spread across robotics, game playing, recommendation systems, and the training of modern AI models. [1][2]
The reach of reinforcement learning has prompted debate within the field about how far the approach can go. Sutton's later essay "The Bitter Lesson" argued that general methods which scale with computation tend to win out over hand-built knowledge, a view rooted in the kind of learning-from-experience research the two pursued together. Barto's own writing has stayed closer to the foundations, emphasizing the ties between machine learning and the study of behavior and the brain. [4][5]
| Field | Detail |
|---|---|
| Full name | Andrew Gehret Barto |
| Born | 1948 |
| Nationality | American |
| Fields | Computer science, reinforcement learning, computational neuroscience |
| Education | University of Michigan (B.S. mathematics, 1970; Ph.D. computer science, 1975) |
| Doctoral advisor | Bernard Zeigler |
| Institution | University of Massachusetts Amherst (1977 to 2012; professor emeritus) |
| Known for | Reinforcement learning, temporal-difference learning, actor-critic architecture, "Reinforcement Learning: An Introduction" |
| Notable student | Richard Sutton |
| Major award | 2024 ACM A.M. Turing Award (announced March 2025), shared with Richard Sutton |
| Other honors | IEEE Neural Networks Society Pioneer Award (2004); IJCAI Award for Research Excellence (2017); UMass Neurosciences Lifetime Achievement Award (2019); AAAS Fellow; IEEE Fellow |