Sergey Levine
Sources: 20 citations · Review status: Source-backed · Revision: v2 · 4,127 words
Sergey Levine is an American computer scientist, associate professor of electrical engineering and computer sciences at the University of California, Berkeley, and a co-founder of Physical Intelligence, a San Francisco startup building general-purpose foundation models for robotics. He is one of the most cited researchers in deep reinforcement learning and robot learning: his Google Scholar profile lists more than 255,000 total citations, an h-index of 204, and an i10-index of 610 as of mid-2026 [4]. Levine directs Berkeley's Robotic AI and Learning (RAIL) Lab and is widely associated with the modern push to apply large neural networks to real-world robot control, from reinforcement learning algorithms such as soft actor-critic to the pi-zero family of vision-language-action models.
Levine joined Berkeley's Department of Electrical Engineering and Computer Sciences in fall 2016 after a PhD at Stanford University and a postdoc at Berkeley [1][2]. He is a member of Berkeley AI Research (BAIR), and his graduate course CS285 (Deep Reinforcement Learning) is a standard reference for the field through its publicly available video lectures [18]. His group has produced influential algorithms in policy search, off-policy reinforcement learning, offline reinforcement learning, meta-learning, and imitation learning.
In March 2024 Levine helped found Physical Intelligence with Karol Hausman, Chelsea Finn, Brian Ichter, Lachy Groom, Adnan Esmail, and Quan Vuong [19]. The company announced pi-0, a vision-language-action model for general robot control, in October 2024 and later released its weights openly [8][10]. It raised a $400 million Series A in November 2024 at a $2.4 billion post-money valuation in a round led by Jeff Bezos and Thrive Capital [5][6], and by November 2025 it had raised a further $600 million at a $5.6 billion valuation in a round led by Alphabet's growth fund CapitalG [7].
| Sergey Levine | |
|---|---|
| Full name | Sergey Vladimir Levine |
| Citizenship | United States |
| Education | Stanford University (BS, MS, 2009; PhD, 2014) |
| Doctoral advisor | Vladlen Koltun |
| Doctoral thesis | Motor Skill Learning with Local Trajectory Methods (2014) |
| Postdoctoral advisor | Pieter Abbeel (UC Berkeley) |
| Known for | Guided policy search; soft actor-critic; conservative Q-learning; QT-Opt; pi-0 / pi-0.5; CS285 Deep RL course |
| Fields | Machine learning, reinforcement learning, robotics, computer vision |
| Citations | 255,000+ (h-index 204, i10-index 610), Google Scholar, mid-2026 |
| Institutions | UC Berkeley (2016 to present); Physical Intelligence (2024 to present); Google Brain (research scientist, 2015 to 2016) |
| Lab | Robotic AI and Learning (RAIL) Lab; Berkeley AI Research (BAIR) |
| Notable students and postdocs | Chelsea Finn, Aviral Kumar, Tuomas Haarnoja, Abhishek Gupta, Karl Pertsch |
| Notable awards | Sloan Research Fellowship (2019); Presidential Early Career Award for Scientists and Engineers (2024) |
| Website | people.eecs.berkeley.edu/~svlevine |
Levine was raised in the United States and developed an early interest in computer graphics and animation, which led him to combine work on character animation with machine learning during his undergraduate years at Stanford University. He received both a Bachelor of Science and a Master of Science in computer science from Stanford in 2009.
He stayed at Stanford for doctoral work in the same department, joining the research group of Vladlen Koltun, then a Stanford computer science faculty member working in computer graphics. His dissertation, "Motor Skill Learning with Local Trajectory Methods," was completed in 2014 [3]. The thesis developed the family of guided policy search algorithms, which decompose policy learning by alternating between trajectory optimization (used to handle exploration and to find good local solutions) and supervised learning that distills those trajectories into a parametric policy. By using model knowledge or trajectory optimizers as teachers, the framework allowed deep neural network policies to be trained with relatively low sample complexity, an approach that proved influential in subsequent robot learning research.
Following his PhD, Levine moved to Berkeley as a postdoctoral researcher in the lab of Pieter Abbeel, then a leading figure in apprenticeship learning and deep reinforcement learning for robotics. During the postdoc he extended guided policy search to vision-based control of real robotic arms, producing the well-known 2016 paper "End-to-End Training of Deep Visuomotor Policies," which showed that convolutional neural network policies could be trained directly from camera pixels and joint encoder readings to perform contact-rich manipulation tasks such as screwing a cap onto a bottle and hanging a coat hanger on a rack [3].
For approximately one year before joining the Berkeley faculty, Levine was a research scientist on the Google Brain team in Mountain View, where he continued work on large-scale robot learning and contributed to the early Brain robotics research that eventually fed into Google's broader robotics learning agenda.
Levine joined the Department of Electrical Engineering and Computer Sciences at UC Berkeley as an assistant professor in fall 2016 and was later promoted to associate professor with tenure [1]. He is a member of Berkeley AI Research (BAIR) and the director of the Robotic AI and Learning (RAIL) Lab, which has been one of the most prolific groups in deep reinforcement learning and robot learning of the past decade.
The RAIL group's research agenda spans algorithms for online and offline reinforcement learning, learning from demonstrations, large-scale robot data collection, foundation models for control, and the use of large pretrained vision and language models as priors for embodied agents. Many of the methods that became standard tools in the deep RL community, including soft actor-critic, conservative Q-learning, model-agnostic meta-learning, and the modern formulation of behavior cloning baselines for offline RL, have roots in the RAIL Lab and its collaborators.
Levine is widely known in the field for the publicly available course materials of CS285 (formerly CS294-112), Berkeley's graduate course on deep reinforcement learning. Lecture videos for the course are released each year on YouTube, and the slides and homework assignments are posted on the course website at rail.eecs.berkeley.edu/deeprlcourse [18]. The course covers policy gradients, value-based methods, actor-critic algorithms, model-based RL, exploration, inverse RL, meta-learning, and offline reinforcement learning, and is widely used by graduate students and self-learners outside Berkeley as a primary introduction to the field.
His service roles include program committee work and area chair positions for NeurIPS, ICML, ICLR, and the Conference on Robot Learning (CoRL). He helped shape CoRL during its early years, when it emerged as one of the venues most closely associated with the deep robot learning community.
Levine is among the most cited researchers in artificial intelligence. As of mid-2026 his Google Scholar profile reports 255,563 total citations, an h-index of 204, and an i10-index of 610 [4]. His three most-cited works are "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks" (MAML, 2017) with roughly 19,800 citations, "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor" (2018) with roughly 16,200 citations, and "Trust Region Policy Optimization" (2015) with roughly 12,000 citations [4]. An h-index of 204 means that at least 204 of his papers have each been cited 204 or more times.
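Both indices quoted above can be computed from a list of per-paper citation counts. The Python sketch below follows the definitions given in this paragraph; the function names and the toy citation counts are hypothetical and are not taken from Levine's profile.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count < rank:
            break  # the rank-th paper has fewer than rank citations
        h = rank
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations (the i10-index used by Google Scholar)."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical per-paper citation counts, for illustration only.
toy_profile = [50, 12, 9, 9, 3, 1]
print(h_index(toy_profile))    # 4
print(i10_index(toy_profile))  # 2
```

In the toy profile, four papers have at least four citations each but there are not five papers with at least five, so the h-index is 4.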
After joining the Berkeley faculty, Levine continued for several years as a part-time visiting researcher and external collaborator with Google Brain (and later Google DeepMind). He contributed to large-scale robot learning efforts there, including the "arm farm," a fleet of robot arms Google used to collect self-supervised grasping data, and the Robotics Transformer (RT) line of work that produced RT-1 and RT-2 [14][15].
In March 2024, Levine co-founded Physical Intelligence (often abbreviated PI or written as the Greek letter pi) in San Francisco [19]. The founding team included Karol Hausman (CEO, formerly staff research scientist at Google DeepMind and adjunct faculty at Stanford), Chelsea Finn (Stanford associate professor and former Levine student), Brian Ichter (formerly Google Brain robotics), Adnan Esmail, Quan Vuong (formerly Google DeepMind), and Lachy Groom (former Stripe executive turned investor, who serves as president). Levine's role at the company is research-focused; he is generally described as a co-founder who helps lead model design.
The company states: "Our mission at Physical Intelligence is to develop foundation models that can control any robot to perform any task" [10]. Its first publicly released model, pi-0 ("pi-zero"), was announced in October 2024 [8][10].
Physical Intelligence raised an early seed round of approximately $70 million led by Thrive Capital in early 2024, then closed a $400 million Series A in November 2024 at a $2.4 billion post-money valuation [5][6]. The Series A was led by Jeff Bezos (through his personal investment vehicle) and Thrive Capital, with participation from Lux Capital, Bond, Redpoint Ventures, Sequoia Capital, and OpenAI [6]. CNBC and PitchBook reported the round as one of the largest early-stage AI rounds of 2024 outside of pure language model labs [5][6].
In November 2025, Physical Intelligence raised an additional $600 million at a $5.6 billion valuation in a round led by Alphabet's independent growth fund CapitalG, with participation from existing backers including Thrive Capital, Lux Capital, and Jeff Bezos, plus new investors Index Ventures and T. Rowe Price [7]. The company's total funding reached roughly $1.1 billion. Levine has continued to publish under the Berkeley affiliation while leading model design work at PI.
Levine's research spans algorithms, systems, and applications of machine learning to decision making and control. The paragraphs below cover his most influential lines of work.
The earliest body of work for which Levine is well known is guided policy search (GPS), introduced in his 2013 ICML paper with Vladlen Koltun and developed extensively in his Stanford thesis [3]. In its later, widely used formulations, GPS casts policy search as a constrained optimization that alternates between (a) using trajectory optimization or model-based RL to produce locally optimal trajectory distributions and (b) supervised learning to fit a global parametric policy to those trajectories, with the constraint pushing the local solutions and the global policy to agree. The method enabled training deep neural network policies with much lower sample complexity than naive policy gradient methods.
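The alternation between local trajectory optimization and supervised fitting can be illustrated with a deliberately simplified, penalty-based sketch on a scalar linear system `x[t+1] = a*x[t] + b*u[t]`. Here the trajectory optimizer is finite-horizon LQR, the supervised step is a least-squares fit of one stationary linear policy `u = k*x`, and a quadratic penalty with weight `lam` stands in for the agreement constraint. All names and constants are hypothetical; the published GPS variants use richer machinery (trajectory distributions, KL-divergence constraints, and dual or Lagrangian updates) and nonlinear neural network policies.

```python
def local_lqr_gains(a, b, q, r, horizon, k_global, lam):
    """Trajectory-optimization step (here: finite-horizon LQR for a scalar system).

    Minimizes sum_t [q*x_t^2 + r*u_t^2 + lam*(u_t - k_global*x_t)^2] + q*x_T^2
    subject to x_{t+1} = a*x_t + b*u_t, returning time-varying gains u_t = K_t * x_t.
    The lam term penalizes disagreement with the current global policy.
    """
    # Expand the penalty into quadratic state, action, and cross terms.
    q_eff, r_eff, n_eff = q + lam * k_global ** 2, r + lam, -lam * k_global
    p = q  # terminal cost-to-go coefficient: V_T(x) = p * x^2
    gains = [0.0] * horizon
    for t in reversed(range(horizon)):
        k_t = -(n_eff + p * a * b) / (r_eff + p * b * b)
        p = q_eff + r_eff * k_t ** 2 + 2 * n_eff * k_t + p * (a + b * k_t) ** 2
        gains[t] = k_t
    return gains


def rollout(a, b, x0, gains):
    """Run the local time-varying controller from x0 and record (state, action) pairs."""
    pairs, x = [], x0
    for k_t in gains:
        u = k_t * x
        pairs.append((x, u))
        x = a * x + b * u
    return pairs


def fit_global_policy(pairs):
    """Supervised step: least-squares fit of one stationary policy u = k * x."""
    sxx = sum(x * x for x, _ in pairs)
    sxu = sum(x * u for x, u in pairs)
    return sxu / sxx


def guided_policy_search_toy(a=1.0, b=1.0, q=1.0, r=0.1, horizon=20,
                             initial_states=(-2.0, 0.5, 3.0), iterations=10, lam=1.0):
    """Hypothetical toy loop alternating local optimization and global distillation."""
    k = 0.0  # the global policy starts out as "do nothing"
    for _ in range(iterations):
        gains = local_lqr_gains(a, b, q, r, horizon, k, lam)  # local trajectory optimization
        data = [pair for x0 in initial_states for pair in rollout(a, b, x0, gains)]
        k = fit_global_policy(data)  # distill the local solutions into the global policy
    return k


print(guided_policy_search_toy())
```

With `lam=0.0` and a single iteration, the fit simply recovers the unconstrained LQR gain; with a positive penalty, the local controllers start out biased toward the current global policy, and the two are pulled into agreement over successive iterations.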
The most cited extension of GPS is the 2016 paper "End-to-End Training of Deep Visuomotor Policies," which trained convolutional neural network policies on physical PR2 robots to perform manipulation tasks directly from camera pixels and joint state [3]. This work is widely credited as one of the first demonstrations that deep convolutional policies could be trained end-to-end for real-world robot control.
With his PhD student Tuomas Haarnoja and others, Levine introduced soft actor-critic (SAC) in 2018 [11]. SAC is an off-policy deep reinforcement learning algorithm based on the maximum entropy RL framework, in which, in the authors' words, "the actor aims to maximize expected reward while also maximizing entropy. That is, to succeed at the task while acting as randomly as possible" [11]. The method combined sample efficiency, stable convergence, and strong performance on continuous control benchmarks, and rapidly became the default off-policy continuous-action algorithm in academic deep RL. With roughly 16,200 citations [4], SAC remains, alongside proximal policy optimization, one of the two most widely used baseline RL algorithms for continuous control problems.
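The maximum-entropy objective quoted above enters SAC's updates in two places: an entropy bonus inside the critic's Bellman target and an entropy term in the actor's loss. The scalar Python sketch below assumes the formulation popularized by the follow-up "Soft Actor-Critic Algorithms and Applications" (two target critics and no separate state-value network) with a fixed temperature `alpha` rather than the automatic tuning that paper also proposes. Function names and numbers are hypothetical, and the neural networks and optimizers are omitted.

```python
import math
import random


def sample_tanh_gaussian(mean, log_std, rng):
    """Reparameterized action sample a = tanh(mean + std * eps) and its log-density."""
    std = math.exp(log_std)
    eps = rng.gauss(0.0, 1.0)
    pre_tanh = mean + std * eps
    action = math.tanh(pre_tanh)
    # Log-density of the Gaussian sample before squashing.
    log_prob = -0.5 * eps ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)
    # Change-of-variables correction for tanh squashing (small constant for stability).
    log_prob -= math.log(1.0 - action ** 2 + 1e-6)
    return action, log_prob


def soft_q_target(reward, done, next_q1, next_q2, next_log_prob, gamma=0.99, alpha=0.2):
    """Soft Bellman target: reward plus the discounted soft value of the next state.

    Subtracting alpha * log pi(a'|s') rewards high-entropy behavior; taking the
    minimum of two target critics reduces overestimation.
    """
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_value


def actor_loss(q1, q2, log_prob, alpha=0.2):
    """Policy objective to minimize: alpha * log pi(a|s) - min(Q1, Q2)(s, a)."""
    return alpha * log_prob - min(q1, q2)


rng = random.Random(0)
a_next, logp_next = sample_tanh_gaussian(mean=0.1, log_std=-1.0, rng=rng)
print(a_next, soft_q_target(reward=1.0, done=0.0, next_q1=2.0, next_q2=2.5,
                            next_log_prob=logp_next))
```

A low log-probability (a more random action) raises both the critic target and the actor's objective, which is how the "act as randomly as possible" part of the objective is traded off against reward through `alpha`.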
With Aviral Kumar, Aurick Zhou, and George Tucker, Levine introduced Conservative Q-Learning (CQL) in 2020 [12]. CQL addresses the central pathology of offline reinforcement learning (also called batch RL): when an agent learns from a fixed dataset, the Bellman backup evaluates Q-values of actions the dataset does not cover, and because no data corrects those estimates, the policy improvement step tends to exploit overestimated values, producing poor policies at deployment. CQL adds a regularizer to the standard Q-function update that pushes Q-values down on out-of-distribution actions and up on actions in the dataset, producing a conservative lower bound on the true policy value. The method became one of the most widely used baselines and reference algorithms in the offline RL literature, alongside the closely related implicit Q-learning (IQL) and behavior-regularized methods that the Levine group also helped develop.
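For a discrete action space, the regularizer takes a simple form in the CQL paper's CQL(H) variant: `alpha` times the log-sum-exp of Q over all actions minus Q at the dataset action, added to the usual squared Bellman error. The sketch below evaluates that loss for a single transition; the function names, the single-state setup, and the precomputed `bellman_target` are simplifying assumptions. In practice the loss is averaged over a dataset and minimized with gradient descent on a Q-network.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_loss(q_values, data_action, bellman_target, alpha=1.0):
    """CQL(H)-style loss for one transition with a discrete action space.

    q_values: Q(s, a) for every action a in state s.
    data_action: index of the action actually taken in the offline dataset.
    """
    q_data = q_values[data_action]
    td_loss = 0.5 * (q_data - bellman_target) ** 2
    # Push down a soft maximum over all actions; push up the dataset action.
    conservative_penalty = logsumexp(q_values) - q_data
    return td_loss + alpha * conservative_penalty


# An inflated estimate for an action the dataset never took raises the loss.
print(cql_loss([1.0, 1.0, 1.0], data_action=0, bellman_target=1.0))
print(cql_loss([1.0, 1.0, 5.0], data_action=0, bellman_target=1.0))
```

Because log-sum-exp is a soft maximum, the penalty is never negative and grows whenever another action's Q-value increases, which is what drives down the values of out-of-distribution actions.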
Levine, with Aviral Kumar, George Tucker, and Justin Fu, also co-authored the widely read survey "Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems" (2020), which became a standard entry point to the subfield. Levine and his group became some of the most prominent advocates for offline RL as a practical paradigm for robot learning, arguing that pretraining on previously collected experience and then fine-tuning online is far more sample-efficient than learning everything from scratch in the real world.
With Chelsea Finn (then his PhD student) and Pieter Abbeel, Levine co-authored the 2017 paper "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks" (MAML) [13], one of the most cited papers in modern meta-learning with roughly 19,800 citations and his single most-cited work [4]. MAML proposed an optimization-based meta-learning algorithm that learns an initialization for a neural network such that a few gradient steps on a new task produce strong performance, with no architectural assumptions about the model. MAML has been widely adopted in few-shot classification, reinforcement learning, and robot learning.
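The optimization-based idea behind MAML, learning an initialization from which a few gradient steps adapt well, can be seen on a one-parameter toy problem. The sketch below assumes a hypothetical task family of 1-D linear regressions `y = slope * x` and a single inner gradient step. Because the loss is quadratic, the second-order term that MAML differentiates through (the inner step's Hessian) is available in closed form here, whereas real implementations obtain it with automatic differentiation.

```python
def loss_grad_hess(w, xs, ys):
    """Mean squared error of the model y_hat = w * x, with its first and second derivatives."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / n
    hess = sum(2.0 * x * x for x in xs) / n
    return loss, grad, hess


def maml_step(w, tasks, inner_lr, outer_lr):
    """One meta-update. Each task is (support_xs, support_ys, query_xs, query_ys)."""
    meta_grad = 0.0
    for sx, sy, qx, qy in tasks:
        _, g_support, h_support = loss_grad_hess(w, sx, sy)
        w_adapted = w - inner_lr * g_support  # inner loop: one gradient step on the support set
        _, g_query, _ = loss_grad_hess(w_adapted, qx, qy)
        # Chain rule through the inner step: d(w_adapted)/dw = 1 - inner_lr * hessian.
        meta_grad += g_query * (1.0 - inner_lr * h_support)
    return w - outer_lr * meta_grad / len(tasks)


def make_task(slope, xs=(1.0, 2.0)):
    """Hypothetical task y = slope * x, using the same points for support and query."""
    ys = [slope * x for x in xs]
    return list(xs), ys, list(xs), ys


tasks = [make_task(1.0), make_task(3.0)]
w = 0.0
for _ in range(200):
    w = maml_step(w, tasks, inner_lr=0.1, outer_lr=0.1)
print(w)  # close to 2.0
```

With tasks of slope 1 and 3, the meta-learned initialization settles at 2.0, the point from which one inner gradient step moves equally well toward either task.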
Levine's group has also been a major contributor to modern imitation learning, including goal-conditioned behavior cloning (GCBC), inverse reinforcement learning, and large-scale demonstration-based robot pretraining. Many of these techniques later fed into the design of vision-language-action models such as RT-1, RT-2, and pi-0.
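Goal-conditioned behavior cloning is commonly trained with hindsight relabeling: any state reached later in a demonstration can be treated as the goal that the earlier actions were pursuing. The sketch below shows only that data-preparation step; the function name, tuple layout, and toy demonstration are hypothetical, and the text above does not specify which relabeling scheme Levine's group used.

```python
import random


def relabel_with_future_goals(trajectory, goals_per_step=1, seed=0):
    """Turn one demonstration into goal-conditioned training examples.

    trajectory: list of (state, action) pairs in time order.
    Returns (state, goal, action) triples whose goal is a state visited later
    in the same trajectory, so each action is a valid label for reaching that goal.
    """
    rng = random.Random(seed)
    examples = []
    for t in range(len(trajectory) - 1):
        state, action = trajectory[t]
        for _ in range(goals_per_step):
            future_index = rng.randrange(t + 1, len(trajectory))
            goal = trajectory[future_index][0]
            examples.append((state, goal, action))
    return examples


# Hypothetical 1-D demonstration: the agent moves right one unit per step.
demo = [(0, "right"), (1, "right"), (2, "right"), (3, "stop")]
for state, goal, action in relabel_with_future_goals(demo, goals_per_step=2):
    print(state, goal, action)
```

A goal-conditioned policy over (state, goal) pairs is then fit to these triples with ordinary supervised learning, exactly as in plain behavior cloning.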
In 2018 Levine and collaborators at Google introduced QT-Opt, a scalable, distributed Q-learning system for vision-based robotic grasping. QT-Opt was trained on more than 580,000 real grasp attempts collected across a fleet of robotic arms and produced closed-loop grasping policies that reached a 96 percent success rate on previously unseen objects. The result is frequently cited as evidence that combining large-scale real-robot data collection with off-policy reinforcement learning can scale to real-world, vision-based manipulation.
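QT-Opt handles its continuous action space without a separate actor network: the maximization over actions in the Q-learning target is approximated with the cross-entropy method (CEM). The sketch below shows that idea on a hypothetical two-dimensional toy Q-function; the function names, population size, elite count, and number of iterations are illustrative assumptions rather than QT-Opt's published settings.

```python
import random
import statistics


def cem_maximize(q_fn, dim, iterations=3, population=64, num_elites=6, seed=0):
    """Approximate argmax_a q_fn(a) over a continuous action vector with the cross-entropy method.

    Repeatedly samples actions from a diagonal Gaussian, keeps the highest-scoring
    elites, and refits the Gaussian's mean and standard deviation to them.
    """
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [1.0] * dim
    for _ in range(iterations):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(population)]
        elites = sorted(samples, key=q_fn, reverse=True)[:num_elites]
        mean = [statistics.fmean(a[i] for a in elites) for i in range(dim)]
        std = [statistics.pstdev([a[i] for a in elites]) + 1e-6 for i in range(dim)]
    return mean


def q_learning_target(reward, done, next_q_fn, action_dim, gamma=0.9):
    """Bellman target r + gamma * max_a' Q(s', a'), with the max found by CEM."""
    best_next_action = cem_maximize(next_q_fn, action_dim)
    return reward + gamma * (1.0 - done) * next_q_fn(best_next_action)


def toy_q(a):
    """Hypothetical Q-function of a 2-D action, peaked at (0.3, -0.2)."""
    return -((a[0] - 0.3) ** 2 + (a[1] + 0.2) ** 2)


print(cem_maximize(toy_q, dim=2, iterations=5))
print(q_learning_target(reward=0.0, done=0.0, next_q_fn=toy_q, action_dim=2))
```

In QT-Opt the same stochastic maximization is also used to choose actions when the policy runs on the robot.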
With collaborators at Google (later Google DeepMind), Levine co-authored RT-1 ("Robotics Transformer for Real-World Control at Scale") in 2022 [14] and RT-2 ("Vision-Language-Action Models Transfer Web Knowledge to Robotic Control") in 2023 [15]. RT-1 demonstrated that a transformer trained on a large multi-task robot dataset could control real robotic arms across hundreds of tasks. RT-2 went further, fine-tuning a pretrained vision-language model so that it could output discretized action tokens, allowing it to inherit semantic generalization from internet-scale data.
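Emitting "discretized action tokens" amounts to uniform binning of each continuous action dimension. The sketch follows RT-1 and RT-2 in using 256 bins per dimension; the action ranges, the three-dimensional action, and the representation of bins as space-separated integer strings are assumptions for illustration, since the exact token vocabulary depends on the underlying vision-language model. Function names are hypothetical.

```python
def action_to_tokens(action, low, high, num_bins=256):
    """Map each continuous action dimension to an integer bin, rendered as text tokens.

    action, low, high: equal-length lists giving each value and its allowed range.
    """
    tokens = []
    for value, lo, hi in zip(action, low, high):
        clipped = min(max(value, lo), hi)
        index = int((clipped - lo) / (hi - lo) * num_bins)
        tokens.append(str(min(index, num_bins - 1)))  # the upper edge falls in the last bin
    return " ".join(tokens)


def tokens_to_action(text, low, high, num_bins=256):
    """Invert the mapping by returning the center of each bin."""
    values = []
    for token, lo, hi in zip(text.split(), low, high):
        width = (hi - lo) / num_bins
        values.append(lo + (int(token) + 0.5) * width)
    return values


# Hypothetical 3-D end-effector displacement bounded to [-1, 1] per dimension.
low, high = [-1.0] * 3, [1.0] * 3
text = action_to_tokens([-1.0, 0.0, 0.37], low, high)
print(text)  # 0 128 175
print(tokens_to_action(text, low, high))
```

Decoding returns the bin center, so the reconstruction error is at most half a bin width, about 0.004 for the [-1, 1] range used here.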
At Physical Intelligence, this line of work culminated in the pi-0 model released in October 2024 and described in the paper "pi-0: A Vision-Language-Action Flow Model for General Robot Control" (arXiv:2410.24164) [8]. The paper describes "a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge" [8]. pi-0 was trained on data from many different robot platforms, including single-arm robots, dual-arm robots, and mobile manipulators, and was demonstrated on long-horizon manipulation tasks the paper lists as "laundry folding, table cleaning, and assembling boxes" [8]. The follow-up pi-0.5 model, released in 2025 and described in arXiv:2504.16054 ("pi-0.5: a Vision-Language-Action Model with Open-World Generalization") [9], extended pi-0 with co-training on heterogeneous task data, including web data and high-level semantic prediction, in order to obtain better open-world generalization.
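The flow matching idea behind pi-0's action generation can be sketched in a few lines: during training a network regresses the velocity that carries noise toward a clean action, and at inference an action is generated by integrating that velocity from noise. The sketch assumes a generic straight-line interpolation path with time drawn uniformly; the paper's own conventions (time direction, time-sampling distribution, conditioning on the vision-language model's features) may differ. To keep it runnable without a neural network, a hypothetical `oracle_velocity` function, the exact velocity field when every training action equals one fixed target, stands in for pi-0's learned action expert.

```python
import random


def flow_matching_example(action, rng):
    """Build one training example (noisy input, time, regression target).

    Assumed convention: x_tau = tau * action + (1 - tau) * noise, so tau = 0 is pure
    noise, tau = 1 is the clean action, and the target velocity is action - noise.
    """
    tau = rng.random()
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    noisy = [tau * a + (1.0 - tau) * e for a, e in zip(action, noise)]
    target_velocity = [a - e for a, e in zip(action, noise)]
    return noisy, tau, target_velocity


def sample_action(velocity_fn, dim, steps=10, seed=0):
    """Generate an action by Euler-integrating the velocity field from tau = 0 to tau = 1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for a trained network: if every training action equals
# `target`, the exact velocity field is v(x, tau) = (target - x) / (1 - tau).
target = [0.2, -0.5, 0.9]


def oracle_velocity(x, tau):
    return [(t - xi) / (1.0 - tau) for t, xi in zip(target, x)]


print(sample_action(oracle_velocity, dim=3))  # approximately [0.2, -0.5, 0.9]
```

With this exact field the ten Euler steps land on the target action; a learned network would only approximate it, and in pi-0 it is additionally conditioned on the robot's observations and instructions.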
Levine has been a vocal advocate of the broader thesis that robotics needs its own foundation models, trained on broad and diverse data and then specialized to individual platforms. His group at Berkeley participated in the Open X-Embodiment effort, a multi-institution dataset and model release that aggregated robot demonstration data from more than twenty academic and industry institutions to train general cross-embodiment policies. The work showed that policies trained on the combined cross-platform data could outperform policies trained only on data from a single platform, providing empirical support for the foundation-model paradigm in robot learning.
| Year | Title | Venue | Approx. citations |
|---|---|---|---|
| 2013 | Guided Policy Search | ICML | 1,400+ |
| 2015 | Trust Region Policy Optimization (co-author with J. Schulman et al.) | ICML | 12,000+ |
| 2016 | End-to-End Training of Deep Visuomotor Policies | JMLR | 4,500+ |
| 2017 | Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (with C. Finn, P. Abbeel) | ICML | 19,800+ |
| 2018 | Soft Actor-Critic: Off-Policy Maximum Entropy Deep RL with a Stochastic Actor (with T. Haarnoja et al.) | ICML | 16,200+ |
| 2018 | Soft Actor-Critic Algorithms and Applications | arXiv | 4,500+ |
| 2018 | QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation | CoRL | 1,300+ |
| 2020 | Conservative Q-Learning for Offline Reinforcement Learning (with A. Kumar et al.) | NeurIPS | 2,800+ |
| 2020 | Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems | arXiv | 2,000+ |
| 2022 | RT-1: Robotics Transformer for Real-World Control at Scale | RSS | 1,000+ |
| 2023 | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | CoRL | 800+ |
| 2023 | Open X-Embodiment: Robotic Learning Datasets and RT-X Models (multi-institution) | ICRA | 600+ |
| 2024 | pi-0: A Vision-Language-Action Flow Model for General Robot Control | Physical Intelligence preprint (arXiv:2410.24164) | 250+ |
| 2025 | pi-0.5: a Vision-Language-Action Model with Open-World Generalization | arXiv:2504.16054 | growing |
Citation counts are approximate and based on Google Scholar and Semantic Scholar as of mid-2026 [4]; values change over time. The MAML, SAC, and TRPO figures are taken directly from the per-paper counts on Levine's Google Scholar profile [4].
Levine teaches CS285 (formerly CS294-112), "Deep Reinforcement Learning," at Berkeley [18]. The course is one of the most widely watched graduate machine learning courses outside the classroom: each year's lecture videos are uploaded to YouTube, and the course materials are freely available on the official course website. The syllabus covers:
| Module | Topics |
|---|---|
| Foundations | Markov decision processes, value functions, policy evaluation |
| Policy gradients | REINFORCE, actor-critic, baselines, natural and trust region methods |
| Q-learning | DQN, double DQN, distributional RL |
| Actor-critic | DDPG, TD3, soft actor-critic |
| Model-based RL | Dynamics modeling, MPC, latent-state world models |
| Exploration | Intrinsic motivation, count-based exploration |
| Inverse RL and imitation | Behavioral cloning, GAIL, MaxEnt IRL |
| Offline RL | BCQ, CQL, IQL, behavior-regularized methods |
| Meta-RL and multi-task RL | MAML, PEARL |
| Foundation models for control | Vision-language-action models, scaling laws |
In addition to CS285, Levine has co-presented tutorials at NeurIPS, ICML, and CoRL on topics including offline RL, learned models for control, and foundation models for robotics.
The RAIL Lab at Berkeley has produced a large number of researchers who hold faculty positions or senior industry research roles. The table below covers a representative sample (jointly advised students are listed here when Levine was the primary or co-primary advisor).
| Person | Role at graduation | Current position |
|---|---|---|
| Chelsea Finn | PhD, UC Berkeley (jointly with Pieter Abbeel), 2018 | Associate professor, Stanford; co-founder, Physical Intelligence |
| Tuomas Haarnoja | PhD, UC Berkeley, 2018 | Senior research scientist, Google DeepMind |
| Aviral Kumar | PhD, UC Berkeley, 2023 | Assistant professor, Carnegie Mellon University; research scientist at Google DeepMind |
| Karol Hausman | Postdoc / collaborator (PhD from USC, 2018) | CEO and co-founder, Physical Intelligence |
| Abhishek Gupta | PhD, UC Berkeley, 2021 | Assistant professor, University of Washington |
| Karl Pertsch | Postdoc, UC Berkeley | Research scientist, Physical Intelligence |
| Justin Fu | PhD, UC Berkeley | Research scientist, Google DeepMind |
| Coline Devin | PhD, UC Berkeley | Research scientist, Google DeepMind |
| Marvin Zhang | PhD, UC Berkeley | Industry research |
| Anusha Nagabandi | PhD, UC Berkeley | Industry research (Covariant) |
| Vitchyr Pong | PhD, UC Berkeley | Industry research |
| Kelvin Xu | PhD, UC Berkeley | Industry research |
The RAIL group also collaborates extensively with Pieter Abbeel's BAIR group on shared infrastructure and joint students, and historically with Trevor Darrell, Ken Goldberg, and other Berkeley robotics and computer vision faculty.
Levine's honors span early-career and mid-career recognition in machine learning and robotics, culminating in the 2024 Presidential Early Career Award for Scientists and Engineers (PECASE), which the U.S. government describes as the highest honor it bestows on scientists and engineers in the early stages of their careers [16].
| Year | Award |
|---|---|
| 2016 | NSF CAREER Award |
| 2018 | Okawa Foundation Research Grant |
| 2019 | Sloan Research Fellowship |
| 2024 | Presidential Early Career Award for Scientists and Engineers (PECASE) |
Levine has also been recognized through best paper and best paper finalist nominations at venues including ICRA, ICLR, NeurIPS, and CoRL, as well as on community lists of top-cited or most-influential authors in artificial intelligence.
The following bibliography lists representative works that span the major lines of research in Levine's career. They are referenced inline in the article above by year and short title.