Sergey Levine
Last reviewed
May 4, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 · 3,633 words
Sergey Levine is an American computer scientist, associate professor of electrical engineering and computer sciences at the University of California, Berkeley, and a co-founder of Physical Intelligence, a San Francisco startup developing general-purpose foundation models for robotics. He is one of the most cited researchers in deep reinforcement learning and is widely associated with the modern push to apply large neural networks to real-world robot control.
Levine joined Berkeley's Department of Electrical Engineering and Computer Sciences in fall 2016 after a PhD at Stanford University and a postdoc at Berkeley. He directs the Robotic AI and Learning (RAIL) Lab and is a member of Berkeley AI Research (BAIR). His group has produced influential algorithms in policy search, off-policy reinforcement learning, offline reinforcement learning, meta-learning, and imitation learning, and his graduate course CS285 (Deep Reinforcement Learning) is a standard reference for the field through its publicly available video lectures.
In March 2024 Levine helped found Physical Intelligence with Karol Hausman, Chelsea Finn, Brian Ichter, Lachy Groom, Adnan Esmail, and Quan Vuong. The company released the π0 model in October 2024, an open-weights vision-language-action model for general robot control, and raised $400 million in November 2024 at a $2.4 billion post-money valuation in a round led by Jeff Bezos and Thrive Capital. By late 2025 the company was reported to be valued at $5.6 billion.
| Field | Details |
|---|---|
| Full name | Sergey Vladimir Levine |
| Citizenship | United States |
| Education | Stanford University (BS, MS, 2009; PhD, 2014) |
| Doctoral advisor | Vladlen Koltun |
| Doctoral thesis | Motor Skill Learning with Local Trajectory Methods (2014) |
| Postdoctoral advisor | Pieter Abbeel (UC Berkeley) |
| Known for | Guided policy search; soft actor-critic; conservative Q-learning; π0 / π0.5; CS285 Deep RL course |
| Fields | Machine learning, reinforcement learning, robotics, computer vision |
| Institutions | UC Berkeley (2016 to present); Physical Intelligence (2024 to present); Google Brain (research scientist, 2015 to 2016) |
| Lab | Robotic AI and Learning (RAIL) Lab; Berkeley AI Research (BAIR) |
| Notable students and postdocs | Chelsea Finn, Aviral Kumar, Tuomas Haarnoja, Karol Hausman, Abhishek Gupta, Karl Pertsch |
| Notable awards | Sloan Research Fellowship (2019); Presidential Early Career Award for Scientists and Engineers (2024) |
| Website | people.eecs.berkeley.edu/~svlevine |
Levine was raised in the United States and developed an early interest in computer graphics and animation, which led him to combine work on character animation with machine learning during his undergraduate years at Stanford University. He received both a Bachelor of Science and a Master of Science in computer science from Stanford in 2009.
He stayed at Stanford for doctoral work in the same department, joining the research group of Vladlen Koltun, then a Stanford computer science faculty member active in computer graphics. His dissertation, "Motor Skill Learning with Local Trajectory Methods," was completed in 2014. The thesis developed the family of guided policy search algorithms, which split policy learning into two alternating steps: trajectory optimization, which handles exploration and provides good local solutions, and supervised learning, which distills those trajectories into a parametric policy. By using model knowledge or trajectory optimizers as teachers, the framework allowed deep neural network policies to be trained with relatively low sample complexity, an approach that proved influential in subsequent robot learning research.
Following his PhD, Levine moved to Berkeley as a postdoctoral researcher in the lab of Pieter Abbeel, then a leading figure in apprenticeship learning and deep reinforcement learning for robotics. During the postdoc he extended guided policy search to vision-based control of real robotic arms, producing the well-known 2016 paper "End-to-End Training of Deep Visuomotor Policies," which showed that convolutional neural network policies could be trained directly from camera pixels and joint encoders to perform contact-rich manipulation tasks such as screwing on a bottle cap or hanging a clothes hanger.
For approximately one year before joining the Berkeley faculty, Levine was a research scientist on the Google Brain team in Mountain View, where he continued work on large-scale robot learning and contributed to the early Brain robotics research that eventually fed into Google's broader robotics learning agenda.
Levine joined the Department of Electrical Engineering and Computer Sciences at UC Berkeley as an assistant professor in fall 2016 and was later promoted to associate professor with tenure. He is a member of Berkeley AI Research (BAIR) and the director of the Robotic AI and Learning (RAIL) Lab, which has been one of the most prolific groups in deep reinforcement learning and robot learning of the past decade.
The RAIL group's research agenda spans algorithms for online and offline reinforcement learning, learning from demonstrations, large-scale robot data collection, foundation models for control, and the use of large pretrained models from vision and language as priors for embodied agents. Many of the methods that became standard tools in the deep RL community, including soft actor-critic, conservative Q-learning, model-agnostic meta-learning, and the modern formulation of behavior cloning baselines for offline RL, have roots in the RAIL Lab and its collaborators.
Levine is widely known in the field for the publicly available course materials of CS285 (formerly CS294-112), Berkeley's graduate course on deep reinforcement learning. Lecture videos for the course are released each year on YouTube, and the slides and homework assignments are posted on the course website at rail.eecs.berkeley.edu/deeprlcourse. The course covers policy gradients, value-based methods, actor-critic algorithms, model-based RL, exploration, inverse RL, meta-learning, and offline reinforcement learning, and is widely used by graduate students and self-learners outside Berkeley as a primary introduction to the field.
His administrative and service roles include program committee work and area chair positions for NeurIPS, ICML, ICLR, and the Conference on Robot Learning (CoRL), which he helped shape during its early years as one of the venues most closely associated with the deep robot learning community.
Before joining Berkeley, Levine spent approximately a year as a full-time research scientist on the Google Brain team. After moving to Berkeley, he continued for several years as a part-time visiting researcher and external collaborator with Google Brain (and later Google DeepMind). In that role he contributed to large-scale robot learning efforts, including the "arm farm" (a bank of Google robot arms used for self-supervised collection of grasping data) and the Robotics Transformer (RT) line of work that led to RT-1 and RT-2.
In March 2024, Levine co-founded Physical Intelligence (often abbreviated PI or written as π) in San Francisco. The founding team included Karol Hausman (CEO, formerly staff research scientist at Google DeepMind and adjunct faculty at Stanford), Chelsea Finn (Stanford assistant professor and former Levine student), Brian Ichter (formerly Google Brain robotics), Adnan Esmail, Quan Vuong (formerly Google DeepMind), and Lachy Groom (former Stripe executive turned investor, who serves as president).
The company's mission is to build a single foundation model that can drive any robot to perform any physical task in the world. Its first publicly released model, π0 ("pi-zero"), was announced in October 2024. PI raised an early seed round of approximately $70 million at a roughly $400 million valuation in early 2024, then closed a $400 million Series A in November 2024 at a $2.4 billion post-money valuation. The Series A was led by Jeff Bezos (through his personal investment vehicle) and Thrive Capital, with participation from Lux Capital, Bond, Redpoint Ventures, Sequoia Capital, and OpenAI. Reuters and CNBC reported the round as one of the largest early-stage AI rounds of 2024 outside of pure language model labs.
In November 2025, Bloomberg reported that Physical Intelligence had raised additional capital at a $5.6 billion valuation, and a follow-on raise later in 2025 reportedly pushed the company's valuation higher still. Levine's role at the company is research-focused; he has continued to publish under the Berkeley affiliation while leading model design work at PI.
Levine's research spans algorithms, systems, and applications of machine learning to decision making and control. The list below covers his most influential lines of work.
The earliest body of work for which Levine is well known is guided policy search (GPS), introduced in his 2013 ICML paper with Vladlen Koltun and developed extensively in his Stanford thesis. GPS reframes policy search as a constrained optimization that alternates between (a) using trajectory optimization or model-based RL to produce locally optimal trajectory distributions and (b) supervised learning to fit a global parametric policy to those trajectories. The method enabled training deep neural network policies with much lower sample complexity than naïve policy gradient methods.
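The alternation described above can be made concrete with a toy sketch. It assumes a one-step, one-dimensional task (move a point from `x` to `goal`), a linear policy, and a quadratic agreement penalty with weight `rho`. All names are hypothetical, and the published algorithms work with trajectory distributions and constraint handling that are omitted here.

```python
# Toy illustration of the guided-policy-search alternation (not the published algorithm).
# Assumed task: a 1-D point at x must reach `goal` in one step, x_next = x + u.

def fit_linear_policy(xs, us):
    """Supervised step: least-squares fit of the policy u = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_u = sum(us) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xu = sum((x - mean_x) * (u - mean_u) for x, u in zip(xs, us))
    w = cov_xu / var_x
    b = mean_u - w * mean_x
    return w, b


def local_optimum(x, goal, w, b, rho):
    """Local step: minimize (x + u - goal)^2 + rho * (u - (w*x + b))^2 over u."""
    return ((goal - x) + rho * (w * x + b)) / (1.0 + rho)


def guided_policy_search(starts, goal, rho=1.0, iterations=40):
    w, b = 0.0, 0.0  # the initial policy always outputs a zero action
    for _ in range(iterations):
        # (a) optimize each start state locally, penalizing disagreement with the policy
        us = [local_optimum(x, goal, w, b, rho) for x in starts]
        # (b) fit the global policy to the locally optimal actions
        w, b = fit_linear_policy(starts, us)
    return w, b


w, b = guided_policy_search(starts=[-2.0, -1.0, 0.5, 3.0], goal=1.0)
print(round(w, 4), round(b, 4))
```

In this toy setting the fitted policy approaches `u = goal - x`, the action that reaches the goal from any start. The penalty term is what keeps the local solutions close enough to the current policy for the supervised step to reproduce them.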
The most cited extension of GPS is the 2016 paper "End-to-End Training of Deep Visuomotor Policies," which trained convolutional neural network policies on physical PR2 robots to perform manipulation tasks directly from camera pixels and joint state. This work is widely credited as one of the first demonstrations that deep convolutional policies could be trained end-to-end for real-world robot control.
With his PhD student Tuomas Haarnoja and others, Levine introduced soft actor-critic (SAC) in 2018. SAC is an off-policy deep reinforcement learning algorithm based on the maximum entropy RL framework, in which the policy is trained to maximize a weighted sum of expected reward and policy entropy. The method combined sample efficiency, stable convergence, and strong performance on continuous control benchmarks, and rapidly became the default off-policy continuous-action algorithm in academic deep RL. SAC remains, alongside proximal policy optimization, one of the two most widely used baseline RL algorithms for continuous control problems.
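The maximum entropy objective can be illustrated on a single-state problem with a handful of discrete actions. This is a deliberate simplification: SAC itself targets continuous actions and learns the policy with neural networks. The rewards, the temperature `alpha`, and the function names below are illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_objective(probs, rewards, alpha):
    """Expected reward plus alpha times policy entropy."""
    expected_reward = sum(p * r for p, r in zip(probs, rewards))
    return expected_reward + alpha * entropy(probs)


def softmax_policy(rewards, alpha):
    """Policy proportional to exp(reward / alpha), which maximizes soft_objective."""
    m = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - m) / alpha) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


rewards = [1.0, 0.9, 0.0]
alpha = 0.5
greedy = [1.0, 0.0, 0.0]              # always pick the highest-reward action
soft = softmax_policy(rewards, alpha)  # spread probability over good actions

print(soft_objective(greedy, rewards, alpha))
print(soft_objective(soft, rewards, alpha))
```

With `alpha > 0` the softmax policy scores higher than the greedy one because it keeps probability on the near-optimal second action. As `alpha` shrinks toward zero the two policies coincide.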
With Aviral Kumar, Aurick Zhou, and George Tucker, Levine introduced Conservative Q-Learning (CQL) in 2020. CQL addresses the central pathology of offline reinforcement learning (also called batch RL): when an RL agent learns from a fixed dataset, the standard Bellman update systematically overestimates the values of actions that the dataset does not cover, leading to disastrous policies at deployment. CQL adds a regularizer to the standard Q-function update that pushes Q-values down on out-of-distribution actions, producing a conservative lower bound on the true policy value. The method became one of the most widely used baselines and reference algorithms in the offline RL literature, alongside the closely related implicit Q-learning (IQL) and behavior-regularized methods that the Levine group also helped develop.
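A minimal sketch of the conservative regularizer, assuming a tabular Q-function with discrete actions and the log-sum-exp form of the penalty (one of the variants in the CQL paper). In the full method this term is scaled by a coefficient and added to the usual Bellman error. Here it is applied on its own to show its effect, and the values and step size are made up for illustration.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_row, data_action):
    """Log-sum-exp over all actions minus the Q-value of the action seen in the dataset."""
    return logsumexp(q_row) - q_row[data_action]


def cql_penalty_grad(q_row, data_action):
    """Gradient of the penalty: softmax over actions, minus 1 at the dataset action."""
    z = logsumexp(q_row)
    grad = [math.exp(q - z) for q in q_row]
    grad[data_action] -= 1.0
    return grad


# One state, three actions. Action 2 is overestimated and never appears in the data.
q_row = [1.0, 1.0, 5.0]
data_action = 0
step_size = 0.5

for _ in range(20):
    grad = cql_penalty_grad(q_row, data_action)
    q_row = [q - step_size * g for q, g in zip(q_row, grad)]

print([round(q, 3) for q in q_row])
```

Gradient descent on the penalty lowers the value of the unseen, overestimated action and raises the value of the action that the dataset supports, which is the behavior the regularizer is designed to produce.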
Levine and his students subsequently became some of the most prominent advocates of offline RL as a practical paradigm for robot learning, arguing that pretraining on previously collected experience and then fine-tuning online is far more sample-efficient than learning everything from scratch in the real world.
With Chelsea Finn (then his PhD student) and Pieter Abbeel, Levine co-authored the 2017 paper "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks" (MAML), one of the most cited papers in modern meta-learning. MAML proposed an optimization-based meta-learning algorithm that learns an initialization for a neural network such that a few gradient steps on a new task produce strong performance, with no architectural assumptions about the model. MAML has been widely adopted in few-shot classification, reinforcement learning, and robot learning.
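The idea of learning an initialization can be sketched with a one-parameter model. The setup is an assumption chosen so the gradients can be written by hand: each task has loss `0.5 * (theta - target)^2`, adaptation is a single gradient step, and the same loss is used before and after adaptation. Real uses of MAML rely on automatic differentiation through the inner step and separate support and query data.

```python
import random


def loss(theta, target):
    return 0.5 * (theta - target) ** 2


def loss_grad(theta, target):
    """Gradient of the task loss with respect to theta."""
    return theta - target


def adapt(theta, target, inner_lr):
    """Inner loop: one gradient step on the task."""
    return theta - inner_lr * loss_grad(theta, target)


def maml_meta_gradient(theta, target, inner_lr):
    """Gradient of the post-adaptation loss with respect to the initialization."""
    adapted = adapt(theta, target, inner_lr)
    # Chain rule through the inner step: d(adapted)/d(theta) = 1 - inner_lr.
    return (1.0 - inner_lr) * loss_grad(adapted, target)


def train_maml(task_targets, inner_lr=0.1, outer_lr=0.5, steps=200, seed=0):
    rng = random.Random(seed)
    theta = rng.uniform(-5.0, 5.0)
    for _ in range(steps):
        grads = [maml_meta_gradient(theta, t, inner_lr) for t in task_targets]
        theta -= outer_lr * sum(grads) / len(grads)
    return theta


task_targets = [-1.0, 2.0, 5.0]
theta = train_maml(task_targets)
print(round(theta, 4))
```

For these symmetric quadratic tasks the learned initialization settles at the mean of the task targets, the point from which one gradient step helps every task equally. The factor `1 - inner_lr` is the second-order term that distinguishes MAML from simply averaging task gradients at the adapted parameters.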
Levine's group has also been a major contributor to modern imitation learning, including goal-conditioned behavior cloning (GCBC), inverse reinforcement learning, and large-scale demonstration-based robot pretraining. Many of these techniques later fed into the design of vision-language-action models such as RT-1, RT-2, and π0.
With collaborators at Google DeepMind, Levine co-authored RT-1 ("Robotics Transformer for Real-World Control at Scale") in 2022 and RT-2 ("Vision-Language-Action Models Transfer Web Knowledge to Robotic Control") in 2023. RT-1 demonstrated that a transformer trained on a large multi-task robot dataset could control real robotic arms across hundreds of tasks. RT-2 went further, fine-tuning a pretrained vision-language model so that it could output discretized action tokens, allowing it to inherit semantic generalization from internet-scale data.
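Discretizing actions into tokens can be sketched as uniform binning of each action dimension. The bin count of 256, the per-dimension bounds, and the helper names are illustrative assumptions rather than a reproduction of the RT-2 tokenizer.

```python
def action_to_tokens(action, low, high, num_bins=256):
    """Map each continuous action dimension to an integer bin index."""
    tokens = []
    for value, lo, hi in zip(action, low, high):
        clipped = min(max(value, lo), hi)
        fraction = (clipped - lo) / (hi - lo)
        # The upper bound maps to num_bins, so clamp it into the last bin.
        tokens.append(min(int(fraction * num_bins), num_bins - 1))
    return tokens


def tokens_to_action(tokens, low, high, num_bins=256):
    """Map each token back to the centre of its bin."""
    return [lo + (t + 0.5) * (hi - lo) / num_bins
            for t, lo, hi in zip(tokens, low, high)]


low = [-1.0, -1.0, 0.0]    # assumed per-dimension lower bounds
high = [1.0, 1.0, 1.0]     # assumed per-dimension upper bounds
action = [0.25, -0.8, 1.0]

tokens = action_to_tokens(action, low, high)
recovered = tokens_to_action(tokens, low, high)
print(tokens)
print(recovered)
```

The round trip loses at most half a bin width per dimension. Once actions are integers in a fixed range, they can be emitted by the same output vocabulary mechanism a language model uses for text.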
At Physical Intelligence, this line of work culminated in the π0 model released in October 2024 and described in the paper "π0: A Vision-Language-Action Flow Model for General Robot Control" (arXiv:2410.24164). π0 is built on top of a pretrained vision-language backbone and uses a flow-matching action head to produce continuous robot action chunks at high frequency. It was trained on a large dataset collected from many different robot platforms, including single-arm robots, dual-arm robots, and mobile manipulators, and was demonstrated on long-horizon manipulation tasks such as folding laundry and clearing a dining table. The follow-up π0.5 model, released in 2025 and described in arXiv:2504.16054 ("π0.5: a Vision-Language-Action Model with Open-World Generalization"), extended π0 with co-training on heterogeneous task data, including web data and high-level semantic prediction, in order to obtain better open-world generalization.
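A flow-matching action head can be sketched in a few lines under strong assumptions. The sketch uses a straight-line path from Gaussian noise at `t = 0` to the action at `t = 1`, and a toy setting where every training target is the same action chunk, so the velocity field has a closed form. In a model such as π0 the field would be a neural network conditioned on images, language, and robot state. The time convention, step count, and function names here are illustrative and not taken from the paper.

```python
import random


def flow_matching_pair(noise, action, t):
    """Training pair: a point on the noise-to-action path and its target velocity."""
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    target_velocity = [a - n for n, a in zip(noise, action)]
    return x_t, target_velocity


def ideal_velocity(x, t, target_action):
    """Closed-form field for the toy case of a single target action chunk.

    A learned network trained by regression onto target_velocity would
    replace this function in practice.
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target_action)]


def sample_action(target_action, num_steps=10, seed=0):
    """Euler integration of the velocity field from noise (t=0) toward t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target_action]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k / num_steps
        v = ideal_velocity(x, t, target_action)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


action_chunk = [0.3, -0.1, 0.7, 0.0]  # made-up continuous action values
print(sample_action(action_chunk))
```

Because the output is produced by integrating a continuous velocity field rather than by choosing among discrete tokens, the head emits real-valued actions directly, and a small fixed number of integration steps keeps inference fast.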
Levine has been a vocal advocate of the broader thesis that robotics needs its own foundation models, trained on broad and diverse data and then specialized to individual platforms. His group at Berkeley participated in the Open X-Embodiment effort, a multi-institution dataset and model release that aggregated robot demonstration data from more than twenty academic and industry labs to train general cross-embodiment policies. The work demonstrated that policies trained on combined cross-platform data outperformed policies trained on data from any single platform, providing empirical support for the foundation-model paradigm in robotics.
| Year | Title | Venue | Approx. citations |
|---|---|---|---|
| 2013 | Guided Policy Search | ICML | 1,400+ |
| 2015 | Trust Region Policy Optimization (co-author with J. Schulman et al.) | ICML | 9,000+ |
| 2016 | End-to-End Training of Deep Visuomotor Policies | JMLR | 4,500+ |
| 2017 | Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (with C. Finn, P. Abbeel) | ICML | 14,000+ |
| 2018 | Soft Actor-Critic: Off-Policy Maximum Entropy Deep RL with a Stochastic Actor (with T. Haarnoja et al.) | ICML | 11,000+ |
| 2018 | Soft Actor-Critic Algorithms and Applications | arXiv | 4,500+ |
| 2018 | QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation | CoRL | 1,300+ |
| 2020 | Conservative Q-Learning for Offline Reinforcement Learning (with A. Kumar et al.) | NeurIPS | 2,800+ |
| 2020 | Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems | arXiv | 2,000+ |
| 2021 | Implicit Behavioral Cloning | CoRL | 700+ |
| 2022 | RT-1: Robotics Transformer for Real-World Control at Scale | RSS | 1,000+ |
| 2023 | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | CoRL | 800+ |
| 2023 | Open X-Embodiment: Robotic Learning Datasets and RT-X Models (multi-institution) | ICRA | 600+ |
| 2024 | π0: A Vision-Language-Action Flow Model for General Robot Control | Physical Intelligence preprint (arXiv:2410.24164) | 250+ |
| 2025 | π0.5: a Vision-Language-Action Model with Open-World Generalization | arXiv:2504.16054 | growing |
Citation counts are approximate and based on Google Scholar and Semantic Scholar as of mid-2025; values change over time.
Levine teaches CS285 (formerly CS294-112), "Deep Reinforcement Learning," at Berkeley. The course is one of the most widely watched graduate machine learning courses outside the classroom: each year's lecture videos are uploaded to YouTube, and the course materials are open-licensed on the official course website. The syllabus covers:
| Module | Topics |
|---|---|
| Foundations | Markov decision processes, value functions, policy evaluation |
| Policy gradients | REINFORCE, actor-critic, baselines, natural and trust region methods |
| Q-learning | DQN, double DQN, distributional RL |
| Actor-critic | DDPG, TD3, soft actor-critic |
| Model-based RL | Dynamics modeling, MPC, latent-state world models |
| Exploration | Intrinsic motivation, count-based exploration |
| Inverse RL and imitation | Behavioral cloning, GAIL, MaxEnt IRL |
| Offline RL | BCQ, CQL, IQL, behavior-regularized methods |
| Meta-RL and multi-task RL | MAML, PEARL |
| Foundation models for control | Vision-language-action models, scaling laws |
In addition to CS285, Levine has co-taught and lectured in tutorials at NeurIPS, ICML, and CoRL on topics including offline RL, learned models for control, and foundation models for robotics.
The RAIL Lab at Berkeley has produced a large number of researchers who hold faculty positions or senior industry research roles. The table below covers a representative sample (jointly advised students are listed here when Levine was the primary or co-primary advisor).
| Person | Role at graduation | Current position |
|---|---|---|
| Chelsea Finn | PhD, UC Berkeley (jointly with Pieter Abbeel), 2018 | Assistant professor, Stanford; co-founder, Physical Intelligence |
| Tuomas Haarnoja | PhD, UC Berkeley, 2018 | Senior research scientist, Google DeepMind |
| Aviral Kumar | PhD, UC Berkeley, 2023 | Assistant professor, Carnegie Mellon University; research scientist at Google DeepMind |
| Karol Hausman | Postdoc / collaborator (PhD from USC, 2018) | CEO and co-founder, Physical Intelligence |
| Abhishek Gupta | PhD, UC Berkeley, 2021 | Assistant professor, University of Washington |
| Karl Pertsch | Postdoc, UC Berkeley | Research scientist, Physical Intelligence |
| Justin Fu | PhD, UC Berkeley | Research scientist, Google DeepMind |
| Coline Devin | PhD, UC Berkeley | Research scientist, Google DeepMind |
| Marvin Zhang | PhD, UC Berkeley | Industry research |
| Anusha Nagabandi | PhD, UC Berkeley | Co-founder, Covariant AI |
| Vitchyr Pong | PhD, UC Berkeley | Industry research |
| Kelvin Xu | PhD, UC Berkeley | Industry research |
The RAIL group also collaborates extensively with Pieter Abbeel's BAIR group on shared infrastructure and joint students, and historically with Trevor Darrell, Ken Goldberg, and other Berkeley robotics and computer vision faculty.
| Year | Award |
|---|---|
| 2014 | Stanford Computer Science PhD (dissertation "Motor Skill Learning with Local Trajectory Methods") |
| 2016 | NSF CAREER Award |
| 2018 | Okawa Foundation Research Grant |
| 2019 | Sloan Research Fellowship |
| 2024 | Presidential Early Career Award for Scientists and Engineers (PECASE) |
Levine has also been recognized through best paper and best paper finalist nominations at venues including ICRA, ICLR, NeurIPS, and CoRL, as well as on community lists of top-cited or most-influential authors in artificial intelligence.
The following bibliography lists representative works that span the major lines of research in Levine's career. They are referenced inline in the article above by year and short title.