# Jimmy Ba

> Source: https://aiwiki.ai/wiki/jimmy_ba
> Updated: 2026-06-27
> Categories: People
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

Jimmy Ba (also published as Jimmy Lei Ba or Lei Jimmy Ba) is a Canadian machine learning researcher and an associate professor of computer science at the [University of Toronto](/wiki/university_of_toronto), best known as a co-author of the [Adam optimizer](/wiki/adam_optimizer) and a co-inventor of [Layer Normalization](/wiki/layer_normalization), two methods that became standard building blocks for training [deep learning](/wiki/deep_learning) models and [transformer](/wiki/transformer) networks [2][3][4]. He completed his doctorate at Toronto under [Geoffrey Hinton](/wiki/geoffrey_hinton), is a faculty member of the [Vector Institute](/wiki/vector_institute), and in 2023 was one of the twelve co-founders of [xAI](/wiki/xai), the artificial intelligence company started by [Elon Musk](/wiki/elon_musk) [1][5][7]. As of 2026, his Google Scholar profile lists more than 325,000 citations and an h-index of 69, with the Adam paper alone accounting for over 250,000 of them, ranking it among the most cited works in all of science [2].

## Who is Jimmy Ba?

Jimmy Ba is a machine learning researcher in the Toronto deep learning school, the academic lineage built around Geoffrey Hinton. He earned all three of his degrees at the University of Toronto: an undergraduate degree in 2011, a master's degree in 2014, and a PhD, after which he joined the Department of Computer Science as a faculty member [5][6]. His research centers on efficient learning algorithms for deep neural networks, with related interests in optimization, [reinforcement learning](/wiki/reinforcement_learning), and natural language processing [5]. He led research, safety, and enterprise work at xAI from its founding in 2023 until his departure in February 2026, while remaining a member of the University of Toronto faculty [1][10].

## What is his education and how is he connected to Hinton?

At Toronto, Ba completed his undergraduate degree in 2011 and his master's degree in 2014. During his graduate studies he worked with the machine learning group that grew out of Hinton's lab [5]. His early research mentors included [Brendan Frey](/wiki/brendan_frey) and [Ruslan Salakhutdinov](/wiki/ruslan_salakhutdinov), both prominent figures in the Toronto deep learning community [5].

His master's research, supervised by Frey, produced an early paper on adaptive [dropout](/wiki/dropout), a method sometimes called standout that learns which units in a [neural network](/wiki/neural_network) to drop during training rather than dropping them at random [5]. He then pursued a PhD under the supervision of Geoffrey Hinton. His doctoral thesis, "Learning to Attend with Neural Networks," collected his work on attention mechanisms and was deposited in the University of Toronto's research repository in 2020 [5][8]. During his doctoral studies he received a Facebook Graduate Fellowship in machine learning in 2016, and he was among the first students at a Canadian institution to receive that fellowship [5][6].
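The idea behind standout can be sketched as a hidden layer whose dropout mask is sampled from an input-dependent keep probability instead of a fixed rate. The sketch below assumes a simple parameter-sharing scheme in which the overlay network's weights are an affine function of the layer's own weights (`pi = alpha * w + beta`). The ReLU activation, the `alpha` and `beta` values, the evaluation-time rule of scaling by the keep probability, and all function names are illustrative assumptions, not the paper's exact setup.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standout_layer(a, weights, alpha=1.0, beta=0.0, train=True, rng=random):
    """Hidden layer with standout-style adaptive dropout (illustrative sketch).

    a: input activations; weights: one weight row per hidden unit.
    alpha, beta: assumed affine tie between the overlay network's weights
    and the layer's own weights (pi = alpha * w + beta).
    """
    out = []
    for w_row in weights:
        z = sum(w * ai for w, ai in zip(w_row, a))       # pre-activation
        h = max(0.0, z)                                   # ReLU activation (assumed)
        pi_row = [alpha * w + beta for w in w_row]        # overlay weights shared with w
        keep_prob = sigmoid(sum(p * ai for p, ai in zip(pi_row, a)))
        if train:
            # Keep the unit with an input-dependent probability instead of a fixed rate.
            mask = 1.0 if rng.random() < keep_prob else 0.0
            out.append(mask * h)
        else:
            # At evaluation time, scale by the keep probability instead of sampling.
            out.append(keep_prob * h)
    return out


x = [1.0, 2.0]
W = [[0.5, -0.25], [1.0, 1.0]]
print(standout_layer(x, W, train=True, rng=random.Random(0)))
print(standout_layer(x, W, train=False))
```

Because the keep probability depends on the same weighted input as the unit itself, strongly activated units tend to be kept more often. This is the contrast with standard dropout, which uses one fixed rate for every unit.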

## What is his academic career and what honors has he received?

Ba was appointed an assistant professor in the Department of Computer Science at the University of Toronto in 2018, joining the department's Machine Learning Group and becoming a faculty member of the Vector Institute for Artificial Intelligence [6]. By the time of his work at xAI, news coverage described him as an associate professor at the university [1][10].

In December 2018 Ba was named one of the inaugural Canada CIFAR AI Chairs, part of the Canadian government's Pan-Canadian Artificial Intelligence Strategy, a 125 million dollar initiative intended to keep the country at the forefront of the field [6]. The Vector Institute, founded in 2017 through a partnership between the University of Toronto, governments, and industry, was one of three national AI hubs anchoring the program [6]. In February 2023 he received a Sloan Research Fellowship, which the Alfred P. Sloan Foundation awards to early-career researchers; the award recognized his work on making deep learning more efficient and reliable [7]. On receiving it, Ba said: "I am honoured to receive the Sloan Fellowship in Computer Science this year. It would not be possible without the generous support of my colleagues and students, to whom I owe a debt of gratitude." [7]

Beyond Adam, his lab has contributed other optimization methods, including the Lookahead optimizer and the Follow-the-Ridge algorithm for minimax problems [7].

| Honor | Year | Awarding body |
| --- | --- | --- |
| Facebook Graduate Fellowship in machine learning | 2016 | Facebook (Meta) [5][6] |
| Inaugural Canada CIFAR AI Chair | 2018 | CIFAR / Pan-Canadian AI Strategy [6] |
| Sloan Research Fellowship | 2023 | Alfred P. Sloan Foundation [7] |
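The Lookahead optimizer from his lab wraps an existing "inner" optimizer. It keeps a set of slow weights, lets the inner optimizer take `k` steps on a copy of them (the fast weights), and then moves the slow weights a fraction `alpha` of the way toward where the fast weights ended up. The minimal sketch below assumes plain SGD as the inner optimizer and a one-dimensional quadratic as the objective. The values `k=5` and `alpha=0.5` and the function name are illustrative choices.

```python
def lookahead_sgd(grad_fn, x0, inner_lr=0.1, k=5, alpha=0.5, outer_steps=100):
    """Lookahead wrapped around plain SGD as the inner optimizer (sketch).

    k: number of fast-weight steps per outer iteration
    alpha: how far the slow weights move toward the fast weights
    """
    slow = list(x0)
    for _ in range(outer_steps):
        fast = list(slow)  # fast weights restart from the slow weights
        for _ in range(k):
            g = grad_fn(fast)
            fast = [f - inner_lr * gi for f, gi in zip(fast, g)]
        # Slow weights take one interpolation step toward the final fast weights.
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
print(lookahead_sgd(lambda p: [2 * (p[0] - 3)], [0.0]))
```

The interpolation step averages out some of the inner optimizer's oscillation. Any update rule, including Adam, could replace the inner SGD loop.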

## What is Jimmy Ba known for?

Ba's most influential contribution is Adam, introduced in the paper "Adam: A Method for Stochastic Optimization," written with [Diederik Kingma](/wiki/diederik_kingma) and first posted to arXiv on December 22, 2014, before being presented at the International Conference on Learning Representations in 2015 [3]. The name is short for adaptive moment estimation. The algorithm maintains running estimates of the first moment (mean) and second raw moment (uncentered variance) of the gradients and uses them to set a separate, adaptively scaled step size for each parameter. It combines momentum with ideas from the earlier adaptive methods AdaGrad and RMSProp [3]. Because it works well across a wide range of problems with little tuning of its [hyperparameters](/wiki/hyperparameter), Adam became the default optimizer for training deep networks and, later, large language models. According to his Google Scholar profile, the Adam paper has been cited more than 254,000 times, making it one of the most cited papers in the history of science [2].
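The update described above can be sketched in a few lines of plain Python. The function and variable names are hypothetical. The default hyperparameters (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) follow the defaults suggested in the Adam paper, while the usage example's larger learning rate and badly scaled quadratic are illustrative choices.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a differentiable function with the Adam update rule (sketch).

    grad_fn: returns the gradient (list of floats) at a parameter vector
    x0: initial parameters (list of floats)
    """
    x = list(x0)
    m = [0.0] * len(x)  # moving average of gradients (first moment)
    v = [0.0] * len(x)  # moving average of squared gradients (second raw moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            # Bias correction: m and v start at zero, so early estimates are too small.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            # Per-parameter step, scaled by the running RMS of that parameter's gradients.
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Usage: a badly scaled quadratic f(x, y) = (x - 3)^2 + 100 * (y + 1)^2.
def grad(p):
    return [2 * (p[0] - 3), 200 * (p[1] + 1)]


print(adam_minimize(grad, [0.0, 0.0], lr=0.1, steps=2000))
```

Because each coordinate's step is divided by the square root of its own second-moment estimate, the steep `y` direction and the shallow `x` direction take comparably sized early steps. This per-parameter scaling is why Adam needs relatively little learning-rate tuning.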

His second landmark paper, "Layer Normalization," written in 2016 with Jamie Ryan Kiros and Geoffrey Hinton, addressed a limitation of batch normalization, whose statistics are computed over a mini-batch and therefore depend on the batch [4]. Layer normalization instead computes its normalizing statistics across the features within a single layer for each individual example. This makes it independent of batch size and well suited to [recurrent neural networks](/wiki/recurrent_neural_network) and to settings with small or variable batches [4]. The method was widely adopted, has been cited more than 19,000 times [2], and became a core building block of the transformer architecture, on which large language models such as [Grok](/wiki/grok) are built.
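The difference between the two normalizations can be sketched directly. `layer_norm` computes the mean and variance over one example's features and then applies a per-feature gain and bias. `batch_norm` is included only for contrast and normalizes each feature over the batch. The function names and the small `eps` constant added for numerical stability are assumptions, following common implementation practice.

```python
import math


def layer_norm(x, gain=None, bias=None, eps=1e-5):
    """Normalize one example's features using statistics over those features."""
    h = len(x)
    mu = sum(x) / h
    var = sum((xi - mu) ** 2 for xi in x) / h
    inv_std = 1.0 / math.sqrt(var + eps)
    gain = [1.0] * h if gain is None else gain
    bias = [0.0] * h if bias is None else bias
    return [g * (xi - mu) * inv_std + b for xi, g, b in zip(x, gain, bias)]


def batch_norm(batch, eps=1e-5):
    """For contrast: normalize each feature using statistics over the batch."""
    n, h = len(batch), len(batch[0])
    out = [[0.0] * h for _ in range(n)]
    for j in range(h):
        col = [row[j] for row in batch]
        mu = sum(col) / n
        var = sum((c - mu) ** 2 for c in col) / n
        for i in range(n):
            out[i][j] = (batch[i][j] - mu) / math.sqrt(var + eps)
    return out


a = [1.0, 2.0, 3.0]
b = [10.0, 0.0, 5.0]
print(layer_norm(a))          # depends only on a, whatever else is in the batch
print(batch_norm([a, b])[0])  # depends on which other examples share the batch
print(batch_norm([a, a])[0])  # identical rows: every feature normalizes to 0
```

The same example `a` gets different batch-normalized outputs depending on its batch mates. Its layer-normalized output never changes, which is what makes the method usable with a batch size of one or inside a recurrent network unrolled over time.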

Ba's earlier and ongoing work spans several areas of deep learning:

| Paper | Year | Selected co-authors | Venue |
| --- | --- | --- | --- |
| Do Deep Nets Really Need to be Deep? | 2014 | Rich Caruana | NIPS [9] |
| Adam: A Method for Stochastic Optimization | 2015 | Diederik Kingma | ICLR [3] |
| Multiple Object Recognition with Visual Attention | 2015 | Volodymyr Mnih, Koray Kavukcuoglu | ICLR [2] |
| Show, Attend and Tell | 2015 | Kelvin Xu, Ryan Kiros, Kyunghyun Cho | ICML [14] |
| Layer Normalization | 2016 | Jamie Ryan Kiros, Geoffrey Hinton | NIPS workshop [4] |
| Using Fast Weights to Attend to the Recent Past | 2016 | Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu | NIPS [2] |
| Lookahead Optimizer | 2019 | Michael Zhang, James Lucas, Geoffrey Hinton | NeurIPS [2] |
| Dream to Control (Dreamer) | 2020 | Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi | ICLR [15] |

The 2014 paper "Do Deep Nets Really Need to be Deep?," written with Rich Caruana, showed that shallow networks trained to mimic the logits of a deep network could reach comparable accuracy. It was an early result in the line of work on model compression and knowledge distillation [2][9]. His thesis work on [attention](/wiki/attention), including "Multiple Object Recognition with Visual Attention" and the widely cited image-captioning paper "Show, Attend and Tell," explored how networks could learn where to look in an input [14]. More recently he worked on model-based reinforcement learning, co-authoring the Dreamer agents that learn behaviors inside a learned [world model](/wiki/world_model) [15]. His Google Scholar profile also lists an i10-index of 106 [2].
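The mimic-learning idea from the Caruana paper can be sketched as a regression problem. The student is trained with an L2 loss to reproduce the teacher's logits (scores before the softmax) rather than the original hard labels. In the sketch below, the linear student, the hypothetical two-output `teacher` function, the learning rate, and the epoch count are all illustrative stand-ins. In the paper the student is a shallow neural network and the teacher a trained deep network.

```python
def mimic_loss(student_logits, teacher_logits):
    """L2 loss between student and teacher logits (scores before the softmax)."""
    return 0.5 * sum((s - t) ** 2 for s, t in zip(student_logits, teacher_logits))


def train_linear_mimic(inputs, teacher, num_outputs, lr=0.1, epochs=1000):
    """Fit a linear student (one weight and bias per output) to a teacher's logits."""
    w = [0.0] * num_outputs
    b = [0.0] * num_outputs
    n = len(inputs)
    for _ in range(epochs):
        gw = [0.0] * num_outputs
        gb = [0.0] * num_outputs
        for x in inputs:
            target = teacher(x)  # the teacher's logits act as regression targets
            for j in range(num_outputs):
                err = (w[j] * x + b[j]) - target[j]  # d(mimic_loss)/d(student logit j)
                gw[j] += err * x / n
                gb[j] += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b = [bj - lr * g for bj, g in zip(b, gb)]
    return w, b


# Hypothetical teacher that emits two logits for a scalar input.
def teacher(x):
    return [2.0 * x + 1.0, -x]


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
w, b = train_linear_mimic(xs, teacher, num_outputs=2)
print(w, b)
```

Regressing on logits instead of softmax probabilities gives the student a richer target. The logits keep the teacher's relative confidence across all classes, including information that becomes nearly invisible once the softmax saturates.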

## What is his role at xAI?

Ba was one of the twelve co-founders of xAI, which Elon Musk incorporated on March 9, 2023, and publicly announced on July 12, 2023 [12]. The founding team was led by Musk, with [Igor Babuschkin](/wiki/igor_babuschkin) as chief engineer, and was drawn largely from researchers at OpenAI, Google DeepMind, Google, Microsoft, and Tesla; alongside Ba it included [Greg Yang](/wiki/greg_yang), [Christian Szegedy](/wiki/christian_szegedy), Yuhuai "Tony" Wu, Zihang Dai, Guodong Zhang, Kyle Kosic, Manuel Kroiss, Ross Nordeen, and Toby Pohlen [12]. At xAI, Ba led research, safety, and enterprise efforts and reported directly to Musk, and his research was credited with influencing the company's [Grok 4](/wiki/grok_4) models [1][10]. Normalization layers descended from his Layer Normalization work are standard components of transformer-based [large language models](/wiki/large_language_model) such as Grok.

On February 2, 2026, SpaceX acquired xAI in an all-stock transaction that made the AI company a wholly owned subsidiary, with xAI valued at about 250 billion dollars and the combined entity at roughly 1.25 trillion dollars [12]. A wave of co-founder departures followed the [SpaceX-xAI merger](/wiki/spacex_xai_merger). Ba announced his exit on the evening of Tuesday, February 10, 2026, a day after Yuhuai "Tony" Wu announced his own departure, making Ba the sixth of the original twelve co-founders to leave [1][13]. In his farewell post on X, Ba wrote: "Last day at xAI. xAI's mission is push humanity up the Kardashev tech tree. Grateful to have helped cofound at the start. And enormous thanks to @elonmusk for bringing us together on this incredible journey. So proud of what the xAI team has done and will continue to stay close." [11] He added that it was "time to recalibrate my gradient on the big picture" and predicted that "2026 is gonna be insane and likely the busiest (and most consequential) year for the future of our species." [13]

The departures coincided with regulatory scrutiny of xAI in Europe and elsewhere over content produced by Grok [1]. By the time Ba left, co-founders Kyle Kosic, Babuschkin, Szegedy, and Greg Yang had already stepped away, and further exits, including Guodong Zhang, Zihang Dai, and Ross Nordeen, were reported in the weeks that followed, eventually leaving Musk as the only remaining co-founder at the company [12]. The University of Toronto's student newspaper characterized the Toronto-affiliated co-founders as having been "pushed out" of the firm, while Ba's own statement framed the move in positive terms [10]. He remained a member of the University of Toronto faculty after leaving xAI.

## Why is Jimmy Ba considered influential?

Ba is widely regarded as one of the most influential researchers to come out of the Toronto deep learning school. Adam is among the most cited works in the history of science, and Adam or a close variant remains the standard optimizer for training most modern deep networks. Layer Normalization, or a variant of it, is similarly a standard component of transformer-based models [2][3][4]. His honors include the Facebook Graduate Fellowship in machine learning (2016), selection as one of the first Canada CIFAR AI Chairs (2018), and a Sloan Research Fellowship (2023) [5][6][7]. Within the field he is frequently named alongside his doctoral advisor Geoffrey Hinton and other Toronto researchers as part of the generation whose methods made large-scale deep learning practical.

## References

1. CNBC, "Musk's xAI loses second co-founder in two days as Jimmy Ba departs," February 10, 2026. https://www.cnbc.com/2026/02/10/musks-xai-loses-second-co-founder-in-two-days-as-jimmy-ba-departs.html
2. Jimmy Ba, Google Scholar profile. https://scholar.google.com/citations?user=ymzxRhAAAAAJ&hl=en
3. Diederik P. Kingma and Jimmy Ba, "Adam: A Method for Stochastic Optimization," arXiv:1412.6980, 2014 (ICLR 2015). https://arxiv.org/abs/1412.6980
4. Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton, "Layer Normalization," arXiv:1607.06450, 2016. https://arxiv.org/abs/1607.06450
5. Jimmy Ba, home page, University of Toronto. https://jimmylba.github.io/
6. University of Toronto, "Eight U of T researchers named AI chairs by Canadian Institute for Advanced Research," December 3, 2018. https://www.utoronto.ca/news/eight-u-t-researchers-named-ai-chairs-canadian-institute-advanced-research
7. University of Toronto Department of Computer Science, "Sloan Research Fellowships awarded to Jimmy Ba and Sushant Sachdeva," February 24, 2023. https://web.cs.toronto.edu/news-events/news/sloan-research-fellowships-awarded-to-jimmy-ba-and-sushant-sachdeva
8. Lei Jimmy Ba, "Learning to Attend with Neural Networks," PhD thesis, University of Toronto, 2020. https://tspace.library.utoronto.ca/handle/1807/103985
9. Lei Jimmy Ba and Rich Caruana, "Do Deep Nets Really Need to be Deep?," Advances in Neural Information Processing Systems, 2014. https://arxiv.org/abs/1312.6184
10. The Varsity, "U of T prof, alumni, pushed out of Musk's AI firm," March 15, 2026. https://thevarsity.ca/2026/03/15/u-of-t-prof-alumni-pushed-out-of-musks-ai-firm/
11. Jimmy Ba, post on X, February 10, 2026. https://x.com/jimmybajimmyba/status/2021374875793801447
12. Wikipedia, "xAI (company)." https://en.wikipedia.org/wiki/XAI_(company)
13. OfficeChai, "xAI Co-founder Jimmy Ba Quits, 6 Of 12 Co-founders Have Now Left," 2026. https://officechai.com/ai/xai-co-founder-jimmy-ba-quits-6-of-12-co-founders-have-now-left/
14. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio, "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention," ICML 2015. https://arxiv.org/abs/1502.03044
15. Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi, "Dream to Control: Learning Behaviors by Latent Imagination," ICLR 2020. https://arxiv.org/abs/1912.01603