# Greg Yang

> Source: https://aiwiki.ai/wiki/greg_yang
> Updated: 2026-06-28
> Categories: People
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

Greg Yang (also published as Ge Yang) is a mathematician and artificial intelligence researcher best known for the Tensor Programs series of papers and the maximal update parametrization, usually written muP, a mathematical theory of how to scale [neural networks](/wiki/neural_network) [1][5]. He spent the formative part of his career as a Senior Researcher at [Microsoft Research](/wiki/microsoft_research), where he developed muP and its companion technique muTransfer, and in 2023 he became one of the twelve founding members of [xAI](/wiki/xai), the company started by [Elon Musk](/wiki/elon_musk) [4][5]. muTransfer lets practitioners tune the [hyperparameters](/wiki/hyperparameter) of a small model and transfer them directly to a much larger one, cutting the cost of configuring giant models; the original paper demonstrated it on BERT and GPT-3 models [6]. In January 2026 Yang announced that he was stepping back from xAI into an informal advisory role to manage a Lyme disease diagnosis [1][2].

## Who is Greg Yang?

Greg Yang is a pure mathematician who turned the theory of infinite-width neural networks into engineering tools that practitioners use to train the largest models. His central research program, which he frames as a kind of "Theory of Everything" for large-scale [deep learning](/wiki/deep_learning), aims to reveal the optimal way to scale neural networks and to provide enough understanding to guide safety and alignment efforts [5]. He is recognized within the field as an example of a pure mathematician whose theoretical results had direct, measurable impact on how frontier neural networks are configured and trained [6][13].

## Education

Yang studied at [Harvard University](/wiki/harvard_university), but his path through it was unusual. At the end of his sophomore year he left to pursue music full time, working for roughly a year and a half as an electronic dance music producer and DJ under the name Zeta [15]. During that interlude he encountered ideas about artificial intelligence, and the prospect of building human-level AI became his central preoccupation. He returned to Harvard, accelerated his remaining coursework, and graduated in 2018 with an A.B. in mathematics and an S.M. in computer science [5][15].

His senior thesis developed what he called a homological theory of functions, drawing on computational complexity theory, learning theory, and algebraic topology [14][15]. The work won Harvard's Thomas Temple Hoopes Prize, awarded for outstanding undergraduate scholarship, and Yang received an honorable mention for the 2018 Frank and Brennie Morgan Prize, the annual award given by the American Mathematical Society, the Mathematical Association of America, and the Society for Industrial and Applied Mathematics for outstanding research in mathematics by an undergraduate [14]. The combination of a pure-mathematics background with an emerging interest in machine learning would define the rest of his career.

## What did Greg Yang do at Microsoft Research?

After Harvard, Yang joined Microsoft Research in Redmond, Washington, where he eventually held the title of Senior Researcher [5]. He was drawn there by the chance to apply rigorous mathematics to a field that was advancing largely through empirical trial and error, and it became the setting for his "Theory of Everything" program for large-scale deep learning [5].

It was at Microsoft Research that Yang produced the body of work for which he is best known. Between 2019 and 2023 he wrote the Tensor Programs papers and, with collaborators, turned the resulting theory into a practical engineering tool. In 2022 Microsoft Research published a description of the muTransfer technique, presenting it as a way to tune enormous neural networks that are otherwise too expensive to train more than once [13].

## What is Tensor Programs?

A Tensor Program is a formal way of writing down a computation as a sequence of matrix multiplications and coordinatewise nonlinear functions. Yang's insight was that almost any computation in deep learning, from a feedforward network to a transformer, can be expressed in this language, which makes it possible to reason about all of them at once [5][8]. By studying what happens to such programs as the width of a network grows toward infinity, Yang unified several previously separate threads of theory, including the neural network and Gaussian process correspondence, the [Neural Tangent Kernel](/wiki/neural_tangent_kernel), and tools from random matrix theory [5][7][8].
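The description above can be made concrete with a toy interpreter. The sketch below is only an illustration under stated assumptions, not the formalism from the papers: the helper names (`matmul`, `nonlin`, `gaussian_matrix`, `run_program`) are hypothetical, the weights are assumed to be i.i.d. Gaussian with variance 1/n, and the "program" is a two-layer tanh network written as alternating matrix-multiplication and coordinatewise steps.

```python
import math
import random


def matmul(W, x):
    """MatMul step: multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def nonlin(phi, *vectors):
    """Nonlin step: apply phi coordinate by coordinate to one or more vectors."""
    return [phi(*coords) for coords in zip(*vectors)]


def gaussian_matrix(n, rng):
    """n-by-n matrix with i.i.d. N(0, 1/n) entries (an assumed initialization)."""
    std = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(n)]


def run_program(n, seed=0):
    """A width-n, two-layer tanh network expressed as MatMul and Nonlin steps."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # initial vector
    h1 = matmul(gaussian_matrix(n, rng), x0)      # MatMul
    x1 = nonlin(math.tanh, h1)                    # Nonlin
    h2 = matmul(gaussian_matrix(n, rng), x1)      # MatMul
    return x1, h2


def mean_square(v):
    """Average squared coordinate, used here as an empirical variance."""
    return sum(c * c for c in v) / len(v)


x1, h2 = run_program(200)
print(mean_square(x1), mean_square(h2))
```

Given `x1`, each coordinate of `h2` is a zero-mean Gaussian whose variance equals the mean square of `x1`, so the two printed numbers should be close. As the width grows, that mean square settles toward a fixed value, which is the kind of limiting statement the Tensor Programs papers make rigorous for far more general programs.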

The series unfolded across several papers:

| Paper | Year | Focus |
| --- | --- | --- |
| Tensor Programs I | 2019 | Wide networks of any architecture behave as Gaussian processes [7] |
| Tensor Programs II | 2020 | Neural Tangent Kernel for any architecture [8] |
| Tensor Programs III | 2020 | Neural Matrix Laws and the free independence of weights and activations [9] |
| Tensor Programs IV | 2021 | Feature learning in infinite-width networks, introducing muP [10] |
| Tensor Programs V | 2022 | Zero-shot hyperparameter transfer via muP and muTransfer [6] |
| Tensor Programs IVb | 2023 | Adaptive optimization in the infinite-width limit [11] |
| Tensor Programs VI | 2023 | Feature learning in infinite-depth networks [12] |

## What is the muP / muTransfer method?

The practical payoff came from the maximal update parametrization. Under the standard ways of initializing and scaling a network, the best learning rate and related hyperparameters drift as the model gets wider, so settings tuned on a small model are wrong for a large one. Tensor Programs IV identified muP as the parametrization that keeps feature learning well behaved as width grows, and Tensor Programs V showed that, as the paper puts it, "in the recently discovered Maximal Update Parametrization (muP), many optimal HPs remain stable even as model size changes" [6][10]. That stability is what makes muTransfer possible: a practitioner parametrizes the target model in muP, tunes hyperparameters on a small and cheap proxy, and transfers them to the full-sized model without ever tuning the large model directly [6].
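The following sketch illustrates how a parametrization of this kind can keep one set of tuned values meaningful across widths. It is a simplified reading of the width-scaling rules for Adam described in Tensor Programs V, offered as an assumption rather than a reference implementation: hidden matrices get initialization variance and learning rate proportional to 1/fan_in, the output layer gets initialization variance proportional to 1/fan_in squared, and input weights are left unchanged. The `LayerHP` type, the function names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerHP:
    init_std: float  # standard deviation of the weight initialization
    lr: float        # Adam learning rate for this layer


def scale_hidden(base: LayerHP, base_width: int, width: int) -> LayerHP:
    """Hidden matrix: init variance ~ 1/fan_in, Adam LR ~ 1/fan_in (assumed rule)."""
    ratio = width / base_width
    return LayerHP(init_std=base.init_std / ratio ** 0.5, lr=base.lr / ratio)


def scale_output(base: LayerHP, base_width: int, width: int) -> LayerHP:
    """Output matrix: init variance ~ 1/fan_in^2, Adam LR ~ 1/fan_in (assumed rule)."""
    ratio = width / base_width
    return LayerHP(init_std=base.init_std / ratio, lr=base.lr / ratio)


def scale_input(base: LayerHP, base_width: int, width: int) -> LayerHP:
    """Input matrix: its fan_in does not grow with width, so nothing changes."""
    return base


# Hypothetical values tuned once on a cheap proxy of width 256.
tuned = LayerHP(init_std=0.02, lr=1e-3)

# The same tuned values, re-expressed for a target of width 4096.
for name, rule in [("input", scale_input), ("hidden", scale_hidden), ("output", scale_output)]:
    print(name, rule(tuned, base_width=256, width=4096))
```

The point of the design is that the practitioner never searches over the width-4096 settings directly: only the proxy is tuned, and the per-layer rules translate those values to the larger model. In practice the released `mup` package is meant to handle this bookkeeping, and its documentation is the authority on the exact rules.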

The reported gains were striking. Yang and his co-authors, who included Edward J. Hu and several researchers from OpenAI, reported that "by transferring from 40M parameters, we outperform published numbers of the 6.7B GPT-3 model, with tuning cost only 7% of total pretraining cost," and that a 13-million-parameter proxy was enough to surpass published [BERT](/wiki/bert)-large numbers [6]. Microsoft released a PyTorch implementation of the method, the mup package, installable via pip [13]. The technique was also picked up outside Microsoft: the [GPT-4](/wiki/gpt_4) technical report lists Tensor Programs V among its references, although OpenAI did not spell out the details of how it tuned the model [16].
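A figure such as "tuning cost only 7% of total pretraining cost" is a ratio between the whole hyperparameter sweep on the proxy and a single training run of the target. The sketch below shows that arithmetic under a loudly stated assumption: training cost is approximated as proportional to parameters times training tokens, a common rule of thumb rather than the paper's own accounting. The function names and every number in the example are hypothetical and are not taken from the paper.

```python
def training_cost(params: float, tokens: float) -> float:
    """Relative training cost, assuming cost scales with params * tokens."""
    return params * tokens


def tuning_overhead(proxy_params: float, proxy_tokens: float, n_trials: int,
                    target_params: float, target_tokens: float) -> float:
    """Cost of the full proxy sweep as a fraction of one target training run."""
    sweep = n_trials * training_cost(proxy_params, proxy_tokens)
    return sweep / training_cost(target_params, target_tokens)


# Hypothetical setup: a proxy 100x smaller than the target, trained on the
# same number of tokens, with 10 hyperparameter trials.
overhead = tuning_overhead(
    proxy_params=50e6, proxy_tokens=10e9, n_trials=10,
    target_params=5e9, target_tokens=10e9,
)
print(f"{overhead:.1%}")
```

Because each proxy run is so much cheaper than the target, even a sweep of many trials stays a small fraction of the final training budget, which is the economic argument behind tuning on a proxy.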

## What is Greg Yang's role at xAI?

Yang was one of the founding members of xAI, which was incorporated in Nevada on March 9, 2023, and publicly announced on July 12, 2023 [4]. The launch named a twelve-person founding team led by Elon Musk, with [Igor Babuschkin](/wiki/igor_babuschkin) as chief engineer and other researchers drawn from OpenAI, Google DeepMind, Microsoft Research, and Tesla, among them [Christian Szegedy](/wiki/christian_szegedy) and [Jimmy Ba](/wiki/jimmy_ba) [4]. Yang joined as the theorist of the group, bringing the muP line of work and a stated interest in using mathematics to make the scaling of large models more principled. At xAI he contributed to the research behind [Grok](/wiki/grok), the company's family of large language models, and to its broader push into reasoning and AI for science.

On January 20, 2026, Yang announced on the social platform X that he was stepping back from xAI. In his own words, he had "been suffering from Lyme disease" and was moving "into an informal advisory role" so that he could, as he put it, "go founder mode on my health" [2]. He described symptoms that began in early 2025 after a bout of illness, including severe fatigue and reduced cognitive function that did not lift even after long periods of rest, and said that doctors had eventually diagnosed Lyme disease, which he believed he had carried asymptomatically for years before the strain of building xAI brought it on [2][3]. Musk responded publicly that he hoped Yang would get well soon [3]. News organizations including Bloomberg reported the step-back the same day, framing it around the Lyme diagnosis [1].

Yang's departure from day-to-day work was part of a broader turnover among xAI's original founders. By the time of his announcement he was the fourth of the twelve co-founders to step away, following Kyle Kosic, Igor Babuschkin, and Christian Szegedy, and further exits, including Jimmy Ba, Yuhuai Wu, and Toby Pohlen, were reported in the following weeks [4]. As of mid-2026, the public record describes Yang as holding an informal advisory relationship with xAI while focusing on his recovery; the health details rest on his own statement and on news reporting rather than any detailed company disclosure [1][2][4].

## What is Greg Yang recognized for?

Yang is recognized primarily for turning abstract results about infinite-width networks into tools that working engineers use. The maximal update parametrization and muTransfer [6] have been adopted and extended by other labs, with follow-up work such as unit-scaled muP appearing at major machine learning conferences. His earlier honors, the Hoopes Prize and the 2018 Morgan Prize honorable mention, marked him as a strong mathematician before he turned to deep learning [14][15]. He has also been an active communicator of his research, presenting the Tensor Programs framework in university seminars and on technical podcasts and maintaining a public account of the work through his professional page and writing [5][15].

## References

1. Bloomberg, "xAI Co-Founder Yang Leaves Musk's Startup After Lyme Diagnosis," January 20, 2026. https://www.bloomberg.com/news/articles/2026-01-20/xai-co-founder-yang-leaves-musk-s-startup-after-lyme-diagnosis
2. Greg Yang (@TheGregYang), post on X announcing his step-back from xAI, January 2026. https://x.com/TheGregYang/status/2013652609455006006
3. OfficeChai, "xAI Co-founder Greg Yang Steps Aside Into Informal Advisory Role Following Health Concerns," January 2026. https://officechai.com/ai/xai-co-founder-greg-yang-steps-aside-into-informal-advisory-role-following-health-concerns/
4. Silicon Republic, "Toby Pohlen is the latest co-founder to exit xAI," 2026. https://www.siliconrepublic.com/business/toby-pohlen-latest-co-founder-to-exit-xai-jimmy-ba-tony-wu
5. Greg Yang, professional page. https://thegregyang.com/
6. Greg Yang et al., "Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer," arXiv:2203.03466, 2022. https://arxiv.org/abs/2203.03466
7. Greg Yang, "Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes," arXiv:1910.12478, 2019. https://arxiv.org/abs/1910.12478
8. Greg Yang, "Tensor Programs II: Neural Tangent Kernel for Any Architecture," arXiv:2006.14548, 2020. https://arxiv.org/abs/2006.14548
9. Greg Yang, "Tensor Programs III: Neural Matrix Laws," arXiv:2009.10685, 2020. https://arxiv.org/abs/2009.10685
10. Greg Yang and Edward J. Hu, "Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks," ICML 2021. https://proceedings.mlr.press/v139/yang21c.html
11. Greg Yang et al., "Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit," arXiv:2308.01814, 2023. https://arxiv.org/abs/2308.01814
12. Greg Yang et al., "Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks," arXiv:2310.02244, 2023. https://arxiv.org/abs/2310.02244
13. Microsoft Research, "muTransfer: A technique for hyperparameter tuning of enormous neural networks," 2022. https://www.microsoft.com/en-us/research/blog/%C2%B5transfer-a-technique-for-hyperparameter-tuning-of-enormous-neural-networks/
14. American Mathematical Society, "AMS Prize Announcements," Notices of the AMS, April 2018. https://www.ams.org/journals/notices/201804/rnoti-p465.pdf
15. The Gradient, "Greg Yang on Communicating Research, Tensor Programs, and muTransfer." https://thegradientpub.substack.com/p/greg-yang-on-communicating-research
16. OpenAI, "GPT-4 Technical Report," arXiv:2303.08774, 2023. https://arxiv.org/abs/2303.08774

