Greg Yang
Last reviewed
Jun 8, 2026
Sources
16 citations
Review status
Source-backed
Revision
v1 · 1,674 words
Greg Yang (also published as Ge Yang) is a mathematician and artificial intelligence researcher best known for the Tensor Programs series of papers and for the maximal update parametrization, usually written muP, a mathematical theory of how to scale neural networks [1][5]. The theory underpins muTransfer, a method that lets practitioners tune the hyperparameters of a small model and transfer them directly to a much larger one, cutting the cost of configuring giant models [6]. Yang spent the formative part of his career at Microsoft Research and in 2023 became one of the founding researchers of xAI, the company started by Elon Musk [4]. In January 2026 he announced that he was stepping back from xAI into an informal advisory role to manage a Lyme disease diagnosis [1][2].
Yang studied at Harvard University, but his path through it was unusual. At the end of his sophomore year he left to pursue music full time, working for roughly a year and a half as an electronic dance music producer and DJ under the name Zeta [15]. During that interlude he encountered ideas about artificial intelligence, and the prospect of building human-level AI became his central preoccupation. He returned to Harvard, accelerated his remaining coursework, and graduated in 2018 with an A.B. in mathematics and an S.M. in computer science [5][15].
His senior thesis developed what he called a homological theory of functions, drawing on computational complexity theory, learning theory, and algebraic topology [14][15]. The work won Harvard's Thomas Temple Hoopes Prize, awarded for outstanding undergraduate scholarship, and Yang received an honorable mention for the 2018 Frank and Brennie Morgan Prize, the annual award given by the American Mathematical Society, the Mathematical Association of America, and the Society for Industrial and Applied Mathematics for outstanding research in mathematics by an undergraduate [14]. The combination of a pure-mathematics background with an emerging interest in machine learning would define the rest of his career.
After Harvard, Yang joined Microsoft Research in Redmond, Washington, where he eventually held the title of Senior Researcher [5]. He was drawn there by the chance to apply rigorous mathematics to a field that was advancing largely through empirical trial and error. He framed his ambition as building a kind of "Theory of Everything" for large-scale deep learning: a framework that would both reveal the optimal way to scale neural networks and provide enough understanding to guide safety and alignment efforts [5].
It was at Microsoft Research that Yang produced the body of work for which he is best known. Between 2019 and 2023 he wrote the Tensor Programs papers and, with collaborators, turned the resulting theory into a practical engineering tool. In 2022 Microsoft Research published a description of the muTransfer technique, presenting it as a way to tune enormous neural networks that are otherwise too expensive to train more than once [13].
A Tensor Program is a formal way of writing down a computation as a sequence of matrix multiplications and coordinatewise nonlinear functions. Yang's insight was that almost any computation in deep learning, from a feedforward network to a transformer, can be expressed in this language, which makes it possible to reason about all of them at once [5][8]. By studying what happens to such programs as the width of a network grows toward infinity, Yang unified several previously separate threads of theory, including the correspondence between neural networks and Gaussian processes, the Neural Tangent Kernel, and tools from random matrix theory [5][7][8].
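As a rough illustration of the idea, and not a reproduction of Yang's formal language, the sketch below treats a program as a list of steps that are either a matrix multiplication or a coordinatewise nonlinearity, then runs a two-layer network written in that form. The step format and the names `run_program`, `gaussian_matrix`, and `demo` are hypothetical and chosen only for this example. With Gaussian weights of variance 1/fan_in, the mean square of the second pre-activation should come out close to the mean square of the activations feeding it, which is the kind of large-width regularity the theory describes.

```python
import math
import random


def run_program(program, env):
    """Run steps of the form (output_name, op, args), storing results in env."""
    for out, op, args in program:
        if op == "matmul":
            # args = (matrix_name, vector_name)
            matrix, vector = env[args[0]], env[args[1]]
            env[out] = [sum(w * v for w, v in zip(row, vector)) for row in matrix]
        elif op == "nonlin":
            # args = (function, vector_name, ...); applied coordinate by coordinate
            fn, *names = args
            env[out] = [fn(*vals) for vals in zip(*(env[n] for n in names))]
        else:
            raise ValueError(f"unknown op: {op}")
    return env


def gaussian_matrix(rows, cols, rng):
    """Matrix with i.i.d. N(0, 1/cols) entries (variance 1/fan_in)."""
    std = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def mean_square(vector):
    return sum(t * t for t in vector) / len(vector)


# A two-layer network expressed as a program: matmul, nonlinearity, matmul.
TWO_LAYER = [
    ("h1", "matmul", ("W1", "x")),
    ("x1", "nonlin", (math.tanh, "h1")),
    ("h2", "matmul", ("W2", "x1")),
]


def demo(width=400, input_dim=8, seed=0):
    """Return (mean square of x1, mean square of h2) for one random network."""
    rng = random.Random(seed)
    env = {
        "x": [1.0] * input_dim,
        "W1": gaussian_matrix(width, input_dim, rng),
        "W2": gaussian_matrix(width, width, rng),
    }
    run_program(TWO_LAYER, env)
    return mean_square(env["x1"]), mean_square(env["h2"])


if __name__ == "__main__":
    activations, preactivations = demo()
    print(activations, preactivations)
```

Because every step is one of two simple operations, the same interpreter can run other architectures by changing only the list of steps, which is the sense in which a single analysis can cover many networks at once.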
The series unfolded across several papers:
| Paper | Year | Focus |
|---|---|---|
| Tensor Programs I | 2019 | Wide networks of any architecture behave as Gaussian processes [7] |
| Tensor Programs II | 2020 | Neural Tangent Kernel for any architecture [8] |
| Tensor Programs III | 2020 | Neural Matrix Laws and the free independence of weights and activations [9] |
| Tensor Programs IV | 2021 | Feature learning in infinite-width networks, introducing muP [10] |
| Tensor Programs V | 2022 | Zero-shot hyperparameter transfer via muP and muTransfer [6] |
| Tensor Programs IVb | 2023 | Adaptive optimization in the infinite-width limit [11] |
| Tensor Programs VI | 2023 | Feature learning in infinite-depth networks [12] |
The practical payoff came from the maximal update parametrization. Under the standard ways of initializing and scaling a network, the best learning rate and related hyperparameters drift as the model gets wider, so settings tuned on a small model are wrong for a large one. Tensor Programs IV identified muP as the parametrization that keeps feature learning well behaved as width grows, and Tensor Programs V showed that under muP many optimal hyperparameters stay essentially constant across model sizes [6][10]. That stability is what makes muTransfer possible: a practitioner parametrizes the target model in muP, tunes hyperparameters on a small and cheap proxy, and transfers them to the full-sized model without ever tuning the large model directly [6].
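The transfer step can be pictured as a set of per-layer rescaling rules. The sketch below is a minimal illustration that assumes the Adam-optimizer rules commonly summarized from Tensor Programs V: input weights keep their initialization scale and learning rate, hidden weights use an initialization variance proportional to 1/fan_in and a learning rate proportional to 1/fan_in, and output weights use an initialization variance proportional to 1/fan_in squared and a learning rate proportional to 1/fan_in. These rules, the function name `mup_adam_hparams`, and the `LayerHParams` container are assumptions made for this example and should be checked against the paper before any real use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerHParams:
    init_std: float
    adam_lr: float


def mup_adam_hparams(kind, fan_in, base_fan_in, base_std, base_lr):
    """Rescale values tuned at base_fan_in to a layer whose input width is fan_in.

    Assumed rules (Adam, illustrative only):
      input  : init std and learning rate unchanged
      hidden : init std ~ 1/sqrt(fan_in), learning rate ~ 1/fan_in
      output : init std ~ 1/fan_in,       learning rate ~ 1/fan_in
    """
    ratio = fan_in / base_fan_in  # width multiplier relative to the proxy model
    if kind == "input":
        return LayerHParams(base_std, base_lr)
    if kind == "hidden":
        return LayerHParams(base_std / ratio ** 0.5, base_lr / ratio)
    if kind == "output":
        return LayerHParams(base_std / ratio, base_lr / ratio)
    raise ValueError(f"unknown layer kind: {kind}")


if __name__ == "__main__":
    # Hypothetical values tuned on a width-256 proxy, moved to width 4096.
    for kind in ("input", "hidden", "output"):
        print(kind, mup_adam_hparams(kind, 4096, 256, base_std=0.02, base_lr=1e-3))
```

In this picture the practitioner searches only over `base_std` and `base_lr` on the small proxy; the width-dependent factors are applied mechanically when the model is scaled up, so the large model is never tuned directly.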
The reported gains were striking. Yang and his co-authors, who included Edward J. Hu and several researchers from OpenAI, showed that by transferring hyperparameters from a 40-million-parameter proxy they could exceed the published results of the 6.7-billion-parameter version of GPT-3, with a tuning cost equal to only about 7 percent of that model's pretraining compute, and that a 13-million-parameter proxy was enough to surpass published BERT-large numbers [6]. The technique was subsequently taken up by other groups training large models. The GPT-4 technical report lists Tensor Programs V among its references, although OpenAI did not spell out the details of how it tuned the model [16].
Yang was one of the founding members of xAI, which was incorporated in Nevada on March 9, 2023, and publicly announced on July 12, 2023 [4]. The launch named a twelve-person founding team led by Elon Musk, with Igor Babuschkin as chief engineer and other researchers drawn from OpenAI, Google DeepMind, Microsoft Research, and Tesla, among them Christian Szegedy and Jimmy Ba [4]. Yang joined as the theorist of the group, bringing the muP line of work and a stated interest in using mathematics to make the scaling of large models more principled. At xAI he contributed to the research behind Grok, the company's family of large language models, and to its broader push into reasoning and AI for science.
On January 20, 2026, Yang announced on the social platform X that he was stepping back from xAI. In his own words, he had "been suffering from Lyme disease" and was moving "into an informal advisory role" so that he could, as he put it, "go founder mode on my health" [2]. He described symptoms that began in early 2025 after a bout of illness, including severe fatigue and reduced cognitive function that did not lift even after long periods of rest, and said that doctors had eventually diagnosed Lyme disease, which he believed he had carried asymptomatically for years before the strain of building xAI brought it on [2][3]. Musk responded publicly that he hoped Yang would get well soon [3]. News organizations including Bloomberg reported the step-back, framing it around the Lyme diagnosis [1].
Yang's departure from day-to-day work was part of a broader turnover among xAI's original founders. By the time of his announcement he was the fourth of the twelve co-founders to step away, following Kyle Kosic, Igor Babuschkin, and Christian Szegedy, and further exits, including Jimmy Ba, Yuhuai Wu, and Toby Pohlen, were reported in the following weeks [4]. As of mid-2026, the public record describes Yang as holding an informal advisory relationship with xAI while focusing on his recovery; the health details rest on his own statement and on news reporting rather than any detailed company disclosure [1][2][4].
Yang is recognized primarily for turning abstract results about infinite-width networks into tools that working engineers use. The maximal update parametrization and muTransfer have been adopted and extended by other labs, with follow-up work such as unit-scaled muP appearing at major machine learning conferences [6]. His earlier honors, the Hoopes Prize and the 2018 Morgan Prize honorable mention, marked him as a strong mathematician before he turned to deep learning [14][15]. He has also been an active communicator of his research, presenting the Tensor Programs framework in university seminars and on technical podcasts, and maintaining a public account of the work through his professional page and writing [5][15]. Within the field he is frequently cited as an example of a pure mathematician whose theory had a direct, measurable impact on how the largest neural networks are trained.