Yang Zhilin
Last reviewed
May 31, 2026
Sources
16 citations
Review status
Source-backed
Revision
v1 · 2,139 words
Yang Zhilin (Chinese: 杨植麟; pinyin: Yáng Zhílín) is a Chinese artificial intelligence researcher and entrepreneur. He is the co-founder and chief executive officer of Moonshot AI (月之暗面), a Beijing-based startup that develops the Kimi family of large language models. [1][2] Before founding the company, Yang completed a PhD at Carnegie Mellon University, where he co-authored the natural language processing papers Transformer-XL and XLNet. [3][4] Moonshot AI is often grouped with a small set of well-funded Chinese startups that some investors call the country's "AI tigers," and it has attracted backing from Alibaba, Tencent, and other large technology firms. [5][6]
The name follows Chinese order: Yang is the family name and Zhilin is the given name. English-language coverage and his own research papers usually render the name as Zhilin Yang. [3]
Yang was born in Shantou, in China's Guangdong province. Most English-language sources give his birth year as 1992, while some Chinese profiles list 1993. [1][2] As a teenager he was drawn to music and at one point imagined a future as a rock musician, an interest that later shaped how he and his co-founders named their company and its meeting rooms. [1][7]
He studied at Tsinghua University in Beijing and earned a bachelor's degree in computer science in 2015. [2][6] At Tsinghua he was advised by Professor Jie Tang, who works on data mining and large-scale machine learning. [1] During his undergraduate years Yang played drums and wrote songs for a campus rock band called Splay, whose name echoes the splay tree data structure. [1][7]
After Tsinghua, Yang moved to the United States to pursue doctoral study at the Language Technologies Institute of Carnegie Mellon University. His advisors were Ruslan Salakhutdinov, a machine learning researcher who later led AI research at Apple, and William W. Cohen, a researcher who later joined Google DeepMind. [1][2] Yang completed the PhD in 2019, in roughly four years rather than the longer span that is typical in the United States, with a thesis titled "Advances in Generative Feature Learning." [1][6] During his studies he also spent time at industry research labs, including Google Brain and Facebook AI Research. [1][2]
Yang's academic work centers on representation learning and language modeling. He published more than twenty papers at major machine learning and natural language processing venues, and by the early 2020s Chinese media described him as one of the most cited researchers under the age of 35 in NLP within China. [1][2] He collaborated with senior researchers including the Turing Award winners Yoshua Bengio and Yann LeCun, and he co-authored the multi-hop question answering dataset HotpotQA. [1][2]
Transformer-XL, introduced in 2019 with Zihang Dai as the lead author and Yang among the co-authors, addressed a limitation of the original Transformer for language modeling. Standard Transformers process text in fixed-length segments and cannot easily carry information across segment boundaries, which caps the amount of context a model can use. Transformer-XL added a segment-level recurrence mechanism and a relative positional encoding scheme so that cached hidden states from earlier segments could be reused, letting the model attend to a longer stretch of preceding text. [4][8] The design improved language modeling results and became a building block for later work on long sequences.
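The effect of segment-level recurrence on usable context can be illustrated with a toy sketch. The code below is not taken from the Transformer-XL paper or its released code; the function name `visible_positions` and the parameters `seg_len` and `mem_len` are hypothetical, and the sketch only tracks which token positions a single attention layer could see. It does not model relative positional encoding, the stop-gradient on cached states, or the further growth of context across stacked layers.

```python
from typing import List


def visible_positions(n_tokens: int, seg_len: int, mem_len: int) -> List[List[int]]:
    """For each token position, list the positions one attention layer can see.

    mem_len = 0 mimics a Transformer that handles fixed-length segments
    independently; mem_len > 0 mimics a cache of hidden states carried
    over from earlier segments.
    """
    visible: List[List[int]] = []
    memory: List[int] = []  # positions whose hidden states are currently cached
    for start in range(0, n_tokens, seg_len):
        segment = list(range(start, min(start + seg_len, n_tokens)))
        for pos in segment:
            # Causal attention: cached positions plus current-segment positions up to pos.
            visible.append(memory + [p for p in segment if p <= pos])
        # Keep only the most recent mem_len positions for the next segment.
        memory = (memory + segment)[-mem_len:] if mem_len > 0 else []
    return visible


# Position 4 is the first token of the second segment when seg_len is 4.
no_memory = visible_positions(n_tokens=8, seg_len=4, mem_len=0)
with_memory = visible_positions(n_tokens=8, seg_len=4, mem_len=4)
print(no_memory[4])
print(with_memory[4])
```

Without a cache, the first token of each new segment sees only itself; with a cache, it also sees positions from the preceding segment, which is the boundary problem the recurrence mechanism was meant to address.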
XLNet, published in 2019, was authored by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. [3] It proposed a training objective called permutation language modeling. Rather than masking tokens in the style of BERT, XLNet maximizes the expected log-likelihood over possible factorization orders of the input sequence, so that each position can learn from tokens on both its left and its right while keeping an autoregressive formulation. The tokens themselves stay in their original order; only the order in which they are predicted is permuted. [3] Because it does not corrupt the input with mask symbols, XLNet avoids the mismatch between pretraining and fine-tuning that affected masked models, and it used Transformer-XL as its backbone to handle longer context. [3] On its release XLNet reported gains over BERT across roughly twenty language understanding tasks, including question answering, natural language inference, sentiment analysis, and document ranking. [3] The two papers were among the more heavily cited NLP works of that period, and XLNet alone has accumulated well over ten thousand citations. [2][6]
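A minimal sketch of the idea behind permutation language modeling follows. It is an illustration under stated assumptions rather than XLNet's actual implementation: the function name `permutation_mask` is hypothetical, and the sketch covers only how a sampled factorization order determines which positions each token may condition on. XLNet's two-stream attention and partial prediction are omitted.

```python
import random
from typing import List, Tuple


def permutation_mask(seq_len: int, rng: random.Random) -> Tuple[List[int], List[List[bool]]]:
    """Sample a factorization order and derive the matching attention mask.

    mask[i][j] is True when position i may condition on position j, that is,
    when j comes strictly earlier than i in the sampled order.
    """
    order = list(range(seq_len))
    rng.shuffle(order)  # order[t] is the position predicted at step t
    rank = [0] * seq_len
    for step, pos in enumerate(order):
        rank[pos] = step
    mask = [[rank[j] < rank[i] for j in range(seq_len)] for i in range(seq_len)]
    return order, mask


order, mask = permutation_mask(5, random.Random(0))
for pos in order:
    context = [j for j in range(5) if mask[pos][j]]
    print(f"predict position {pos} from positions {context}")
```

Because the order is resampled during training, a given position is predicted sometimes from tokens on its left, sometimes from tokens on its right, and often from both, while each individual prediction remains autoregressive.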
The research connects directly to the work Yang would later pursue in industry. Both Transformer-XL and XLNet are concerned with extending how much context a model can use, a theme that runs through Moonshot AI's emphasis on long-context systems. [9]
While still a student, Yang co-founded Recurrent AI in 2016, a company that applied language technology to sales and customer conversation analysis. [1] In China he also took part in large model efforts during the early years of the field. Reports describe his involvement around 2020 and 2021 with work tied to Huawei's PanGu models and with the Wu Dao large model project at the Beijing Academy of Artificial Intelligence. [1] A dispute later arose between Yang and some early investors connected to Recurrent AI, including the venture firm GSR Ventures, over his move to start a new company. [1][6]
Yang co-founded Moonshot AI in 2023 in Beijing, together with former Tsinghua schoolmates. Several of the co-founders, among them Zhou Xinyu and Wu Yuxin, had backgrounds in machine learning and computer vision, and some had played together in Yang's college band. [1][7] The company's Chinese name, 月之暗面, translates as "the dark side of the moon," a reference to the Pink Floyd album, and meeting rooms in its offices carry the names of rock bands. [1][7] Yang serves as chief executive. [2]
Moonshot AI built its public profile around long-context chat assistants under the Kimi brand. In October 2023 it released its first assistant, which could take in inputs of up to about 200,000 Chinese characters in a single conversation, a figure that stood out at the time. [5][6] In March 2024 the company said Kimi could handle roughly 2 million Chinese characters of context, and the product drew heavy traffic among Chinese users. [6][9] Yang has framed long context as central to the company's strategy, arguing that the ability to process very long inputs without loss is a foundation for useful and personalized AI. [9]
In January 2025 Moonshot released Kimi k1.5, a reasoning-oriented model trained with reinforcement learning. The company reported that it matched OpenAI's o1 model on several mathematics, coding, and multimodal reasoning benchmarks. [10] The accompanying technical report described a reinforcement learning approach that used a variant of online policy mirror descent and reached strong results without relying on Monte Carlo tree search, separate value functions, or process reward models. [10] The k1.5 paper is often cited alongside DeepSeek's R1 work as an early public account of training long chain-of-thought reasoning with comparatively simple reinforcement learning on verifiable rewards. [10]
In July 2025 Moonshot released Kimi K2, a mixture-of-experts model with about 1 trillion total parameters and roughly 32 billion parameters active per token, trained on around 15.5 trillion tokens and supporting a context window of about 128,000 tokens. [11][12] Kimi K2 was tuned for agentic use, meaning tool calling and multi-step task execution, and Moonshot released it under a modified MIT license in base and instruction-tuned versions. [11][12] Reviewers reported that the instruction-tuned model performed strongly relative to other open-weight systems on benchmarks for coding, mathematics, reasoning, and tool use. [11][12] The company later extended the family with a reasoning-focused release, Kimi K2 Thinking, in late 2025, and added multimodal capability with Kimi K2.5 in early 2026. [6][13]
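The gap between total and active parameters comes from sparse routing: each token is sent to only a few experts. The sketch below shows generic top-k gating and the arithmetic implied by the approximate figures quoted above. It is not a description of Kimi K2's router, which this article does not detail; the function name `top_k_route`, the four-expert example, the value of `k`, and the choice to renormalize weights over the selected experts are all illustrative assumptions.

```python
import math
from typing import List, Tuple


def top_k_route(gate_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    peak = max(gate_logits)
    exps = [math.exp(x - peak) for x in gate_logits]  # numerically stable softmax numerators
    chosen = sorted(range(len(exps)), key=lambda i: exps[i], reverse=True)[:k]
    total = sum(exps[i] for i in chosen)
    return [(i, exps[i] / total) for i in chosen]


# Approximate figures quoted in the text above.
TOTAL_PARAMS = 1.0e12   # about 1 trillion total parameters
ACTIVE_PARAMS = 32e9    # roughly 32 billion active per token
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"share of parameters used per token: {active_fraction:.1%}")

# Hypothetical gate scores for one token over four experts, routing to two.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
for expert, weight in routes:
    print(f"expert {expert} weight {weight:.3f}")
```

On these figures, only a few percent of the model's parameters take part in processing any single token, which is why a model of this total size can be served at a per-token cost closer to that of a much smaller dense model.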
Moonshot AI raised capital quickly after its founding. Early reporting described a first round of roughly 60 million US dollars at a valuation near 300 million dollars, with backers that included the venture firms Hongshan (formerly Sequoia China) and ZhenFund. [6][14] In February 2024 the company announced a large round, widely reported as around 1 billion dollars and led by Alibaba, which valued Moonshot at about 2.5 billion dollars. [6][14] In August 2024 it raised about 300 million dollars from investors including Tencent and Gaorong Capital, lifting the valuation to roughly 3.3 billion dollars. [5][6] Through 2025 and into 2026, further fundraising discussions reported by financial media pointed to a much higher target valuation, with figures cited around 10 billion dollars and continued participation from Alibaba and Tencent. [5][15]
Moonshot AI is frequently named among a handful of Chinese large model startups that investors and the press call the "AI tigers," a group that also includes companies such as Zhipu AI, MiniMax, Baichuan, and StepFun. [5][6] Within that cohort Moonshot became known for its consumer-facing Kimi assistant and, after the open-weight release of Kimi K2, for its standing among open large language models. [11][12] The company's heavy backing from Alibaba and Tencent places it within a broader pattern of artificial intelligence development in China in which large technology firms invest in independent model builders. [5][6]
Yang describes Moonshot AI as a company aimed at artificial general intelligence, and he ties that goal to long context. He has argued that, viewed broadly, many AI problems can be cast as problems of using longer and richer context, and that progress in model architectures has tracked increases in the effective context a model can handle. [9][16]
On the question of whether large model gains will continue, Yang has said that the scaling laws remain valid but that the focus is shifting. In a November 2024 interview tied to the launch of the company's k0-math model, he said that scaling does not stop but proceeds through different approaches, with attention moving toward improving reasoning through reinforcement learning rather than only adding computation. [16] He also said there was still room left in pretraining, perhaps for around half a generation to a full generation of models, and that by the measure of distance to AGI the field remained at an early stage. [16] In later public talks he continued to stress long-context capability and agent-based systems as directions for reaching more general intelligence. [13]
Yang is widely covered in Chinese and international media as a leading figure among the younger generation of Chinese AI founders, and Moonshot AI's rapid rise to a multibillion-dollar valuation has been reported by outlets including the South China Morning Post, Bloomberg, and The Wire China. [5][6][15] His earlier academic work on XLNet and Transformer-XL is frequently noted in these profiles as the basis for his reputation in the field. [2][6]
| Field | Detail |
|---|---|
| Full name | Yang Zhilin (杨植麟) |
| Born | Shantou, Guangdong, China; 1992 (some sources list 1993) |
| Nationality | Chinese |
| Known for | Co-founder and CEO of Moonshot AI; co-author of XLNet and Transformer-XL |
| Education | BS, computer science, Tsinghua University (2015); PhD, Carnegie Mellon University (2019) |
| Doctoral advisors | Ruslan Salakhutdinov and William W. Cohen |
| Notable papers | Transformer-XL (2019); XLNet (2019); HotpotQA |
| Earlier company | Recurrent AI (co-founded 2016) |
| Current role | Co-founder and CEO, Moonshot AI |
| Company founded | 2023, Beijing |
| Main products | Kimi assistant; Kimi k1.5; Kimi K2 |
| Key investors | Alibaba, Tencent, Hongshan, ZhenFund, Gaorong Capital |