Yang Zhilin

Updated Jun 28, 2026

Yang Zhilin (Chinese: 杨植麟; pinyin: Yáng Zhílín) is the co-founder and chief executive officer of Moonshot AI (月之暗面), the Beijing startup that develops the Kimi chatbot and the Kimi family of large language models, including the trillion-parameter Kimi K2. ^[1]^[2]^[5] He co-founded Moonshot AI in 2023 and is one of the most academically credentialed founders in Chinese AI: he holds a PhD from Carnegie Mellon University and co-authored the influential natural language processing papers Transformer-XL and XLNet. ^[3]^[4] In May 2026 Moonshot was reported to have raised roughly 2 billion US dollars at a valuation of about 20 billion dollars in a round led by Meituan's investment arm, making Yang's company one of the most valuable independent model builders in China. ^[17]

Moonshot AI is often grouped with a small set of well-funded Chinese startups that some investors call the country's "AI tigers," and it has attracted backing from Alibaba, Tencent, and other large technology firms. ^[5]^[6] In the Chinese name order used in this article, Yang is the family name and Zhilin is the given name. English-language coverage and his own research papers usually render the name as Zhilin Yang. ^[3]

Who is Yang Zhilin?

Yang Zhilin is a Chinese artificial intelligence researcher and entrepreneur, born in Shantou in Guangdong province. He is best known in two roles: as a researcher who co-authored XLNet and Transformer-XL during his doctorate at Carnegie Mellon, and as the founder and CEO of Moonshot AI, the company behind the Kimi assistant. ^[1]^[3]^[5] As of 2026 he leads Moonshot AI, where he has positioned long-context and agentic models as the company's route toward artificial general intelligence. ^[9]^[16]

Early life and education

Yang was born in Shantou, in China's Guangdong province. Most English-language sources give his birth year as 1992, while some Chinese profiles list 1993. ^[1]^[2] As a teenager he was drawn to music and at one point imagined a future as a rock musician, an interest that later shaped how he and his co-founders named their company and its meeting rooms. ^[1]^[7]

He studied at Tsinghua University in Beijing and earned a bachelor's degree in computer science in 2015. ^[2]^[6] At Tsinghua he was advised by Professor Jie Tang, who works on data mining and large-scale machine learning. ^[1] During his undergraduate years Yang played drums and wrote songs for a campus rock band called Splay, whose name echoes the splay tree data structure. ^[1]^[7]

After Tsinghua, Yang moved to the United States to pursue doctoral study at the Language Technologies Institute of Carnegie Mellon University. His advisors were Ruslan Salakhutdinov, a machine learning researcher who later led AI research at Apple, and William W. Cohen, a researcher who later joined Google DeepMind. ^[1]^[2] Yang completed the PhD in 2019, in roughly four years rather than the longer span that is typical in the United States, with a thesis titled "Advances in Generative Feature Learning." ^[1]^[6] During his studies he also spent time at industry research labs, including Google Brain and Facebook AI Research. ^[1]^[2]

What did Yang Zhilin research before Moonshot AI?

Yang's academic work centers on representation learning and language modeling. He published more than twenty papers at major machine learning and natural language processing venues, and by the early 2020s Chinese media described him as one of the most cited researchers under the age of 35 in NLP within China. ^[1]^[2] He collaborated with senior researchers including the Turing Award winners Yoshua Bengio and Yann LeCun, and he co-authored the multi-hop question answering dataset HotpotQA. ^[1]^[2]

Transformer-XL

Transformer-XL, introduced in 2019 with Zihang Dai and Yang listed as equal-contribution first authors, addressed a limitation of the original Transformer for language modeling. Standard Transformers process text in fixed-length segments and cannot easily carry information across segment boundaries, which caps the amount of context a model can use. Transformer-XL added a segment-level recurrence mechanism and a relative positional encoding scheme so that hidden states from earlier segments could be reused, letting the model attend to a longer stretch of preceding text. ^[4]^[8] The design improved language modeling results and became a building block for later work on long sequences.
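
The segment-level recurrence described above can be illustrated with a small sketch. It is a toy under stated assumptions, not the paper's implementation: there are no learned projections, the relative positional encoding is omitted, and the function names `attend` and `run_segments` are hypothetical. Each layer caches the hidden states it received for the previous segment and prepends them to the context for the next one.

```python
import math


def attend(queries, context, n_mem):
    """Causal dot-product attention; the first n_mem context rows are cached memory."""
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Query i sees every memory row plus current-segment rows 0..i.
        visible = context[: n_mem + i + 1]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in visible]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) / total for d in range(dim)]
        )
    return outputs


def run_segments(tokens, seg_len, mem_len, n_layers=2):
    """Process tokens segment by segment, reusing cached hidden states as extra context."""
    memories = [[] for _ in range(n_layers)]  # one cache per layer
    outputs = []
    for start in range(0, len(tokens), seg_len):
        hidden = tokens[start : start + seg_len]
        for layer in range(n_layers):
            mem = memories[layer]
            context = mem + hidden  # cached rows from earlier segments, then current rows
            # Cache this layer's inputs for the next segment, keeping the last mem_len rows.
            memories[layer] = context[max(0, len(context) - mem_len):]
            hidden = attend(hidden, context, n_mem=len(mem))
        outputs.extend(hidden)
    return outputs


# Toy 2-dimensional "embeddings" (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
isolated = run_segments(tokens, seg_len=2, mem_len=0)     # segments cannot see each other
with_memory = run_segments(tokens, seg_len=2, mem_len=2)  # second segment sees the first
```

With `mem_len=0` the second segment's outputs depend only on its own tokens, which mirrors the fixed-segment limitation; with a non-empty cache they also depend on the first segment. Because every layer reads a cache, the reachable context grows with depth as well as with `mem_len`.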

XLNet

XLNet, published in 2019, was authored by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. ^[3] It proposed a training objective called permutation language modeling. Rather than masking tokens in the style of BERT, XLNet maximizes the expected likelihood over many possible orderings of the input sequence, so that each position can learn from tokens on both its left and its right while keeping an autoregressive formulation. ^[3] Because it does not corrupt the input with mask symbols, XLNet avoids the mismatch between pretraining and fine-tuning that affected masked models, and it used Transformer-XL as its backbone to handle longer context. ^[3] On its release XLNet reported gains over BERT across roughly twenty language understanding tasks, including question answering, natural language inference, sentiment analysis, and document ranking. ^[3] The two papers were among the more heavily cited NLP works of that period, and XLNet alone has accumulated well over ten thousand citations. ^[2]^[6]
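
One way to picture permutation language modeling is as an attention mask derived from a sampled factorization order. The sketch below is a simplified illustration, not XLNet's actual code: the helper names are hypothetical, and it builds only the mask used when predicting a position (which excludes the position itself), leaving out the paper's two-stream attention.

```python
import random


def sample_order(n, rng):
    """Sample a random factorization order over positions 0..n-1."""
    order = list(range(n))
    rng.shuffle(order)
    return order


def permutation_mask(order):
    """mask[i][j] is True when position i may attend to position j.

    Position i may see j only if j comes earlier than i in the sampled order,
    regardless of where j sits in the original left-to-right sequence.
    """
    n = len(order)
    rank = [0] * n
    for step, pos in enumerate(order):
        rank[pos] = step
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


# With order [2, 0, 3, 1], position 1 is predicted last and sees positions
# 0, 2, and 3, that is, tokens on both its left and its right.
mask = permutation_mask([2, 0, 3, 1])
random_mask = permutation_mask(sample_order(4, random.Random(0)))
```

The tokens themselves stay in their original positions; only the mask changes from sample to sample, which is how the objective remains autoregressive while exposing each position to context from both directions across many sampled orders.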

The research connects directly to the work Yang would later pursue in industry. Both Transformer-XL and XLNet are concerned with extending how much context a model can use, a theme that runs through Moonshot AI's emphasis on long-context systems. ^[9]

Other ventures before Moonshot AI

While still a student, Yang co-founded Recurrent AI in 2016, a company that applied language technology to sales and customer conversation analysis. ^[1] In China he also took part in large-model efforts during the early years of the field. Reports describe his involvement around 2020 and 2021 with work tied to Huawei's PanGu models and with the Wu Dao large model project at the Beijing Academy of Artificial Intelligence. ^[1] A dispute later arose between Yang and some early investors connected to Recurrent AI, including the venture firm GSR Ventures, over his move to start a new company. ^[1]^[6]

What is Moonshot AI?

Moonshot AI is the Beijing foundation-model company Yang co-founded in 2023, together with former Tsinghua schoolmates. Several of the co-founders, among them Zhou Xinyu and Wu Yuxin, had backgrounds in machine learning and computer vision, and some had played together in Yang's college band. ^[1]^[7] The company's Chinese name, 月之暗面, translates as "the dark side of the moon," a reference to the Pink Floyd album, and meeting rooms in its offices carry the names of rock bands. ^[1]^[7] Yang serves as chief executive. ^[2]^[5]

Moonshot AI built its public profile around long-context chat assistants under the Kimi brand. In October 2023 it released its first assistant, which could take in inputs of up to about 200,000 Chinese characters in a single conversation, a figure that stood out at the time. ^[5]^[6] In March 2024 the company said Kimi could handle roughly 2 million Chinese characters of context, and the product drew heavy traffic among Chinese users. ^[6]^[9] Yang has framed long context as central to the company's strategy, arguing that the ability to process very long inputs without loss is a foundation for useful and personalized AI. ^[9] In a February 2024 interview with the publication Overseas Unicorn, he put it in stark terms: "If you have a billion-token context length, then the problems we face today would no longer be problems." ^[16]

Kimi k1.5

In January 2025 Moonshot released Kimi k1.5, a reasoning-oriented model trained with reinforcement learning. The company reported that it matched OpenAI's o1 model on several mathematics, coding, and multimodal reasoning benchmarks. ^[10] The accompanying technical report described a reinforcement learning approach that used a variant of online policy mirror descent and reached strong results without relying on Monte Carlo tree search, separate value functions, or process reward models. ^[10] The k1.5 paper is often cited alongside DeepSeek's R1 work as an early public account of training long chain-of-thought reasoning with comparatively simple reinforcement learning on verifiable rewards. ^[10]
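
The phrase "verifiable rewards" can be made concrete with a small sketch. This is a generic illustration and not Moonshot's training pipeline: it does not implement online policy mirror descent, the `Answer:` output convention and both function names are assumptions made for the example, and the sample-mean baseline is shown only as one common way to score responses without a learned value function.

```python
def verifiable_reward(response, reference):
    """Return 1.0 if the final line gives the reference answer, else 0.0.

    Only the outcome is checked; intermediate reasoning steps are not scored,
    so no process reward model is involved.
    """
    lines = response.strip().splitlines()
    final = lines[-1] if lines else ""
    if "Answer:" not in final:  # hypothetical output convention for this sketch
        return 0.0
    answer = final.split("Answer:")[-1].strip()
    return 1.0 if answer == reference.strip() else 0.0


def centered_rewards(responses, reference):
    """Score several sampled responses to one prompt against their mean reward."""
    rewards = [verifiable_reward(r, reference) for r in responses]
    if not rewards:
        return []
    baseline = sum(rewards) / len(rewards)  # sample mean stands in for a value function
    return [r - baseline for r in rewards]


samples = [
    "6 * 7 is 42\nAnswer: 42",
    "Answer: 41",
    "I am not sure",
    "Multiply the two numbers.\nAnswer: 42",
]
signals = centered_rewards(samples, "42")
```

Responses that reach the checked answer receive a positive signal and the others a negative one, which is the kind of outcome-only feedback that a policy-gradient style update can then consume.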

What is Kimi K2?

In July 2025 Moonshot released Kimi K2, a mixture-of-experts model with about 1 trillion total parameters and roughly 32 billion parameters active per token, trained on around 15.5 trillion tokens and supporting a context window of about 128,000 tokens. ^[11]^[12] Kimi K2 was tuned for agentic use, meaning tool calling and multi-step task execution, and Moonshot released it under a modified MIT license in base and instruction-tuned versions. ^[11]^[12] Reviewers reported that the instruction-tuned model performed strongly relative to other open-weight systems on benchmarks for coding, mathematics, reasoning, and tool use. ^[11]^[12] The company later extended the family with a reasoning-focused release, Kimi K2 Thinking, in late 2025. ^[6]^[13]
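
The gap between total and active parameters follows from how mixture-of-experts layers route each token to a few experts. The sketch below shows generic top-k routing with toy values; the expert count, the value of `k`, the router scores, and the function names are illustrative assumptions, not Kimi K2's actual configuration. The final lines work out the active share from the figures reported above.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)), key=lambda e: router_logits[e], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[e] for e in chosen)
    exps = [math.exp(router_logits[e] - peak) for e in chosen]
    total = sum(exps)
    return [(e, w / total) for e, w in zip(chosen, exps)]


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(weight * experts[e](x) for e, weight in top_k_route(router_logits, k))


# Four toy "experts" (simple scalar functions); only two run for this token.
experts = [lambda x: 1 * x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
output = moe_forward(1.0, experts, router_logits=[0.0, 2.0, -1.0, 2.0], k=2)

# Share of parameters used per token, from the reported approximate figures.
TOTAL_PARAMS = 1e12    # about 1 trillion total
ACTIVE_PARAMS = 32e9   # about 32 billion active per token
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # about 3.2 percent
```

Under these reported figures, roughly 3 percent of the model's weights take part in any single token's forward pass, which is why per-token compute is far lower than the trillion-parameter total would suggest.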

Moonshot continued to ship rapidly in 2026. In January 2026 it released Kimi K2.5, a natively multimodal upgrade that added vision capabilities to the trillion-parameter base, alongside a coding agent. ^[18]^[19] In March 2026 Yang became, by several accounts, the only founder of a Chinese large-model company invited to deliver a keynote at NVIDIA's GTC conference, where he outlined the Kimi roadmap and his case for lossless long context plus agent systems as a path to AGI. ^[13] In April 2026 the company released Kimi K2.6; technology press reported that the model tied GPT-5.5 on the SWE-Bench Pro coding benchmark while costing substantially less per million tokens. ^[20]

How is Moonshot AI funded?

Moonshot AI raised capital quickly during 2024. Early reporting described a first round of roughly 60 million US dollars at a valuation near 300 million dollars, with backers that included the venture firms Hongshan (also styled HongShan, formerly Sequoia China) and ZhenFund. ^[6]^[14] In February 2024 the company announced a large round, widely reported as around 1 billion dollars and led by Alibaba, which valued Moonshot at about 2.5 billion dollars. ^[6]^[14] In August 2024 it raised about 300 million dollars from investors including Tencent and Gaorong Capital, lifting the valuation to roughly 3.3 billion dollars. ^[5]^[6]

Funding accelerated again into 2026. According to Bloomberg, Moonshot closed a round of about 500 million dollars at a 4.3 billion dollar valuation in late 2025, then in early 2026 opened an expanded round in which existing backers including Alibaba, Tencent, and 5Y Capital committed more than 700 million dollars while the company targeted a 10 billion dollar valuation. ^[15]^[17] In May 2026 TechCrunch reported that Moonshot had raised about 2 billion dollars at a valuation of roughly 20 billion dollars, in a round led by Long-Z Investments, the venture arm of the food delivery company Meituan, with participation from Tsinghua Capital, China Mobile, and CPE Yuanfeng. ^[17] The same report said the company's annual recurring revenue topped 200 million dollars in April 2026, driven by paid subscriptions and API usage. ^[17]

The table below summarizes the publicly reported funding milestones, attributed to the sources that disclosed them.

Date	Reported raise	Reported valuation	Lead or notable investors	Source
Early 2024	~60M USD	~300M USD	Hongshan, ZhenFund	The Wire China ^[6]
Feb 2024	~1B USD	~2.5B USD	Alibaba	The Wire China ^[6]
Aug 2024	~300M USD	~3.3B USD	Tencent, Gaorong Capital	Pandaily ^[14]
Late 2025	~500M USD	~4.3B USD	Existing backers	Bloomberg ^[15]
Early 2026	>700M USD (first tranche)	~10B USD target	Alibaba, Tencent, 5Y Capital	Bloomberg ^[15]
May 2026	~2B USD	~20B USD	Long-Z (Meituan), Tsinghua Capital, China Mobile	TechCrunch ^[17]

Where does Moonshot AI rank among Chinese AI labs?

Moonshot AI is frequently named among a handful of Chinese large-model startups that investors and the press call the "AI tigers," a group that also includes companies such as Zhipu AI, MiniMax, Baichuan, and StepFun. ^[5]^[6] Within that cohort Moonshot became known for its consumer-facing Kimi assistant and, after the open-weight release of Kimi K2, for its standing among open large language models. ^[11]^[12] The company's heavy backing from Alibaba, Tencent, and later Meituan places it within a broader pattern of artificial intelligence development in China in which large technology firms invest in independent model builders. ^[5]^[6]^[17]

What are Yang Zhilin's views on AGI and scaling?

Yang describes Moonshot AI as a company aimed at artificial general intelligence, and he ties that goal to long context. He has argued that, viewed broadly, many AI problems can be cast as problems of using longer and richer context, and that progress in model architectures has tracked increases in the effective context a model can handle. ^[9]^[16]

On the question of whether large-model gains will continue, Yang has said that the scaling laws remain valid but that the focus is shifting. In a November 2024 interview tied to the launch of the company's K0-math model, he said that scaling does not stop but proceeds through different approaches, with attention moving toward improving reasoning through reinforcement learning rather than only adding computation. ^[16] He also said there was still room left in pretraining, perhaps for around half a generation to a full generation of models, and that by the measure of distance to AGI the field remained at an early stage. ^[16] In later public talks he continued to stress long-context capability and agent-based systems as directions for reaching more general intelligence. ^[13]

Recognition

Yang is widely covered in Chinese and international media as a leading figure among the younger generation of Chinese AI founders, and Moonshot AI's rapid rise to a multibillion-dollar valuation has been reported by outlets including the South China Morning Post, Bloomberg, TechCrunch, and The Wire China. ^[5]^[6]^[15]^[17] His earlier academic work on XLNet and Transformer-XL is frequently noted in these profiles as the basis for his reputation in the field. ^[2]^[6]

Facts

Field	Detail
Full name	Yang Zhilin (杨植麟)
Born	Shantou, Guangdong, China; 1992 (some sources list 1993)
Nationality	Chinese
Known for	Co-founder and CEO of Moonshot AI; co-author of XLNet and Transformer-XL
Education	BS, computer science, Tsinghua University (2015); PhD, Carnegie Mellon University (2019)
Doctoral advisors	Ruslan Salakhutdinov and William W. Cohen
Notable papers	Transformer-XL (2019); XLNet (2019); HotpotQA
Earlier company	Recurrent AI (co-founded 2016)
Current role	Co-founder and CEO, Moonshot AI (as of 2026)
Company founded	2023, Beijing
Main products	Kimi assistant; Kimi k1.5; Kimi K2; Kimi K2.5; Kimi K2.6
Key investors	Alibaba, Tencent, Long-Z (Meituan), Hongshan, ZhenFund, Gaorong Capital
Reported valuation	~20B USD (May 2026, per TechCrunch)

References

[1] "Yang Zhilin." Wikipedia. https://en.wikipedia.org/wiki/Yang_Zhilin
[2] "杨植麟." Baidu Baike. https://baike.baidu.com/item/%E6%9D%A8%E6%A4%8D%E9%BA%9F/24462023
[3] Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. "XLNet: Generalized Autoregressive Pretraining for Language Understanding." arXiv:1906.08237, 2019. https://arxiv.org/abs/1906.08237
[4] zihangdai. "XLNet: Generalized Autoregressive Pretraining for Language Understanding." GitHub. https://github.com/zihangdai/xlnet
[5] "Moonshot AI." Wikipedia. https://en.wikipedia.org/wiki/Moonshot_AI
[6] Berman, Noah. "Who Is Moonshot AI?" The Wire China, 28 September 2025. https://www.thewirechina.com/2025/09/28/who-is-moonshot-ai/
[7] "Meet Yang Zhilin: Moonshot AI founder builds business in the mould of ByteDance, OpenAI." South China Morning Post, via Yahoo Finance. https://finance.yahoo.com/news/meet-yang-zhilin-moonshot-ai-093000045.html
[8] "XLNet." Hugging Face Transformers documentation. https://huggingface.co/docs/transformers/en/model_doc/xlnet
[9] "Moonshot AI: Betting Big on Long-Context, Confronting the Challenges of Scale and Reliability." Center for Data Innovation, January 2025. https://datainnovation.org/2025/01/moonshot-ai-betting-big-on-long-context-confronting-the-challenges-of-scale-and-reliability/
[10] "Kimi K1.5." AI Wiki. https://aiwiki.ai/wiki/kimi_k1_5
[11] "China's Moonshot AI Releases Trillion Parameter Model Kimi K2." HPCwire / AIwire, July 2025. https://www.hpcwire.com/aiwire/2025/07/14/chinas-moonshot-ai-releases-trillion-parameter-model-kimi-k2/
[12] "Moonshot Releases Kimi K2, a Trillion-Parameter Model Fine-Tuned for Agentic Tool Use." The Batch, DeepLearning.AI. https://www.deeplearning.ai/the-batch/moonshot-releases-kimi-k2-a-trillion-parameter-model-fine-tuned-for-agentic-tool-use/
[13] "Moonshot AI's GTC Debut: Yang Zhilin Details Kimi K2.5 Tech Roadmap." BigGo Finance. https://finance.biggo.com/news/WZHQBJ0ByH9TLH69H_BN
[14] "Moonshot AI Secures $300 Million in Latest Funding Round, Boosting Valuation to $3.3 Billion." Pandaily. https://pandaily.com/moonshot-ai-secures-300-million-in-latest-funding-round-boosting-valuation-to-3-3-billion/
[15] "China AI Startup Moonshot Seeks $10 Billion Value in New Funding." Bloomberg, 17 February 2026. https://www.bloomberg.com/news/articles/2026-02-17/china-ai-startup-moonshot-seeks-10-billion-value-in-new-funding
[16] "Scaling law remains valid, but priorities are shifting, says Moonshot AI founder." KrASIA, November 2024. https://kr-asia.com/scaling-law-remains-valid-but-priorities-are-shifting-says-moonshot-ai-founder
[17] "China's Moonshot AI raises $2B at $20B valuation as demand for open source AI skyrockets." TechCrunch, 7 May 2026. https://techcrunch.com/2026/05/07/chinas-moonshot-ai-raises-2b-at-20b-valuation-as-demand-for-open-source-ai-skyrockets/
[18] "China's Moonshot releases a new open source model Kimi K2.5 and a coding agent." TechCrunch, 27 January 2026. https://techcrunch.com/2026/01/27/chinas-moonshot-releases-a-new-open-source-model-kimi-k2-5-and-a-coding-agent/
[19] "Moonshot AI Releases Kimi K2.5 - 1T Parameter Native Multimodal Agent Model." ComfyUI Wiki, 27 January 2026. https://comfyui-wiki.com/en/news/2026-01-27-moonshot-ai-kimi-k2-5-release
[20] "Kimi K2.6 Explained: Moonshot AI's Open-Source Model That Ties GPT-5.5 on Coding." Miraflow, April 2026. https://miraflow.ai/blog/kimi-k2-6-explained-moonshot-ai-open-source-model-ties-gpt-5-5-coding
