StepFun (Chinese: 阶跃星辰, pinyin: Jiēyuè Xīngchén), formally known as Shanghai Jieyue Xingchen Intelligent Technology Co., Ltd., is a Chinese artificial intelligence startup headquartered in Shanghai. Founded on April 6, 2023, by Jiang Daxin, a former Global Vice President and Chief Scientist at Microsoft Software Technology Center Asia, the company focuses on building large language models and multimodal AI systems. StepFun is one of China's "Six Little Tigers" (六小虎) of AI, a group of six prominent AI startups that also includes Zhipu AI, Moonshot AI, MiniMax, Baichuan Intelligence, and 01.AI.
Since its founding, StepFun has released a series of foundation models spanning language, vision, video, audio, and multimodal domains. In July 2024, it launched Step-2, which at the time was the first trillion-parameter Mixture of Experts (MoE) language model built by a Chinese startup. The company has raised over $1 billion across multiple funding rounds and was reportedly exploring a Hong Kong IPO as of early 2026.
StepFun was founded on April 6, 2023, in Shanghai's Xuhui District. The catalyst for its creation was the release of ChatGPT by OpenAI in November 2022. According to Jiang Daxin, who spent 16 years at Microsoft before departing, the launch of ChatGPT convinced him that he could build something comparable or better on his own. He recruited two fellow Microsoft alumni to join the venture: Jiao Binxing, who took on responsibility for search-related systems, and Zhu Yibo, who was brought on to lead engineering and infrastructure.
Within two months of starting operations, StepFun trained its first 100-billion-parameter model, Step-1, on what the company described as its first attempt. This rapid development attracted attention from investors, and StepFun became the only company among the "Six Little Tigers" to achieve unicorn status (a valuation exceeding $1 billion) in its initial funding round. Early backers included HongShan (formerly Sequoia China), Qiming Venture Partners, and IDG Capital.
In March 2024, StepFun began publicly releasing models in its Step series. Over the next 10 months, the company released 11 self-developed foundation models covering language, multimodal understanding, image generation, video generation, and speech.
The most significant release came in July 2024 at the World Artificial Intelligence Conference (WAIC) in Shanghai, where StepFun officially launched three models simultaneously: Step-2, its trillion-parameter MoE language model; Step-1.5V, an upgraded multimodal understanding model; and Step-1X, an image generation model based on the DiT (Diffusion Transformer) architecture.
By November 2024, Step-2 had climbed to the top of the LiveBench benchmark rankings among Chinese models, placing fifth globally. It trailed only models from OpenAI and Google DeepMind, scoring 58.67 in reasoning, 54.86 in data analysis, and 86.57 in instruction following.
In February 2025, StepFun and Geely Auto Group jointly announced the open-sourcing of two models: Step-Video-T2V (a 30-billion-parameter text-to-video model) and Step-Audio (a 130-billion-parameter voice interaction framework). Both were released under permissive open-source licenses and made available through the Yuewen app, Hugging Face, and GitHub.
In April 2025, StepFun released Step1X-Edit, an open-source image editing model designed to rival closed-source systems like GPT-4o and Gemini 2 Flash, under the Apache 2.0 license.
In July 2025, at the 2025 WAIC, StepFun and Geely jointly launched Agent OS, a next-generation intelligent cockpit operating system for vehicles. The Geely Galaxy M9 became the first mass-produced vehicle to feature Agent OS, with nearly 40,000 units sold within four months of launch.
Later in 2025, StepFun released Step-3, its next-generation multimodal reasoning model with 321 billion total parameters and 38 billion active parameters. The model introduced two architectural innovations: Multi-Matrix Factorization Attention (MFA), which reduces KV-cache demands to roughly 22% of DeepSeek V3's per-token attention cost, and Attention-FFN Disaggregation (AFD), which decouples attention and feed-forward network layers into specialized subsystems for improved hardware utilization.
In January 2026, StepFun closed its Series B+ funding round, raising over RMB 5 billion (approximately $700 million). Alongside this milestone, the company appointed Yin Qi as chairman. Yin is the co-founder and former CEO of Megvii (Face++), one of China's first-generation computer vision companies, and also serves as chairman of Qianli Technology, a Geely-backed autonomous driving firm. The appointment reinforced StepFun's strategy of combining AI models with physical-world deployment in devices and vehicles.
In February 2026, StepFun released Step 3.5 Flash, an open-source MoE model with 196 billion total parameters and 11 billion active parameters per token. The model supports a 262K context window and achieves 100 to 300 tokens per second in generation throughput via 3-way Multi-Token Prediction. On elite mathematics benchmarks, Step 3.5 Flash scored 99.8 on AIME 2025 and 98.0 on HMMT 2025.
Also in February 2026, Bloomberg reported that StepFun was exploring a Hong Kong initial public offering that could raise approximately $500 million. Fellow Six Little Tigers members MiniMax and Zhipu AI had already listed on the Hong Kong Stock Exchange by that time.
| Name | Role | Background |
|---|---|---|
| Jiang Daxin (姜大昕) | Co-founder and CEO | PhD in Computer Science from University at Buffalo. Spent 16 years at Microsoft (2007-2023), rising to Global Vice President and Chief Scientist of Microsoft Software Technology Center Asia. Led development of Bing, Cortana, Azure Cognitive Services, and Microsoft 365 NLU systems. |
| Zhu Yibo (朱逸博) | Co-founder and CTO | PhD from UC Santa Barbara, bachelor's from Tsinghua University. Microsoft Research PhD Fellow (2015). Former Director at ByteDance, where he led AI infrastructure. Previously worked at Google. Specializes in distributed systems and large-scale GPU clusters. |
| Zhang Xiangyu (张祥雨) | Co-founder and Chief Scientist | Co-author of ResNet ("Deep Residual Learning for Image Recognition"), the most cited paper across all fields published in the 21st century. Co-author of the Helmholtz Prize-winning paper on surpassing human-level ImageNet classification. |
| Jiao Binxing (焦斌星) | Co-founder and VP | PhD from University of Science and Technology of China (2007-2012). Former Microsoft employee responsible for search-related systems. |
| Yin Qi (印奇) | Chairman (appointed January 2026) | Co-founder and former CEO of Megvii (Face++). Chairman of Qianli Technology (Geely-backed autonomous driving). Prominent figure from China's first wave of computer vision startups. |
StepFun has released a broad portfolio of foundation models across language, vision, video, audio, and multimodal domains. The table below summarizes the major releases.
| Model | Release | Type | Parameters | Key Details |
|---|---|---|---|---|
| Step-1 | 2023 | Language | 100B | First model, trained within two months of the company's founding. Dense architecture. |
| Step-1V | Early 2024 | Multimodal | 100B+ | Multimodal large language model supporting text, image, and video understanding. |
| Step-1.5V | July 2024 | Multimodal | Not disclosed | Upgraded multimodal understanding model, launched at WAIC 2024. |
| Step-1X | July 2024 | Image generation | Not disclosed | Based on DiT (Diffusion Transformer) architecture. Shown at WAIC 2024. |
| Step-2 | July 2024 | Language | 1T+ (MoE) | First trillion-parameter MoE model from a Chinese startup. Ranked 5th on LiveBench globally (November 2024). |
| GOT-OCR 2.0 | September 2024 | OCR | 580M | Unified end-to-end OCR model handling plain text, tables, math formulas, sheet music, and more. Open source (Apache 2.0). |
| Step-Video-T2V | February 2025 | Video generation | 30B | Text-to-video model generating videos up to 204 frames. 16x16 spatial and 8x temporal compression. Open source (MIT). |
| Step-Audio | February 2025 | Audio/Speech | 130B | First production-grade open-source voice interaction framework. Supports multilingual speech, emotional tones, dialects, and rap generation. |
| Step1X-Edit | April 2025 | Image editing | Not disclosed | Open-source image editing model combining Qwen-VL and DiT. Comparable to GPT-4o and Gemini 2 Flash for image editing tasks. Apache 2.0 license. |
| Step-3 | 2025 | Multimodal reasoning | 321B total, 38B active (MoE) | Trained on 20T+ text tokens and 4T image-text tokens. Introduced MFA and AFD architectural innovations. Supports 800K context. |
| Step-Video-TI2V | March 2025 | Image-to-video | Based on T2V | Extension of Step-Video-T2V adding image-conditioned video generation. |
| Step-Audio-TTS-3B | 2025 | Text-to-speech | 3B | First TTS model capable of generating rap and humming. Trained on large-scale synthetic data. |
| Step-Audio-EditX | November 2025 | Audio editing | 3B | LLM-based reinforcement learning model for expressive audio editing (emotion, style, paralinguistics). |
| NextStep-1 | 2025 | Image generation | 14B | Autoregressive image generation using continuous tokens with a 157M-parameter flow-matching head. ICLR 2026 Oral paper. |
| Step3-VL-10B | 2025 | Vision-language | 10B | Compact model matching or surpassing open-source models 10-20x its size. |
| Step 3.5 Flash | February 2026 | Multimodal reasoning | 196B total, 11B active (MoE) | 262K context window. 3-way Multi-Token Prediction. AIME 2025: 99.8, HMMT 2025: 98.0. Open source (Apache 2.0). |
| Step-Audio 2 Mini | August 2025 | Speech-to-speech | 8B | End-to-end speech conversation model. Reported to surpass GPT-4o-Audio on benchmarks. Open source (Apache 2.0). |
Yuewen (pinyin: Yuèwèn, meaning "Leap Ask") is StepFun's consumer-facing AI assistant application, available on iOS, Android, and the web. The app functions as a multimodal chatbot, supporting text-based conversation, document understanding, image and video generation, voice interaction, and task execution. As of early 2026, Yuewen is powered by Step 3.5 Flash and also integrates DeepSeek-R1 for certain reasoning tasks.
StepFun operates a developer platform at platform.stepfun.com (and platform.stepfun.ai for international access), offering API access to its model family. The platform provides OpenAI-compatible and Anthropic-compatible API endpoints, making integration straightforward for developers already using those ecosystems.
The platform offers tiered subscription plans, starting at $6.99 for individual developers, with a flagship $99 plan that provides rolling limits of 5,000 prompts every five hours. Models are also available through third-party platforms including OpenRouter, NVIDIA NIM, and SiliconFlow.
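Because the endpoints follow the OpenAI convention, a standard chat-completions request shape applies. The sketch below assembles one; the base URL and the model identifier "step-3.5-flash" are assumptions for illustration, and the network call itself is left commented out:

```python
# Minimal sketch of a request against an OpenAI-compatible chat endpoint.
# The base URL and model identifier below are assumptions for illustration,
# not confirmed values from StepFun's documentation.
import json
import os

BASE_URL = "https://platform.stepfun.ai/v1"  # assumed international base URL

def build_chat_request(model, user_message):
    """Assemble an OpenAI-compatible /chat/completions request."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {os.environ.get('STEPFUN_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": user_message}],
        }),
    }

req = build_chat_request("step-3.5-flash", "Summarize this document.")
# To send for real: requests.post(req["url"], headers=req["headers"], data=req["body"])
print(req["url"])
```

Because the request shape matches OpenAI's, existing client libraries can typically be pointed at the platform simply by overriding their base URL and API key.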
StepFun has raised over $1 billion in total funding across multiple rounds since its founding in April 2023.
| Round | Date | Amount | Lead Investor(s) | Notable Co-investors |
|---|---|---|---|---|
| Series A | 2023 | Not publicly disclosed | HongShan (Sequoia China) | Qiming Venture Partners, IDG Capital |
| Series B | December 2024 | "Several hundred million dollars" | Fortera Capital (Shanghai state-owned Capital Investment Co.) | Tencent, Qiming Venture Partners, Xiaomi |
| Series B+ | January 2026 | RMB 5B+ (~$700M) | Multiple (including state-backed funds) | Tencent, 5Y Capital, Qiming Venture Partners, Pudong Venture Capital, China Life Private Equity, Hong Kong Investment Corporation, Shanghai State-owned Capital Investment Leading Fund |
StepFun achieved unicorn status (valuation exceeding $1 billion) during its earliest funding rounds, making it among the fastest Chinese AI startups to reach that milestone. The Series B round in December 2024 reportedly valued the company at approximately $2 billion. The Series B+ round in January 2026 was one of the largest single funding rounds by any private AI startup in China.
StepFun's most prominent commercial partnership is with Geely Auto Group, one of China's largest automakers. The relationship is reinforced through chairman Yin Qi, who also leads Qianli Technology, Geely's autonomous driving subsidiary. The two companies jointly developed Agent OS, an intelligent cockpit operating system that integrates StepFun's multimodal models and voice AI. The Geely Galaxy M9 was the first mass-produced vehicle to ship with Agent OS, and StepFun has set a target of surpassing one million vehicle integrations by the end of 2026.
StepFun's models power AI features on smartphones from multiple major Chinese manufacturers, with OPPO among the confirmed partners. As of late 2025, StepFun's models were deployed on over 42 million shipped devices, reaching approximately 20 million daily active users, and its partners reportedly included around 60% of China's leading smartphone brands. Terminal API calls grew nearly 170% quarter-over-quarter for three consecutive quarters through the end of 2025.
StepFun has consistently focused on multimodal AI as its core differentiator among the Six Little Tigers. While other members of the group initially concentrated on text-only language models, StepFun invested early in building models that could process and generate text, images, video, and audio within unified architectures.
From its founding, StepFun has been a proponent of the scaling law hypothesis, which holds that model performance improves predictably as model size, training data, and compute increase. Jiang Daxin has stated publicly that Chinese AI can benefit significantly from pursuing bigger models trained on more data, a philosophy reflected in StepFun's progression from the 100-billion-parameter Step-1 to the trillion-parameter Step-2 and beyond.
With Step-3, StepFun introduced two notable architectural innovations:

- Multi-Matrix Factorization Attention (MFA), which reduces KV-cache demands to roughly 22% of DeepSeek V3's per-token attention cost.
- Attention-FFN Disaggregation (AFD), which decouples the attention and feed-forward network layers into specialized subsystems for improved hardware utilization.
Step-3 achieves roughly 4,039 tokens per GPU per second under 50 ms latency (4K context, FP8), more than double DeepSeek V3's reported throughput of 1,850 tokens per GPU per second.
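The scale of such KV-cache savings can be illustrated with the standard per-token cache-size formula: 2 (for K and V) × layers × KV heads × head dimension × bytes per element. The configurations below are hypothetical, chosen only to show how shrinking the effective key-value width reduces the cache; they are not Step-3's or DeepSeek V3's actual dimensions.

```python
# Per-token KV-cache size: 2 (K and V) x layers x kv_heads x head_dim x bytes/elem.
# The two configs are hypothetical, purely to illustrate how reducing the
# effective key-value width (as factorized attention schemes do) cuts the cache.

def kv_cache_bytes_per_token(layers, kv_heads, head_dim, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * dtype_bytes

baseline = kv_cache_bytes_per_token(layers=61, kv_heads=128, head_dim=128)
factorized = kv_cache_bytes_per_token(layers=61, kv_heads=16, head_dim=256)

print(f"baseline:   {baseline / 1024:.0f} KiB/token")
print(f"factorized: {factorized / 1024:.0f} KiB/token "
      f"({100 * factorized / baseline:.0f}% of baseline)")
```

Because the cache grows linearly with sequence length and batch size, a cut of this kind directly raises the batch sizes and context lengths that fit on a given GPU during decoding.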
StepFun has released numerous models under permissive open-source licenses (Apache 2.0 and MIT), including Step-Video-T2V, Step-Audio, Step1X-Edit, GOT-OCR 2.0, Step 3.5 Flash, Step-Audio 2 Mini, NextStep-1, and Step3-VL-10B. The company's GitHub organization (github.com/stepfun-ai) hosts the source code and model weights for these releases.
The "Six Little Tigers of Large Models" (大模型六小虎) is a collective designation coined by Chinese media and investors for six AI startups, founded between 2019 and 2023, that emerged as the leading domestic challengers to global AI companies like OpenAI and Anthropic. The term is modeled after the "Four Asian Tigers" economic metaphor. All six reached unicorn status by early 2024.
| Company | Founded | Founder | Notable Backing | Primary Focus |
|---|---|---|---|---|
| Zhipu AI | June 2019 | Zhang Peng | Tencent, government funds | GLM language models, code generation |
| MiniMax | 2021 | Yan Junjie | Alibaba | Consumer chatbots (Talkie), video generation (Hailuo) |
| Baichuan Intelligence | March 2023 | Wang Xiaochuan | Alibaba, Tencent | Open-source language models |
| Moonshot AI | March 2023 | Yang Zhilin | Alibaba | Long-context models, Kimi Chat |
| StepFun | April 2023 | Jiang Daxin | Tencent, HongShan, state funds | Multimodal models, automotive AI |
| 01.AI | July 2023 | Kai-Fu Lee | Various | Yi language models |
By 2025 and 2026, the trajectories of these companies began to diverge. MiniMax and Zhipu AI pursued public listings on the Hong Kong Stock Exchange, while StepFun explored a potential IPO. Others like Moonshot AI focused on consumer applications. The competitive dynamics within the group are further shaped by the involvement of China's largest technology companies as investors, with Tencent backing both StepFun and Zhipu AI, and Alibaba backing MiniMax and Moonshot AI.
StepFun is headquartered at Lane 315, Fenggu Road, Xuhui District, Shanghai. As of 2025, the company employed approximately 400 people. The workforce is concentrated in research and engineering, reflecting StepFun's identity as a foundation model company. The company's official website is stepfun.com (Chinese) and stepfun.ai (international), while its open-source repositories are hosted at github.com/stepfun-ai.
The company name "阶跃星辰" (Jiēyuè Xīngchén) translates loosely to "Step Stars" or "Stepping Through the Stars," reflecting the company's ambition in AI research. The English brand name "StepFun" combines "Step" (from the step function concept in mathematics and the idea of incremental progress) with "Fun," signaling the company's consumer-facing aspirations alongside its research agenda.
Beyond its commercial model releases, StepFun has contributed several notable research papers and open-source tools to the broader AI community.
In September 2024, StepFun released GOT-OCR 2.0 (General OCR Theory), a unified end-to-end optical character recognition model with 580 million parameters. Unlike traditional OCR systems that rely on multi-stage pipelines, GOT-OCR 2.0 treats all artificial optical signals as "characters" and processes them through a single model. The system handles plain text, formatted documents, tables, charts, mathematical formulas, molecular structures, geometric shapes, and even sheet music. The model was released under the Apache 2.0 license and is available on Hugging Face.
NextStep-1 is a 14-billion-parameter autoregressive image generation model that works directly with continuous image tokens rather than quantizing them into discrete visual words. The model uses a causal Transformer backbone with a lightweight 157-million-parameter flow-matching head to predict the next continuous image token. This approach demonstrates that an LLM-style transformer can serve as the primary engine for image generation without relying on vector quantization or heavyweight external diffusion modules. The paper was accepted as an Oral presentation at ICLR 2026. A follow-up version, NextStep-1.1, was released in December 2025 with improved output quality through extended training and a flow-based reinforcement learning post-training paradigm.
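The training objective behind such a flow-matching head can be sketched in a few lines: the transformer's hidden state conditions a small network that regresses the velocity of a linear noise-to-data path toward the next continuous token. The dimensions and the one-layer head below are illustrative stand-ins, not NextStep-1's architecture.

```python
# Toy sketch of a flow-matching head for predicting the next *continuous*
# image token, conditioned on a transformer hidden state. All sizes and the
# one-layer velocity network are illustrative; this is not NextStep-1's code.
import numpy as np

rng = np.random.default_rng(0)
d_hidden, d_token = 32, 8

# Random "trained" weights for a one-layer velocity head v(x_t, t, h).
W = rng.normal(0, 0.1, size=(d_token + d_hidden + 1, d_token))

def velocity_head(x_t, t, h):
    """Predict the path velocity from the noisy token, time, and hidden state."""
    inp = np.concatenate([x_t, h, [t]])
    return inp @ W

def flow_matching_loss(x1, h):
    """One training term: linear path x_t = (1-t)*x0 + t*x1, target velocity x1 - x0."""
    x0 = rng.normal(size=d_token)   # noise sample
    t = rng.uniform()               # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1
    v_target = x1 - x0
    v_pred = velocity_head(x_t, t, h)
    return float(np.mean((v_pred - v_target) ** 2))

h = rng.normal(size=d_hidden)       # stand-in for the transformer hidden state
x1 = rng.normal(size=d_token)       # the "next" continuous image token
print(f"flow-matching loss: {flow_matching_loss(x1, h):.3f}")
```

At inference time, the head would instead integrate the learned velocity field from noise to a sample, yielding a continuous token without vector quantization or an external diffusion model.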
StepFun also released Step-DeepResearch, an open-source deep research agent built on its foundation models that automates multi-step information gathering and synthesis.
StepFun competes on multiple fronts. Within China, its closest competitors include the other members of the Six Little Tigers as well as established technology companies like Baidu (with its Ernie series), Alibaba (with Qwen), ByteDance (with Doubao/Seed), and DeepSeek. Internationally, StepFun benchmarks its models against those from OpenAI, Anthropic, and Google DeepMind.
StepFun's multimodal focus and automotive partnerships give it a distinct positioning. While most Chinese AI startups initially competed on language model benchmarks, StepFun's early investment in video, audio, and image generation, combined with its deployment in Geely vehicles and smartphones, gives the company a differentiated commercial strategy centered on "AI plus terminal devices."
The competitive landscape has been shaped by several factors. U.S. export controls on advanced AI chips have constrained the compute available to Chinese AI companies, pushing them to develop more efficient training and inference techniques. StepFun's MFA and AFD innovations in Step-3 are partly a response to these hardware constraints. Meanwhile, the entry of DeepSeek as a formidable competitor in early 2025, with its open-source DeepSeek-R1 reasoning model, intensified pressure on all Chinese AI startups to differentiate through product deployment rather than benchmark scores alone.