# StepFun

> Source: https://aiwiki.ai/wiki/stepfun
> Updated: 2026-06-23
> Categories: AI Companies, Chinese AI, Large Language Models
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**StepFun** (Chinese: 阶跃星辰, pinyin: Jiēyuè Xīngchén), formally Shanghai Jieyue Xingchen Intelligent Technology Co., Ltd., is a Shanghai-based Chinese [artificial intelligence](/wiki/artificial_intelligence) startup that builds the "Step" series of [large language models](/wiki/large_language_model) and [multimodal](/wiki/multimodal_model) foundation models. It was founded on April 6, 2023, by Jiang Daxin, a former Global Vice President and Chief Scientist at [Microsoft](/wiki/microsoft) Software Technology Center Asia, and is widely counted among China's "Six Little Tigers" (六小虎) of AI alongside [Zhipu AI](/wiki/zhipu_ai), [Moonshot AI](/wiki/moonshot_ai), [MiniMax](/wiki/minimax), [Baichuan Intelligence](/wiki/baichuan), and [01.AI](/wiki/01_ai). [1][10]

StepFun is best known for launching Step-2 in July 2024, which at the time was the first trillion-parameter [Mixture of Experts](/wiki/mixture_of_experts) (MoE) language model built by a Chinese startup, and for a multimodal-first strategy spanning language, vision, video, and audio. [3] In January 2026 the company closed a Series B+ round of more than RMB 5 billion (about $717 million), the largest single round in China's large-model sector over the preceding 12 months, and appointed Yin Qi, co-founder of [Megvii](/wiki/megvii) (Face++), as chairman. [4][5][6]

## History

### When was StepFun founded?

StepFun was founded on April 6, 2023, in Shanghai's Xuhui District. The catalyst for its creation was the release of [ChatGPT](/wiki/chatgpt) by [OpenAI](/wiki/openai) in November 2022. According to Jiang Daxin, who spent 16 years at Microsoft before departing, the launch of ChatGPT convinced him that he could build something comparable or better on his own. [1] He recruited two fellow Microsoft alumni to join the venture: Jiao Binxing, who took on responsibility for search-related systems, and Zhu Yibo, who was brought on to lead engineering and infrastructure.

Within two months of starting operations, StepFun trained its first 100-billion-parameter model, Step-1, on what the company described as its first attempt. This rapid development attracted attention from investors, and StepFun became the only company among the "Six Little Tigers" to achieve [unicorn](/wiki/unicorn_startup) status (a valuation exceeding $1 billion) in its initial funding round. [10] Early backers included HongShan (formerly Sequoia China), [Qiming Venture Partners](/wiki/qiming_venture_partners), and [IDG Capital](/wiki/idg_capital).

### 2024: Step-2 and Rapid Model Releases

In March 2024, StepFun began publicly releasing models in its Step series. Over the next 10 months, the company released 11 self-developed foundation models covering language, multimodal understanding, image generation, video generation, and speech. [1]

The most significant release came in July 2024 at the World Artificial Intelligence Conference (WAIC) in Shanghai, where StepFun officially launched three models simultaneously:

- **Step-2**: A trillion-parameter MoE language model, the first of its kind from a Chinese startup.
- **Step-1.5V**: A multimodal understanding model capable of processing text, images, and video.
- **Step-1X**: An image generation model based on a Diffusion [Transformer](/wiki/transformer) (DiT) architecture.

By November 2024, the Step-2-16k variant had climbed to the top of the [LiveBench](/wiki/livebench) benchmark rankings among Chinese models, placing fifth globally. It trailed only models from [OpenAI](/wiki/openai) (o1-mini and GPT-4o) and [Google](/wiki/google_deepmind) (Gemini 1.5 Pro), scoring 58.67 in reasoning, 54.86 in data analysis, and 86.57 in instruction following. [3]

### 2025: Open Source Push and Automotive Partnerships

In February 2025, StepFun and [Geely](/wiki/geely) Auto Group jointly announced the open-sourcing of two models: Step-Video-T2V (a 30-billion-parameter text-to-video model) and Step-Audio (a 130-billion-parameter voice interaction framework). [7] Both were released under permissive open-source licenses and made available through the Yuewen app, [Hugging Face](/wiki/hugging_face), and GitHub. In the partnership, Geely focused on scene design and evaluation while StepFun led pretraining. [7]

In April 2025, StepFun released Step1X-Edit, an open-source image editing model designed to rival closed-source systems like [GPT-4o](/wiki/gpt-4) and [Gemini](/wiki/gemini) 2 Flash, under the Apache 2.0 license.

In July 2025, at the 2025 WAIC, StepFun and Geely jointly launched [Agent](/wiki/agent) OS, a next-generation intelligent cockpit operating system for vehicles. [12] The Geely Galaxy M9 became the first mass-produced vehicle to feature Agent OS, with nearly 40,000 units sold within four months of launch.

Later in 2025, StepFun released [Step-3](/wiki/step_3), its next-generation multimodal reasoning model with 321 billion total parameters and 38 billion active parameters. The model introduced two architectural innovations: Multi-Matrix Factorization [Attention](/wiki/attention) (MFA), which reduces KV-cache demands to roughly 22% of [DeepSeek](/wiki/deepseek) V3's per-token attention cost, and Attention-FFN Disaggregation (AFD), which decouples attention and feed-forward network layers into specialized subsystems for improved hardware utilization. [8]

### 2026: Leadership Changes and IPO Exploration

In January 2026, StepFun closed its Series B+ funding round, raising over RMB 5 billion (approximately $717 million). [4] Alongside this milestone, the company appointed Yin Qi as chairman on January 26, 2026. [6] Yin is the co-founder and former CEO of [Megvii](/wiki/megvii) (Face++), one of China's first-generation computer vision companies (one of the "four AI dragons"), and also serves as chairman of Qianli Technology, a Geely-backed autonomous driving firm. [6] The appointment reinforced StepFun's strategy of combining AI models with physical-world deployment in devices and vehicles; the round itself exceeded the Hong Kong IPO proceeds of rivals Zhipu AI (about $535 million) and MiniMax. [4]

In February 2026, StepFun released Step 3.5 Flash, an open-source MoE model with 196 billion total parameters and 11 billion active parameters per token. [13] The model supports a 262K context window and achieves 100 to 300 tokens per second in generation throughput via 3-way Multi-[Token](/wiki/token) Prediction. On elite mathematics benchmarks, Step 3.5 Flash scored 99.8 on [AIME 2025](/wiki/aime_2025) and 98.0 on HMMT 2025. [13]

Also in February 2026, Bloomberg reported that StepFun was exploring a Hong Kong initial public offering that could raise approximately $500 million. [11] Fellow Six Little Tigers members MiniMax and Zhipu AI had already listed on the Hong Kong Stock Exchange by that time. [11]

## Who runs StepFun?

| Name | Role | Background |
|------|------|------------|
| Jiang Daxin (姜大昕) | Co-founder and CEO | PhD in Computer Science from University at Buffalo. Spent 16 years at Microsoft (2007-2023), rising to Global Vice President and Chief Scientist of Microsoft Software Technology Center Asia. Led development of [Bing](/wiki/bing), [Cortana](/wiki/cortana), Azure Cognitive Services, and Microsoft 365 NLU systems. |
| Zhu Yibo (朱逸博) | Co-founder and CTO | PhD from UC Santa Barbara, bachelor's from [Tsinghua University](/wiki/tsinghua_university). Microsoft Research PhD Fellow (2015). Former Director at [ByteDance](/wiki/bytedance_ai), where he led AI infrastructure. Previously worked at Google. Specializes in distributed systems and large-scale GPU clusters. |
| Zhang Xiangyu (张祥雨) | Co-founder and Chief Scientist | Co-author of [ResNet](/wiki/resnet) ("Deep Residual Learning for Image Recognition"), the most cited paper across all fields published in the 21st century. Co-author of the Helmholtz Prize-winning paper on surpassing human-level ImageNet classification. |
| Jiao Binxing (焦斌星) | Co-founder and VP | PhD from University of Science and Technology of China (2007-2012). Former Microsoft employee responsible for search-related systems. |
| Yin Qi (印奇) | Chairman (appointed January 2026) | Co-founder and former CEO of [Megvii](/wiki/megvii) (Face++). Chairman of Qianli Technology (Geely-backed autonomous driving). Prominent figure from China's first wave of computer vision startups. [6] |

## What models has StepFun released?

StepFun has released a broad portfolio of foundation models across language, vision, video, audio, and multimodal domains. The table below summarizes the major releases.

| Model | Release | Type | Parameters | Key Details |
|-------|---------|------|------------|-------------|
| Step-1 | 2023 | Language | 100B | First model, trained within two months of the company's founding. Dense architecture. |
| Step-1V | Early 2024 | Multimodal | 100B+ | Multimodal [large language model](/wiki/large_language_model) supporting text, image, and video understanding. |
| Step-1.5V | July 2024 | Multimodal | Not disclosed | Upgraded multimodal understanding model, launched at WAIC 2024. |
| Step-1X | July 2024 | Image generation | Not disclosed | Based on DiT ([Diffusion Transformer](/wiki/diffusion_models)) architecture. Shown at WAIC 2024. |
| Step-2 | July 2024 | Language | 1T+ (MoE) | First trillion-parameter MoE model from a Chinese startup. Ranked 5th on LiveBench globally (November 2024). [3] |
| GOT-OCR 2.0 | September 2024 | OCR | 580M | Unified end-to-end OCR model handling plain text, tables, math formulas, sheet music, and more. Open source (Apache 2.0). |
| Step-Video-T2V | February 2025 | Video generation | 30B | Text-to-video model generating videos up to 204 frames. 16x16 spatial and 8x temporal compression. Open source (MIT). [7] |
| Step-Audio | February 2025 | Audio/Speech | 130B | First production-grade open-source voice interaction framework. Supports multilingual speech, emotional tones, dialects, and rap generation. [7] |
| Step1X-Edit | April 2025 | Image editing | Not disclosed | Open-source image editing model combining Qwen-VL and DiT. Comparable to GPT-4o and Gemini 2 Flash for image editing tasks. Apache 2.0 license. |
| Step-3 | 2025 | Multimodal reasoning | 321B total, 38B active (MoE) | Trained on 20T+ text tokens and 4T image-text tokens. Introduced MFA and AFD architectural innovations. Supports 800K context. [8] |
| Step-Video-TI2V | March 2025 | Image-to-video | Based on T2V | Extension of Step-Video-T2V adding image-conditioned video generation. |
| Step-Audio-TTS-3B | 2025 | Text-to-speech | 3B | First TTS model capable of generating rap and humming. Trained on large-scale synthetic data. |
| Step-Audio-EditX | November 2025 | Audio editing | 3B | LLM-based reinforcement learning model for expressive audio editing (emotion, style, paralinguistics). |
| NextStep-1 | 2025 | Image generation | 14B | Autoregressive image generation using continuous tokens with a 157M-parameter flow-matching head. ICLR 2026 Oral paper. |
| Step3-VL-10B | 2025 | Vision-language | 10B | Compact model matching or surpassing open-source models 10-20x its size. |
| Step 3.5 Flash | February 2026 | Multimodal reasoning | 196B total, 11B active (MoE) | 262K context window. 3-way Multi-Token Prediction. AIME 2025: 99.8, HMMT 2025: 98.0. Open source (Apache 2.0). [13] |
| Step-Audio 2 Mini | August 2025 | Speech-to-speech | 8B | End-to-end speech conversation model. Reported to surpass GPT-4o-Audio on benchmarks. Open source (Apache 2.0). |

## Products and Platform

### Yuewen (跃问)

Yuewen (also romanized as Yuèwèn, meaning "Leap Ask") is StepFun's consumer-facing AI assistant application, available on iOS, Android, and the web. The app functions as a multimodal chatbot, supporting text-based conversation, document understanding, image and video generation, voice interaction, and task execution. As of early 2026, Yuewen is powered by Step 3.5 Flash and also integrates [DeepSeek-R1](/wiki/deepseek_r1) for certain reasoning tasks.

Key features include:

- **Chat**: Multi-round conversational AI for questions, writing, and real-time voice interaction.
- **Yuewen Video**: Text-to-video generation powered by Step-Video-T2V.
- **StepClaw**: A task automation agent that executes actions across devices and integrates with enterprise tools like Feishu (Lark).
- **Document understanding**: Summarization and Q&A over uploaded documents.
- **Image generation and editing**: Powered by Step-1X and Step1X-Edit.

### StepFun Open Platform

StepFun operates a developer platform at platform.stepfun.com (and platform.stepfun.ai for international access), offering API access to its model family. The platform provides OpenAI-compatible and Anthropic-compatible API endpoints, making integration straightforward for developers already using those ecosystems.

The platform offers tiered subscription plans, starting at $6.99 for individual developers, with a flagship $99 plan that provides rolling limits of 5,000 prompts every five hours. Models are also available through third-party platforms including [OpenRouter](/wiki/openrouter), NVIDIA NIM, and SiliconFlow.

## How is StepFun funded?

StepFun has raised over $1 billion in total funding across multiple rounds since its founding in April 2023.

| Round | Date | Amount | Lead Investor(s) | Notable Co-investors |
|-------|------|--------|-------------------|----------------------|
| Series A | 2023 | Not publicly disclosed | HongShan (Sequoia China) | Qiming Venture Partners, IDG Capital |
| Series B | December 2024 | "Several hundred million dollars" | Fortera Capital (Shanghai state-owned Capital Investment Co.) | [Tencent](/wiki/tencent_ai), Qiming Venture Partners, [Xiaomi](/wiki/xiaomi) |
| Series B+ | January 2026 | RMB 5B+ (~$717M) | Multiple (including state-backed funds) | Tencent, 5Y Capital, Qiming Venture Partners, Pudong Venture Capital, China Life Private Equity, Hong Kong Investment Corporation, Shanghai State-owned Capital Investment Leading Fund |

StepFun achieved unicorn status (valuation exceeding $1 billion) during its earliest funding rounds, making it among the fastest Chinese AI startups to reach that milestone. [10] The Series B round in December 2024 reportedly valued the company at approximately $2 billion. [2] The Series B+ round in January 2026 was one of the largest single funding rounds by any private AI startup in China, and exceeded the Hong Kong IPO proceeds of newly listed rivals Zhipu AI (about $535 million) and MiniMax. [4]

## Strategic Partnerships

### Automotive: Geely Auto Group

StepFun's most prominent commercial partnership is with Geely Auto Group, one of China's largest automakers. The relationship is reinforced through chairman Yin Qi, who also leads Qianli Technology, Geely's autonomous driving subsidiary. [6] The two companies jointly developed Agent OS, an intelligent cockpit operating system that integrates StepFun's multimodal models and voice AI. [12] The Geely Galaxy M9 was the first mass-produced vehicle to ship with Agent OS, and StepFun has set a target of surpassing one million vehicle integrations by the end of 2026.

### Smartphones

StepFun's models power AI features on smartphones from multiple major Chinese manufacturers, with [OPPO](/wiki/oppo) among the confirmed partners. As of late 2025, StepFun's models were deployed on over 42 million shipped devices, reaching approximately 20 million daily active users. [14] Partners reportedly account for around 60% of China's leading smartphone brands. Terminal API calls grew nearly 170% quarter-over-quarter for three consecutive quarters through the end of 2025. [14]

## How does StepFun differ from other Chinese AI startups?

StepFun has consistently focused on multimodal AI as its core differentiator among the Six Little Tigers. While other members of the group initially concentrated on text-only language models, StepFun invested early in building models that could process and generate text, images, video, and audio within unified architectures.

### Scaling Law Philosophy

From its founding, StepFun has been a proponent of the [scaling law](/wiki/scaling_laws) hypothesis, which holds that model performance improves predictably as model size, training data, and compute increase. Jiang Daxin has argued publicly that "Chinese AI can benefit from bigger models, more data," a philosophy reflected in StepFun's progression from the 100-billion-parameter Step-1 to the trillion-parameter Step-2 and beyond. [1]

### Architectural Innovations

With Step-3, StepFun introduced two notable architectural innovations:

- **Multi-Matrix Factorization Attention (MFA)**: A novel attention mechanism that reduces KV-cache memory requirements and attention FLOPs to approximately 22% of DeepSeek V3's per-token attention cost, while maintaining model expressiveness. [8]
- **Attention-FFN Disaggregation (AFD)**: A distributed inference technique that separates attention layers and feed-forward network layers into specialized subsystems, optimizing hardware utilization across different accelerator types. [8]

StepFun reports that Step-3 achieves roughly 4,039 tokens per GPU per second under 50ms latency (4K context, FP8), approximately 70% faster than DeepSeek V3's reported throughput of 1,850 tokens per GPU per second. [8]

### Is StepFun open source?

StepFun has released numerous models under permissive open-source licenses (Apache 2.0 and MIT), including Step-Video-T2V, Step-Audio, Step1X-Edit, GOT-OCR 2.0, Step 3.5 Flash, Step-Audio 2 Mini, NextStep-1, and Step3-VL-10B. [7][13] The company's GitHub organization (github.com/stepfun-ai) hosts the source code and model weights for these releases.

## China's Six Little Tigers

The "Six Little Tigers of Large Models" (大模型六小虎) is a collective designation coined by Chinese media and investors to describe six AI startups that emerged between 2021 and 2024 as the leading domestic challengers to global AI companies like OpenAI and [Anthropic](/wiki/anthropic). The term is modeled after the "Four Asian Tigers" economic metaphor. All six reached unicorn status by early 2024. [10]

| Company | Founded | Founder | Notable Backing | Primary Focus |
|---------|---------|---------|----------------|---------------|
| [Zhipu AI](/wiki/zhipu_ai) | June 2019 | Zhang Peng | [Tencent](/wiki/tencent_ai), government funds | GLM language models, code generation |
| [MiniMax](/wiki/minimax) | 2021 | Yan Junjie | [Alibaba](/wiki/alibaba_cloud) | Consumer chatbots (Talkie), video generation (Hailuo) |
| [Baichuan Intelligence](/wiki/baichuan) | March 2023 | Wang Xiaochuan | Alibaba, Tencent | Open-source language models |
| [Moonshot AI](/wiki/moonshot_ai) | March 2023 | Yang Zhilin | Alibaba | Long-context models, Kimi Chat |
| StepFun | April 2023 | Jiang Daxin | Tencent, HongShan, state funds | Multimodal models, automotive AI |
| [01.AI](/wiki/01_ai) | July 2023 | [Kai-Fu Lee](/wiki/kai_fu_lee) | Various | Yi language models |

By 2025 and 2026, the trajectories of these companies began to diverge. MiniMax and Zhipu AI pursued public listings on the Hong Kong Stock Exchange, while StepFun explored a potential IPO. [11] Others like Moonshot AI focused on consumer applications. The competitive dynamics within the group are further shaped by the involvement of China's largest technology companies as investors, with Tencent backing both StepFun and Zhipu AI, and Alibaba backing MiniMax and Moonshot AI.

## Company Operations

StepFun is headquartered at Lane 315, Fenggu Road, Xuhui District, Shanghai. As of 2025, the company employed approximately 400 people. The workforce is concentrated in research and engineering, reflecting StepFun's identity as a foundation model company. The company's official website is stepfun.com (Chinese) and stepfun.ai (international), while its open-source repositories are hosted at github.com/stepfun-ai.

The company name "阶跃星辰" (Jiēyuè Xīngchén) translates loosely to "Step Stars" or "Stepping Through the Stars," reflecting the company's ambition in AI research. The English brand name "StepFun" combines "Step" (from the step function concept in mathematics and the idea of incremental progress) with "Fun," signaling the company's consumer-facing aspirations alongside its research agenda.

## Research Contributions

Beyond its commercial model releases, StepFun has contributed several notable research papers and open-source tools to the broader AI community.

### GOT-OCR 2.0

In September 2024, StepFun released GOT-OCR 2.0 (General OCR Theory), a unified end-to-end optical character recognition model with 580 million parameters. Unlike traditional OCR systems that rely on multi-stage pipelines, GOT-OCR 2.0 treats all artificial optical signals as "characters" and processes them through a single model. The system handles plain text, formatted documents, tables, charts, mathematical formulas, molecular structures, geometric shapes, and even sheet music. The model was released under the Apache 2.0 license and is available on Hugging Face.

### NextStep-1

NextStep-1 is a 14-billion-parameter autoregressive image generation model that works directly with continuous image tokens rather than quantizing them into discrete visual words. The model uses a causal Transformer backbone with a lightweight 157-million-parameter flow-matching head to predict the next continuous image token. This approach demonstrates that an LLM-style transformer can serve as the primary engine for image generation without relying on vector quantization or heavyweight external diffusion modules. The paper was accepted as an Oral presentation at ICLR 2026. A follow-up version, NextStep-1.1, was released in December 2025 with improved output quality through extended training and a flow-based [reinforcement learning](/wiki/reinforcement_learning) post-training paradigm.

### Step-DeepResearch

StepFun also released Step-DeepResearch, an open-source deep research agent built on its foundation models, enabling automated multi-step information gathering and synthesis.

## Competition

StepFun competes on multiple fronts. Within China, its closest competitors include the other Five Little Tigers as well as established technology companies like [Baidu](/wiki/baidu_ai) (with its Ernie series), Alibaba (with [Qwen](/wiki/qwen)), ByteDance (with [Doubao](/wiki/doubao)/Seed), and [DeepSeek](/wiki/deepseek). Internationally, StepFun benchmarks its models against those from OpenAI, Anthropic, and Google DeepMind.

StepFun's multimodal focus and automotive partnerships give it a distinct positioning. While most Chinese AI startups initially competed on language model benchmarks, StepFun's early investment in video, audio, and image generation, combined with its deployment in Geely vehicles and smartphones, gives the company a differentiated commercial strategy centered on "AI plus terminal devices." [14]

The competitive landscape has been shaped by several factors. U.S. export controls on advanced AI chips have constrained the compute available to Chinese AI companies, pushing them to develop more efficient training and inference techniques. StepFun's MFA and AFD innovations in Step-3 are partly a response to these hardware constraints. [8] Meanwhile, the entry of DeepSeek as a formidable competitor in early 2025, with its open-source DeepSeek-R1 reasoning model, intensified pressure on all Chinese AI startups to differentiate through product deployment rather than benchmark scores alone.

## References

1. "Shanghai AI start-up founded by ex-Microsoft engineers bets on 'scaling law' to boost AI capabilities." South China Morning Post, June 2024.
2. "Chinese AI model maker Stepfun raises hundreds of millions in Series B funding." SiliconANGLE, December 26, 2024.
3. "Chinese AGI Startup 'StepFun' Developed 'Step-2': A New Trillion-Parameter MoE Architecture Model Ranking 5th on Livebench." MarkTechPost, November 20, 2024.
4. "StepFun Raises $717 Million, Outpacing Newly Listed AI Rivals." Caixin Global, January 26, 2026.
5. "As AI consolidates, what makes StepFun worth a RMB 5 billion raise?" KrASIA, January 2026.
6. "Geely-linked tech veteran Yin Qi joins Chinese AI start-up StepFun as chairman." South China Morning Post, January 2026.
7. "StepFun and Geely Auto open-source large models to global developers." China Daily, February 19, 2025.
8. "Step3: Cost-Effective Multimodal Intelligence." StepFun Research, 2025.
9. "StepFun releases Step-Video-T2V: A 300 Billion Parameter Text-to-Video Model." [ComfyUI](/wiki/comfyui) Wiki, February 17, 2025.
10. "Meet China's top six AI unicorns: who are leading the wave of AI in China." TechNode, January 9, 2025.
11. "Chinese AI startup StepFun reportedly plans Hong Kong IPO, potentially raising up to $500 million." CnTechPost, February 26, 2026.
12. "Geely Auto Group Teams Up with StepFun for a Joint Showcase at the 2025 World Artificial Intelligence Conference." BusinessWire, July 2025.
13. "Step 3.5 Flash: Fast Enough to Think. Reliable Enough to Act." StepFun Blog, February 2026.
14. "Stepfun's $719M Raise Highlights China's Shift From AI Models to Real-World Deployment." Asia Tech Daily, 2026.
15. "Six AI tigers." Wikipedia, accessed March 2026.

