ModelScope is an open-source Model-as-a-Service (MaaS) platform developed by Alibaba Cloud and DAMO Academy, Alibaba's global research initiative. Often described as China's equivalent to Hugging Face, ModelScope provides a centralized hub where developers and researchers can discover, test, fine-tune, and deploy artificial intelligence models across a wide range of domains. Since its public launch in November 2022, the platform has grown into China's largest AI open-source community, hosting over 70,000 models and serving more than 16 million developers across 36 countries.
ModelScope operates as a joint initiative between Alibaba Cloud and the Open Source Development Committee of the China Computer Federation (CCF). The platform's stated mission is to lower the barrier to entry for AI development by making state-of-the-art models accessible to everyone, from individual researchers and university students to enterprise engineering teams building production applications.
ModelScope was officially unveiled on November 3, 2022, at Alibaba Cloud's annual Apsara Conference (also known as the Yunqi Conference) in Hangzhou, China. The Apsara Conference is Alibaba Cloud's flagship technology summit, drawing tens of thousands of attendees each year. At launch, the platform featured over 300 ready-to-deploy AI models developed by DAMO Academy over the preceding five years. Of those initial models, more than 150 were recognized as state-of-the-art (SOTA) in their respective fields. Chinese-language models accounted for more than one-third of the launch catalog, covering over 60 distinct tasks.
Among the headline models at launch were Tongyi, a 5-billion-parameter text-to-image model, and OFA (One-For-All), a 6-billion-parameter cross-modal pre-trained model capable of image captioning, visual question answering, and other multimodal tasks. Both had been developed internally at DAMO Academy.
Jeff Zhang, then President of Alibaba Cloud Intelligence, described the launch as part of a broader effort to democratize access to AI capabilities. He stated that cloud computing had "given rise to a fundamental revolution in the way computing resources are organized, produced and put to commercial use." The platform was initially available only in Chinese at modelscope.cn.
By April 2023, ModelScope had attracted over 1 million developer users. Growth accelerated through the year as interest in large language models surged globally and within China in particular. By August 2023, the platform counted over 2 million developers, with the total number of hosted models exceeding 2,300 and cumulative model downloads surpassing 100 million. This period coincided with the rapid rise of Chinese LLMs such as Qwen, ChatGLM, and Baichuan, many of which chose ModelScope as a primary distribution channel.
In June 2024, Alibaba Cloud announced the English-language version of ModelScope at the CVPR Conference in Seattle, Washington. Available at modelscope.ai, the international edition gave developers worldwide access to over 5,000 ready-to-use AI models from Alibaba Cloud and prominent Chinese AI startups such as Baichuan, Zhipu AI, and others. The English platform also offered access to more than 1,500 high-quality Chinese-language datasets and an extensive range of toolkits for data processing.
The decision to launch an English version reflected both the growing international interest in Chinese AI models and Alibaba Cloud's desire to compete globally with platforms like Hugging Face. This expansion coincided with Alibaba Cloud's broader internationalization push, which included plans for new cloud regions and data centers across Mexico, Malaysia, the Philippines, Thailand, and South Korea.
By mid-2025, ModelScope had grown to host over 70,000 models and serve more than 16 million developers in 36 countries. The platform's growth from 300 models at launch to 70,000 in under three years illustrates the rapid pace of open-source AI development in China. In April 2025, the platform launched MCP Plaza, which rapidly became the largest Model Context Protocol community in China with over 4,000 online services and call volumes exceeding 100 million.
During the 2025 World Artificial Intelligence Conference, Alibaba used the platform to showcase new open-source releases including Qwen3-Coder, an advanced coding model, alongside other reasoning models in the Qwen family.
The core of ModelScope is its model hub, which functions as a searchable repository of pre-trained models. Each model listing includes a model card with documentation, usage instructions, performance benchmarks, and licensing information. Model cards follow a structured format that includes the model's task type, framework compatibility (e.g., PyTorch, TensorFlow), parameter count, and recommended hardware requirements. Models span multiple domains:
| Domain | Example Tasks |
|---|---|
| Natural Language Processing | Text generation, text classification, word segmentation, named entity recognition, machine translation, sentiment analysis, punctuation prediction |
| Computer Vision | Image classification, object detection, face detection, portrait matting, image inpainting, OCR, depth estimation |
| Speech and Audio | Automatic speech recognition, text-to-speech, speaker verification, voice activity detection, speech enhancement |
| Multimodal AI | Vision-language models, image captioning, visual question answering, text-to-image generation, text-to-video generation |
| Scientific Computing | Protein structure prediction, molecular generation, drug discovery |
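The structured model-card fields described above (task type, framework compatibility, parameter count, hardware requirements) can be sketched as a small record type. The field names and the example model id below are illustrative, not ModelScope's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Illustrative sketch of the metadata a model card carries."""
    model_id: str                                   # hub identifier
    task: str                                       # task type
    frameworks: list = field(default_factory=list)  # e.g. ["PyTorch"]
    parameters: int = 0                             # parameter count
    license: str = "Apache-2.0"
    min_gpu_memory_gb: int = 0                      # recommended hardware

    def summary(self) -> str:
        return f"{self.model_id}: {self.task} ({self.parameters:,} params)"

card = ModelCard(
    model_id="damo/ofa_image-caption_coco_large_en",
    task="image-captioning",
    frameworks=["PyTorch"],
    parameters=6_000_000_000,
)
print(card.summary())
```

A structured schema like this is what makes the hub searchable by task, framework, and size rather than by free-text description alone.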
Developers can test models directly in the browser for free and receive results within minutes. They can then fine-tune models to create customized AI applications, running them on Alibaba Cloud infrastructure, other cloud platforms, or locally on their own hardware.
ModelScope hosts a dedicated dataset repository with thousands of datasets spanning multiple languages and domains. As of the English version launch in 2024, the platform offered over 1,500 high-quality Chinese-language datasets alongside datasets in other languages. Datasets are versioned and documented with metadata describing their size, format, license, and intended use case, making them straightforward to discover and integrate into training workflows. The dataset hub supports standard formats and provides download utilities through both the web interface and the Python library.
Similar to Hugging Face Spaces, ModelScope provides a hosting environment for interactive model demos. Developers can build and deploy web applications using Gradio or Streamlit to showcase their models. These Spaces are publicly accessible and shareable via URL, allowing researchers to demonstrate their work without requiring end users to install anything locally.
ModelScope also maintains modelscope-studio, a third-party component library built on top of Gradio that integrates Ant Design, Ant Design X, Monaco Editor, and other advanced UI components for building richer demo applications. This library enables developers to create more polished interfaces that go beyond the standard Gradio widget set.
Launched on April 15, 2025, MCP Plaza is ModelScope's community hub for Model Context Protocol (MCP) services. MCP is a standardized protocol for connecting AI models with external tools and data sources; at launch, MCP Plaza aggregated nearly 1,500 MCP services spanning categories such as search, maps, file systems, and developer tools. Notable integrations include services from Alipay (enabling AI-driven transaction creation, inquiry, and refunds) and MiniMax (packaging speech generation, speech cloning, image generation, and video generation into MCP-compatible endpoints).
The platform includes two key developer tools: MCP Sandbox, which allows developers to set up and test MCP services within a minute with support for both cloud hosting and local deployment, and MCPBench, an open-source evaluation tool for assessing MCP service effectiveness, efficiency, and token consumption. By mid-2025, total MCP service call volume on the platform had exceeded 100 million.
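MCP is built on JSON-RPC 2.0 messages, so a tool invocation is just a structured request naming the tool and its arguments. A minimal sketch of constructing such a request (the tool name here is hypothetical, and this is a message-format illustration, not a full MCP client):

```python
import json

def make_tool_call(call_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request in the shape MCP uses for tool calls."""
    request = {
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)

msg = make_tool_call(1, "maps.search", {"query": "Hangzhou"})
parsed = json.loads(msg)
print(parsed["method"], parsed["params"]["name"])
```

Because every service speaks this same request shape, a hub like MCP Plaza can catalog thousands of heterogeneous tools behind one uniform calling convention.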
ModelScope developed EvalScope, a comprehensive model evaluation framework for benchmarking large language models, vision-language models, embedding models, rerankers, and AIGC systems. EvalScope includes built-in support for industry-standard benchmarks such as MMLU, C-Eval, GSM8K, ARC, GPQA-Diamond, MATH-500, AIME24, PolyMath, SimpleVQA, and many others. The framework integrates multiple evaluation backends, including OpenCompass, VLMEvalKit, and RAGEval, and provides a WebUI for interactive visualization and multi-model comparison. Arena mode allows pairwise model battles for intuitive ranking. It also supports performance stress testing, measuring metrics such as time-to-first-token (TTFT) and time-per-output-token (TPOT). EvalScope has accumulated over 2,600 GitHub stars.
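The two stress-test metrics, TTFT and TPOT (time per output token), can be computed directly from request timestamps. A simplified sketch of that arithmetic (not EvalScope's actual implementation):

```python
def latency_metrics(request_start: float, first_token_time: float,
                    last_token_time: float, num_output_tokens: int):
    """Compute TTFT and TPOT from wall-clock timestamps in seconds."""
    # TTFT: how long the client waits before the first token arrives.
    ttft = first_token_time - request_start
    # TPOT: average time per token over the remaining decode steps.
    tpot = (last_token_time - first_token_time) / max(num_output_tokens - 1, 1)
    return ttft, tpot

ttft, tpot = latency_metrics(0.0, 0.25, 2.25, 101)
print(f"TTFT={ttft:.2f}s TPOT={tpot:.3f}s/token")
```

TTFT captures perceived responsiveness (dominated by prompt processing), while TPOT captures sustained generation throughput; a stress test varies concurrency and input length and reports both.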
ModelScope serves as a primary or secondary distribution channel for many of the most prominent open-source AI models developed in China. The platform also mirrors popular international models, making them accessible to Chinese developers without requiring a VPN.
| Model Family | Developer | Description |
|---|---|---|
| Qwen | Alibaba Cloud | Alibaba's flagship LLM series, including Qwen3, Qwen3-VL, Qwen3-Omni, and Qwen3-Coder |
| DeepSeek | DeepSeek | High-performance reasoning models including DeepSeek-R1 and DeepSeek-V3 |
| ChatGLM | Zhipu AI | Bilingual (Chinese-English) conversational models including GLM-4 and GLM-5 |
| Baichuan | Baichuan Inc. | Chinese-optimized large language models |
| Yi | 01.AI | Bilingual models developed by Kai-Fu Lee's AI lab |
| InternLM | Shanghai AI Lab | Research-oriented multilingual LLMs including InternLM3 |
| Stable Diffusion | Stability AI | Popular open-source image generation models including SDXL |
| Llama | Meta AI | Meta's open-weight large language models including Llama 4 |
| Mistral | Mistral AI | European open-weight language models |
| Tongyi | Alibaba DAMO Academy | Early multimodal models including text-to-image (5 billion parameters) |
| OFA (One-For-All) | Alibaba DAMO Academy | Cross-modal pre-trained model (6 billion parameters) for image captioning and visual QA |
Many Chinese AI labs publish their models on both ModelScope and Hugging Face simultaneously. This dual-publishing strategy allows labs to reach both Chinese developers (who may not have access to Hugging Face) and international developers (who may not be familiar with ModelScope).
The ModelScope Python library provides a programmatic interface for interacting with models hosted on the platform. It supports Python 3.7 and above and is available via pip.
The core library can be installed with:

```shell
pip install modelscope
```
Domain-specific extras are available for specialized use cases:

```shell
pip install modelscope[nlp]    # NLP-specific dependencies
pip install modelscope[cv]     # Computer vision dependencies
pip install modelscope[audio]  # Audio processing dependencies
```
Docker images with pre-configured CPU and GPU environments are also available for developers who prefer containerized setups.
The library offers a unified interface across all supported domains:
- The `pipeline` interface allows running model inference in as few as three lines of code. Developers specify a task and model identifier, and the library handles downloading, caching, and execution automatically.
- The `Trainer` abstraction enables model fine-tuning in approximately 10 lines of code, supporting distributed training with data parallelism, model parallelism, and hybrid strategies.

The core ModelScope library is licensed under the Apache License 2.0 and has received approximately 8,800 stars on GitHub. It is actively maintained with regular releases.
Beyond the core platform and library, ModelScope maintains a growing ecosystem of open-source tools that cover the full lifecycle of AI model development:
| Project | Description | GitHub Stars |
|---|---|---|
| ms-swift (SWIFT) | Scalable framework for fine-tuning 600+ LLMs and 300+ multimodal LLMs, supporting SFT, DPO, GRPO, and other training methods. Accepted at AAAI 2025. | 12,000+ |
| EvalScope | Model evaluation and benchmarking framework for LLMs, VLMs, and AIGC systems | 2,600+ |
| DiffSynth-Studio | Diffusion model engine for image and video generation, supporting FLUX, Stable Diffusion, ControlNet, and more | 7,000+ |
| FunASR | End-to-end speech recognition toolkit with SOTA pre-trained models including Paraformer | 8,000+ |
| MS-Agent | Lightweight agentic framework for building customizable AI agent systems with tool use and deep research capabilities | 2,000+ |
| AgentScope | Multi-agent framework for building distributed agent applications with visibility and trust | 5,000+ |
| ClearerVoice-Studio | Speech enhancement, separation, and target speaker extraction toolkit | 3,000+ |
| 3D-Speaker | Speaker verification, recognition, and diarization toolkit | 1,000+ |
| MCPBench | Evaluation benchmark for MCP servers | 500+ |
SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is one of ModelScope's most widely adopted tools. It provides a complete lifecycle for model customization, from continual pre-training through supervised fine-tuning and human alignment to deployment. SWIFT supports training techniques including LoRA, QLoRA, full-parameter fine-tuning, and reinforcement learning algorithms such as GRPO, DAPO, GSPO, RLOO, and Reinforce++. It integrates Megatron parallelism techniques (tensor parallelism, pipeline parallelism, context parallelism, expert parallelism) to accelerate training on large clusters. Megatron-based training of Qwen3 MoE models, for instance, achieves speeds up to 10 times faster than the standard transformers library. The SWIFT paper was accepted at the AAAI 2025 conference.
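Of the techniques listed, LoRA is the most widely used: instead of updating a full weight matrix W, training learns a low-rank pair of matrices B and A whose product is scaled and added at merge time, W' = W + (alpha/r)·BA. A pure-Python sketch of that merge step (illustrative arithmetic only, not SWIFT's implementation):

```python
def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def merge_lora(W, B, A, alpha: float, r: int):
    """Return W + (alpha / r) * (B @ A), the merged LoRA weight."""
    delta = matmul(B, A)          # low-rank update, same shape as W
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# 2x2 base weight, rank-1 update (B is 2x1, A is 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]
print(merge_lora(W, B, A, alpha=1.0, r=1))
```

Because only B and A are trained, the number of trainable parameters scales with the rank r rather than the full matrix size, which is what makes fine-tuning large models feasible on modest hardware.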
DiffSynth-Studio is ModelScope's open-source diffusion model engine, designed for image and video generation research and production. It supports multiple model families including FLUX, Stable Diffusion, ControlNet, LTX-2, and Qwen-Image, and provides training paradigms such as full parameter fine-tuning, LoRA adaptation, differential LoRA, and direct distillation. The framework includes ExVideo, a post-training technique that extends video generation capabilities to produce sequences of up to 128 frames.
MS-Agent (formerly ModelScope-Agent) is a framework for building customizable AI agent systems using open-source large language models as controllers. It supports tool-use data collection, tool retrieval, tool registration, memory control, and customized model training. Recent versions include Agentic Insight, a deep research system for multi-step information gathering and analysis, and integration with MCP services.
ModelScope is frequently compared to Hugging Face given their similar roles as model hosting platforms. The following table summarizes key differences:
| Feature | ModelScope | Hugging Face |
|---|---|---|
| Operator | Alibaba Cloud / DAMO Academy | Hugging Face Inc. (independent) |
| Headquarters | Hangzhou, China | New York, USA |
| Launch Year | 2022 | 2016 (model hub launched ~2019) |
| Total Models | 70,000+ (as of mid-2025) | 2,000,000+ (as of mid-2025) |
| Total Users | 16 million+ | 13 million+ |
| Primary Language | Chinese (English version since 2024) | English |
| Regional Strength | China and Asia; faster download speeds in mainland China | Global; strongest in North America and Europe |
| Datasets | Thousands, with strong Chinese-language coverage | 500,000+ |
| Spaces / Demos | Supported (Gradio, Streamlit) | Supported (Gradio, Streamlit, Docker) |
| Python Library | modelscope (pip installable) | transformers, huggingface_hub (pip installable) |
| Model Evaluation | EvalScope (built-in) | Open LLM Leaderboard, third-party integrations |
| Fine-Tuning Tools | ms-swift (SWIFT) | PEFT, TRL, AutoTrain |
| Agent Framework | MS-Agent, AgentScope | Transformers Agents, smolagents |
| License | Apache 2.0 (platform code) | Apache 2.0 (library code) |
| Cloud Integration | Deep integration with Alibaba Cloud | Partnerships with AWS, Google Cloud, Azure |
| Access in China | Full speed, no restrictions | Blocked without VPN |
A key practical consideration for Chinese developers is network access. Hugging Face has been inaccessible in mainland China without a VPN since restrictions were imposed by the Cyberspace Administration of China. ModelScope offers significantly faster download speeds within China and serves as the default model source for several Chinese AI frameworks; for example, the Xinference inference engine automatically switches its download source to ModelScope when the system language is set to Simplified Chinese.
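The locale-based source switching described for Xinference amounts to a simple selection rule. A hedged sketch of the idea (function name and locale handling are hypothetical, not Xinference's actual code):

```python
def pick_model_hub(system_language: str) -> str:
    """Choose a model download source from a system language tag.

    Simplified Chinese locales default to ModelScope; everything
    else falls back to Hugging Face.
    """
    normalized = system_language.lower().replace("_", "-")
    if normalized.startswith("zh-cn") or normalized == "zh-hans":
        return "modelscope"
    return "huggingface"

print(pick_model_hub("zh_CN"))  # modelscope
print(pick_model_hub("en_US"))  # huggingface
```

In practice such a default is usually overridable by an environment variable or configuration flag, since locale is only a proxy for network reachability.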
On the other hand, Hugging Face has a much larger global model catalog (roughly 30 times more models) and a more internationally diverse contributor base. Hugging Face also benefits from a larger dataset repository and more mature integrations with Western cloud providers. Many Chinese AI labs, including those behind Qwen, DeepSeek, and ChatGLM, publish their models on both platforms simultaneously to reach the widest possible audience.
ModelScope is deeply intertwined with Alibaba Cloud's broader AI strategy. While the platform itself is open-source and free to use, it sits within Alibaba Cloud's commercial ecosystem: browsing and testing models is free, but the platform's deep integration with Alibaba Cloud makes it a natural on-ramp for developers who go on to fine-tune and deploy models on Alibaba Cloud infrastructure.
ModelScope follows a modular architecture designed around the MaaS concept: the model hub, dataset hub, Spaces, and the Python library operate as interconnected components.
The Python library uses an abstraction layer that separates model logic from infrastructure concerns. This means the same code can run locally on a developer's machine, on Alibaba Cloud, or on any other compute platform with minimal changes.
ModelScope is governed as a joint initiative between Alibaba Cloud and the Open Source Development Committee of the China Computer Federation (CCF). This partnership gives the platform institutional backing from both the commercial and academic sectors in China. The CCF is one of China's most prominent academic computing organizations, and its involvement lends credibility to ModelScope's role as a neutral community resource rather than a purely corporate product.
The community contributes models, datasets, and tools through the platform's web interface and Git-based workflows. As of mid-2025, the platform serves developers in 36 countries, though the majority of its user base remains in mainland China.
ModelScope has also fostered sub-communities around specific AI domains. FunASR, for example, has built a dedicated community of speech recognition researchers, while DiffSynth-Studio has attracted contributors working on generative image and video models. The ms-swift community has become particularly active, with researchers sharing fine-tuning recipes and best practices for adapting large language models to specific tasks.
ModelScope occupies a unique position in the global AI ecosystem. As the largest open-source AI model community in China, it plays a critical role in distributing Chinese AI research to the broader developer community. The platform's rapid growth from 300 models at launch to over 70,000 in less than three years reflects the explosive expansion of open-source AI development in China.
The platform has also become important infrastructure for China's AI supply chain. With Hugging Face access restricted in mainland China, ModelScope serves as the primary channel through which Chinese developers access both domestic and international open-source models. This has made it a central node in the Chinese AI ecosystem, connecting model developers, cloud infrastructure providers, and downstream application builders.
From a global perspective, ModelScope represents the emergence of a parallel open-source AI ecosystem centered in China. While Hugging Face remains the dominant platform internationally, ModelScope's user count (16 million+) actually exceeds Hugging Face's (13 million+), highlighting the scale of AI development activity in China. The coexistence of these two major platforms reflects a broader pattern in the AI industry: the development of distinct but overlapping ecosystems in the West and in China, connected by dual-publication of models and cross-platform compatibility.