PaddlePaddle
Last reviewed
Jun 3, 2026
Sources
22 citations
Review status
Source-backed
Revision
v1 · 2,045 words
PaddlePaddle (Chinese name Feijiang, 飞桨) is an open-source deep learning platform developed by the Chinese technology company Baidu. The name is a backronym for "PArallel Distributed Deep LEarning." Baidu released the code publicly in 2016, and the project is generally described as the first deep learning framework developed independently in China [1][2]. Beyond a core training and inference framework, PaddlePaddle has grown into a large ecosystem of development kits and model libraries, most notably PaddleOCR, an optical character recognition toolkit that is one of the most popular OCR projects on GitHub. Baidu's ERNIE family of language models is built and trained with PaddlePaddle. The framework is released under the Apache License 2.0 [3].
The internal project that became PaddlePaddle began at Baidu around 2013, when the company's deep learning lab found that single-GPU training tools could not keep up with the growth of its neural network workloads in areas such as search ranking and online advertising. The early system was led by Baidu researcher Wei Xu and was known internally simply as "Paddle" [1][4]. Baidu used it to develop products in advertising, search ranking, large-scale image classification, optical character recognition, and machine translation before opening it up to outside developers [5].
The platform was announced as an open-source project on September 1, 2016, at the Baidu World conference by Andrew Ng, who was then Baidu's chief scientist, and the code was made publicly available later that month [5][6]. The timing placed it among the first wave of company-backed deep learning frameworks, arriving a year after Google open-sourced TensorFlow in 2015 [2].
In its first years PaddlePaddle was primarily a static-graph framework. Baidu later rebuilt large parts of it, and the redesigned core was sometimes referred to as Paddle Fluid. Support for imperative, dynamic-graph programming was added as an option, and with the version 2.0 release in early 2021 the dynamic graph became the default programming mode, alongside a reorganized application programming interface (API) [7][8]. The version 3.0 generation, released in 2025, unified dynamic and static execution under a single automatic-parallelism model and introduced a self-developed compiler. Baidu has continued to hold an annual developer conference, Wave Summit, where it announces new framework versions and reports on the size of the developer community.
The core of PaddlePaddle is a framework for building, training, and deploying neural networks. It exposes a Python API and supports both dynamic graphs (define-by-run, where the computation graph is built as code executes) and static graphs (define-and-run, where a graph is compiled before execution). Since version 2.0 the default is the dynamic-graph mode, which is easier to debug, while a static graph can be recovered for deployment and optimization [7][8].
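The difference between the two modes can be sketched in plain Python. The snippet below is a toy illustration and does not use PaddlePaddle's API; `Node`, `placeholder`, `run`, and `eager_affine` are hypothetical names introduced only for this example. It computes the same expression, x * w + b, once eagerly and once by building a graph first and executing it afterwards.

```python
# Toy illustration only: this is NOT PaddlePaddle's API.
# Every class and function name here is hypothetical.

def eager_affine(x, w, b):
    """Define-by-run: each operation executes as soon as Python reaches it."""
    product = x * w          # a concrete value, easy to print or inspect
    return product + b


class Node:
    """One operation in a deferred (static) computation graph."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # "input", "mul", or "add"
        self.inputs = inputs  # upstream Node objects
        self.name = name      # feed key, used only by "input" nodes

    def __mul__(self, other):
        return Node("mul", (self, other))

    def __add__(self, other):
        return Node("add", (self, other))


def placeholder(name):
    """Declare a graph input whose value is supplied later."""
    return Node("input", name=name)


def run(node, feed):
    """Define-and-run: evaluate an already built graph with concrete inputs."""
    if node.op == "input":
        return feed[node.name]
    left, right = (run(n, feed) for n in node.inputs)
    return left * right if node.op == "mul" else left + right


# Dynamic style: the result exists as soon as the call returns.
eager_result = eager_affine(3.0, 2.0, 1.0)

# Static style: building the graph computes nothing yet.
graph = placeholder("x") * placeholder("w") + placeholder("b")
# The same graph can then be executed with any feed.
static_result = run(graph, {"x": 3.0, "w": 2.0, "b": 1.0})
```

In the eager version every intermediate value can be inspected with ordinary Python tools, which is why the dynamic mode is easier to debug. In the graph version the whole computation is known before it runs, which is what gives a framework the chance to optimize or export it.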
Distributed and parallel training has been a focus from the beginning, consistent with the "parallel distributed" in the name. The framework supports data parallelism, model parallelism, pipeline parallelism, and combinations of these, which Baidu markets for training very large language models. With the 3.0 release, PaddlePaddle introduced what it calls unified dynamic and static automatic parallelism: a developer writes the program as if for a single device and annotates how its tensors should be split, and the framework automatically derives an efficient distributed strategy from those annotations, which is meant to lower the engineering cost of large-model training [9][10].
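To make the idea of splitting a tensor concrete, here is a minimal sketch that uses plain Python lists in place of tensors and devices. It is not PaddlePaddle code, and `matmul`, `split_columns`, and `concat_columns` are hypothetical helpers. It shows one simple form of model parallelism, assuming a weight matrix sharded column-wise across two simulated devices: each device multiplies the full input by its own shard, and gathering the partial outputs reproduces the single-device result.

```python
# Toy illustration only: plain lists stand in for tensors and devices.
# None of these helper names are PaddlePaddle APIs.

def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n), both lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def split_columns(matrix, parts):
    """Shard a matrix column-wise into `parts` equal-width pieces."""
    width = len(matrix[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in matrix]
            for i in range(parts)]


def concat_columns(shards):
    """Gather column shards back into one matrix, row by row."""
    return [sum(rows, []) for rows in zip(*shards)]


x = [[1, 2], [3, 4]]                # input, replicated on every device
w = [[1, 0, 2, 1], [0, 1, 1, 3]]    # weight, to be split across 2 devices

# Reference: the single-device program.
single_device = matmul(x, w)

# Each simulated device holds one column shard of w and computes its slice.
per_device = [matmul(x, shard) for shard in split_columns(w, 2)]
sharded = concat_columns(per_device)
```

A real framework also has to place the shards on hardware and insert the communication that the gather step stands for here. Automatic parallelism aims to derive that work from the split annotation instead of requiring it to be written by hand.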
Version 3.0 also added a deep-learning compiler called CINN (Compiler Infrastructure for Neural Networks). Baidu reported that on NVIDIA A100 hardware, compiled operators ran several times faster than uncompiled ones and that a majority of tested models saw end-to-end speedups, with an average improvement around 27 percent [9][10]. The 3.0 generation further emphasized hardware portability through a unified adaptation layer that Baidu said supported more than 60 chip series, so that the same model code can run across different accelerators [9][10]. The framework runs on CPUs and on GPUs from multiple vendors, and Baidu has worked with partners to support additional accelerators.
The major framework releases are summarized below.
| Version | Date | Notable changes |
|---|---|---|
| PaddlePaddle 1.0 | 2018 | First major stable line; static-graph (Fluid) design [8] |
| PaddlePaddle 1.6 | Wave Summit+ 2019 (Nov 2019) | Large update, reported as adding 21 new features and 100+ models, plus Paddle Lite 2.0 [11] |
| PaddlePaddle 2.0 | Early 2021 | Dynamic graph as default; reorganized API; hundreds of new and revised APIs [7][8] |
| PaddlePaddle 3.0 | April 2025 | Unified dynamic/static automatic parallelism; CINN compiler; 60+ chip series support [9][10] |
| PaddlePaddle 3.2 | Wave Summit 2025 (Sep 9, 2025) | Computational optimization, parallel strategies, native fault tolerance [12] |
| PaddlePaddle 3.3 | January 31, 2026 | Latest release line as of early 2026 [3] |
A large part of PaddlePaddle's appeal is its collection of task-specific development kits and model libraries, which package datasets, pre-trained models, training scripts, and deployment tooling for common application areas. These let developers fine-tune and deploy strong baseline models without building pipelines from scratch [13][14]. The most widely used components are listed below.
| Toolkit | Purpose |
|---|---|
| PaddleNLP | Natural language processing and large language model development, training, and inference [15] |
| PaddleOCR | Optical character recognition and document parsing [16] |
| PaddleClas | Image classification and recognition |
| PaddleDetection | Object detection, instance segmentation, tracking, and keypoint detection [13] |
| PaddleSeg | Image segmentation, with around 20 segmentation models and 50+ pre-trained models [17] |
| PaddleSpeech | Speech recognition and speech synthesis |
| PaddleHelix | Biocomputing for drug discovery and molecular work [13] |
| PARL | Reinforcement learning |
| Paddle Lite | Lightweight inference engine for mobile, embedded, and edge devices [11] |
| Paddle Serving | Server-side model deployment, with 40+ example pipelines [18] |
| PaddleHub | Library of ready-to-use pre-trained models [13] |
| PaddleX | Unified low-code interface across PaddleOCR, PaddleDetection, PaddleClas, PaddleSeg, and others [14] |
Alongside these, Baidu runs AI Studio, an online learning and development platform with notebooks and free compute, and lower-code tools such as EasyDL for users who want to train models without writing much code.
PaddleOCR is the best-known toolkit in the ecosystem and one of the most-starred OCR projects on GitHub, with about 79,000 stars as of mid-2026 [16]. It became widely adopted after the 2020 release of its PP-OCR series, a family of practical, lightweight recognition models that were small enough to deploy on modest hardware while still handling Chinese and English text. The toolkit supports more than 100 languages [16].
Later versions broadened it from plain text recognition into document understanding. PP-OCRv5 is a single-model pipeline for multilingual mixed text, and PP-StructureV3 adds layout analysis, table recognition, formula recognition, chart understanding, and reading-order recovery, converting complex PDFs and images into Markdown or JSON for downstream use by language models [16]. A 2025 technical report described PaddleOCR 3.0 as combining PP-OCRv5 for multilingual text recognition, PP-StructureV3 for hierarchical document parsing, and PP-ChatOCRv4 for key-information extraction [19]. The project is released under the Apache 2.0 license [16].
PaddlePaddle is the foundation Baidu uses to build and train its ERNIE (Enhanced Representation through kNowledge IntEgration) language models, and the two are marketed together as a combined platform. PaddleNLP, the natural language processing kit, provides the large-model training and inference stack, including Baidu's 4D hybrid parallel strategies, fine-tuning methods, and high-performance inference [15]. For the open-weights ERNIE 4.5 family, Baidu also released ERNIEKit, an industrial development toolkit built on PaddlePaddle that covers pre-training, supervised fine-tuning, LoRA, direct preference optimization, and quantization workflows [15]. Baidu often reports its developer and enterprise numbers for the "PaddlePaddle-ERNIE" ecosystem as a whole rather than for the framework alone.
Baidu reports the size of the PaddlePaddle community at its Wave Summit events, and the figures have grown quickly. By the fourth quarter of 2022, Baidu cited more than 5.35 million developers and over 200,000 enterprises using the platform [20]. At Wave Summit 2023 it reported 8 million developers and around 220,000 enterprises, and by December 2023 the developer count had reached roughly 10.7 million [21]. At Wave Summit 2025, on September 9, 2025, Baidu said the combined PaddlePaddle-ERNIE ecosystem served 23.33 million developers and 760,000 enterprises, with roughly 1.1 million models created on the platform [12][3]. These numbers come from Baidu and have not been independently audited. Taken at face value, they suggest that PaddlePaddle is the most widely used domestically developed deep learning framework in China.
The project has also gained support outside Baidu. Hardware vendors including NVIDIA and Graphcore have added PaddlePaddle support to their stacks, and pre-trained PaddlePaddle and PaddleNLP models are distributed through the Hugging Face Hub [22].
PaddlePaddle occupies a similar niche to PyTorch and TensorFlow: it is a general-purpose deep learning framework with a Python API, automatic differentiation, GPU acceleration, and tools for training and serving models. Like PyTorch, it now defaults to a dynamic, imperative style that is friendly for research and debugging, while retaining a static-graph path for optimized deployment [7]. Its main points of differentiation are a heavy emphasis on distributed and parallel training for industrial-scale models, a deep adaptation layer for a wide range of Chinese-market AI accelerators, and the unusually broad set of ready-made application toolkits that ship alongside the core library [9][13].
Adoption is the clearest difference in practice. PyTorch and TensorFlow dominate global research and industry usage, whereas PaddlePaddle's user base is concentrated in China, where it is the leading homegrown option and is closely tied to Baidu's cloud services and the ERNIE models. Its documentation and community materials are produced in both Chinese and English, but most of its momentum comes from the Chinese developer ecosystem [2][20].
The PaddlePaddle core framework and its major toolkits, including PaddleOCR and PaddleNLP, are open source under the Apache License 2.0, a permissive license that allows commercial use, modification, and redistribution [3][16]. Development happens in the open on GitHub under the PaddlePaddle organization, where the core repository (PaddlePaddle/Paddle) has well over 20,000 stars and more than a thousand contributors [3].