Apache MXNet
Last reviewed
Sources
14 citations
Review status
Source-backed
Revision
v2 · 3,371 words
Apache MXNet (pronounced "mix-net") was an open-source deep learning framework that combined imperative and symbolic execution in one runtime, created around 2015 by the DMLC (Distributed Machine Learning Community) group and adopted by Amazon Web Services as its deep learning framework of choice. It is now retired: the Apache Software Foundation moved MXNet to the Apache Attic in September 2023 after maintainer activity collapsed, leaving the project read-only. MXNet is best remembered for its hybrid programming model, its high-level Gluon API (co-developed with Microsoft in 2017), and for being the framework that PyTorch and TensorFlow ultimately displaced.
MXNet was originally developed in 2015 by researchers in the DMLC group at CMU, NYU, MIT, and the University of Washington. The project was donated to the Apache Software Foundation in January 2017 (entering the Apache Incubator), graduated to an Apache Top-Level Project on September 21, 2022, and was retired to the Apache Attic on September 26, 2023.[1][4][13]
From roughly 2016 through 2022, MXNet was the deep learning framework most actively promoted by Amazon Web Services. Amazon CTO Werner Vogels announced on November 22, 2016 that "MXNet will be our deep learning framework of choice."[5] The framework was used internally at Amazon for product recommendations, search, and computer vision workloads, and shipped as a default option in Amazon SageMaker and the AWS Deep Learning AMIs. Despite that backing, PyTorch and TensorFlow consolidated the research and production market over the late 2010s, and contributor activity to MXNet fell sharply after 2021.
Apache MXNet was a deep learning framework designed for both flexibility and efficiency on heterogeneous, distributed hardware. Its defining idea was that the same library exposed two programming models that shared one execution backend: an imperative, NumPy-like interface (NDArray) for eager experimentation, and a symbolic graph interface (Symbol) for optimized, compiled deployment. A model written imperatively could be "hybridized" into a static graph, an approach later echoed by TorchScript and torch.compile in PyTorch.
The name "MXNet" stood for "mix net," a reference to the mix of imperative (mutable) and declarative (symbolic) programming styles the system supported. The original 2015 paper described MXNet as "a multi-language machine learning (ML) library to ease the development of ML algorithms, especially for deep neural networks," emphasizing portability across CPUs, GPUs, and clusters.[1]
| Field | Detail |
|---|---|
| Original authors | Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang |
| Developer | DMLC group; Apache Software Foundation (from 2017) |
| Initial public release | 2015 (first public beta) |
| First paper | December 3, 2015 (arXiv:1512.01274) |
| Apache Incubator entry | January 23, 2017 |
| Apache Top-Level Project | September 21, 2022 |
| Final stable release | 1.9.1 (May 2022) |
| Retirement | September 26, 2023 (moved to Apache Attic) |
| License | Apache License 2.0 |
| Written in | C++, Python, CUDA |
| Repository | github.com/apache/mxnet (archived) |
| Website | mxnet.apache.org (archived) |
MXNet grew out of the work of the DMLC collective, an informal group of graduate students and researchers spread across several universities who collaborated on machine learning systems starting around 2014. The members included Tianqi Chen (then at the University of Washington), Mu Li (Carnegie Mellon University), Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang (then at NYU Shanghai).
Before MXNet, several DMLC members had been working on separate deep learning systems with overlapping aims, among them CXXNet, Minerva, and Purine2.
In 2015 the authors agreed to merge the lessons from these projects into a single library that combined a Python-style imperative NDArray API with a symbolic graph compiler. The first commit to the public dmlc/mxnet GitHub repository was made in mid-2015, and a beta release appeared in September of that year.
The original technical paper, "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems," was posted to arXiv on December 3, 2015 (arXiv:1512.01274).[1] The paper described MXNet's mixed programming interface, its dependency engine that scheduled operations across CPU and GPU devices, and its parameter server design for distributed training. It listed Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang as co-authors.
MXNet's design centered on two complementary programming models that shared the same underlying runtime.
The NDArray API was an imperative interface similar to NumPy, with operations dispatched eagerly across CPU or GPU devices. NDArrays could be allocated on specific contexts (mx.cpu() or mx.gpu(i)), and arithmetic was executed asynchronously by an internal dependency engine.
The Symbol API, in contrast, let users construct a symbolic computation graph, bind it to data, and then optimize and execute it as a single compiled unit. Symbol was the substrate that made MXNet competitive on memory and throughput, because the executor could fuse operators, allocate buffers in advance, and exploit sublinear memory tricks during backpropagation.
A distinctive feature was that NDArray and Symbol shared the same operator registry and execution backend, so a model built imperatively with NDArrays could be "hybridized" into a symbolic graph for deployment. TorchScript and torch.compile in PyTorch later took a similar approach.
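The relationship between the two programming models can be illustrated with a toy sketch in plain Python. It is not MXNet code: `OPS`, `Eager`, `Sym`, and `model` are hypothetical names invented for the illustration, and scalars stand in for arrays. The point it tries to show is that one operator registry can serve both an eager front end, which computes immediately, and a symbolic front end, which records a graph and computes only when bound to data.

```python
import operator

# One operator registry shared by both front ends (hypothetical, not MXNet's).
OPS = {"add": operator.add, "mul": operator.mul}


class Eager:
    """Imperative value: each operation runs as soon as it is written."""

    def __init__(self, value):
        self.value = value

    def _apply(self, op, other):
        return Eager(OPS[op](self.value, other.value))

    def __add__(self, other):
        return self._apply("add", other)

    def __mul__(self, other):
        return self._apply("mul", other)


class Sym:
    """Symbolic node: records what to compute without computing it."""

    def __init__(self, op=None, inputs=(), name=None):
        self.op = op
        self.inputs = inputs
        self.name = name

    def __add__(self, other):
        return Sym("add", (self, other))

    def __mul__(self, other):
        return Sym("mul", (self, other))

    def bind(self, **values):
        """Attach data to the named placeholders and evaluate the graph."""
        if self.op is None:  # leaf placeholder
            return values[self.name]
        args = [node.bind(**values) for node in self.inputs]
        return OPS[self.op](*args)


def model(x, w, b):
    """Written once; runs on Eager values or on Sym placeholders."""
    return x * w + b


# Eager path: values flow through the function line by line.
eager_out = model(Eager(3.0), Eager(2.0), Eager(1.0))

# "Hybridized" path: the same function traced into a graph, then bound to data.
graph = model(Sym(name="x"), Sym(name="w"), Sym(name="b"))
graph_out = graph.bind(x=3.0, w=2.0, b=1.0)
```

Because the graph exists as a data structure before any value is computed, a real executor has the opportunity to plan memory and fuse operators ahead of time, which the eager path cannot do as easily.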
MXNet aimed for unusually broad language coverage. Official front ends existed for:
| Language | Notes |
|---|---|
| Python | Primary interface, included Gluon |
| C++ | Native API and the cpp-package higher-level wrapper |
| Scala | Used in JVM-based production systems |
| Java | Wrapper over the Scala API |
| R | Maintained for statisticians and academic users |
| Julia | MXNet.jl package |
| Perl | AI::MXNet on CPAN |
| JavaScript | Browser inference via mxnet.js |
| Clojure | org.apache.clojure-mxnet |
| Go | Community-maintained Go bindings |
The sheer number of bindings reflected the DMLC group's roots in machine learning research, but it also spread documentation effort thin, and few of these bindings reached the maturity of the Python interface.
Distributed training was built on a key-value store abstraction called KVStore, which combined a parameter server architecture with synchronous and asynchronous gradient updates. KVStore could shard parameters across multiple servers and supported local, device, dist_sync, and dist_async modes. This design drew directly on Mu Li's earlier OSDI 2014 paper on parameter servers, and it allowed MXNet to scale to clusters with hundreds of GPUs.[10]
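The synchronous mode described above can be sketched with a single-process stand-in for the server. `ToyKVStore` and its averaging update rule are assumptions made for this illustration and are not MXNet's KVStore class; the sketch only shows the push, aggregate, and pull cycle that a parameter server mediates between workers.

```python
class ToyKVStore:
    """Single-process stand-in for a synchronous parameter server (hypothetical)."""

    def __init__(self, num_workers, learning_rate):
        self.num_workers = num_workers
        self.learning_rate = learning_rate
        self.weights = {}  # key -> weights held by the "server"
        self.pending = {}  # key -> gradients pushed so far in this round

    def init(self, key, value):
        self.weights[key] = list(value)
        self.pending[key] = []

    def push(self, key, grad):
        """A worker sends a gradient; the update waits until all have arrived."""
        self.pending[key].append(list(grad))
        if len(self.pending[key]) == self.num_workers:
            self._apply(key)

    def pull(self, key):
        """A worker fetches a copy of the current weights."""
        return list(self.weights[key])

    def _apply(self, key):
        # Sum the gradients element-wise, average them, and take one SGD step.
        summed = [sum(parts) for parts in zip(*self.pending[key])]
        self.weights[key] = [
            w - self.learning_rate * g / self.num_workers
            for w, g in zip(self.weights[key], summed)
        ]
        self.pending[key] = []


kv = ToyKVStore(num_workers=2, learning_rate=0.5)
kv.init("w", [1.0, 1.0])
kv.push("w", [2.0, 0.0])   # worker 0 pushes its gradient
after_one = kv.pull("w")   # weights unchanged: still waiting on worker 1
kv.push("w", [0.0, 4.0])   # worker 1 completes the round
after_two = kv.pull("w")   # weights now reflect the averaged gradient
```

An asynchronous variant would apply each gradient as soon as it arrives instead of waiting for the full round, trading consistency between workers for less time spent waiting.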
MXNet was known for aggressive memory optimization. The symbolic executor reused buffers in place and fused operators when it was safe to do so, and it implemented a sublinear memory technique for training deep networks (described by Chen, Xu, Zhang, and Guestrin in arXiv:1604.06174, April 2016) that traded extra recomputation during the backward pass for reduced peak memory.[2] This made it possible to train networks with hundreds of layers on commodity GPUs.
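The recomputation trade-off can be made concrete with a little arithmetic. The sketch below uses a deliberately simplified cost model, which is an assumption of this example rather than the paper's full analysis: a chain of equally sized layers is split into segments, only one activation per segment is kept as a checkpoint, and the activations inside a segment are recomputed when the backward pass reaches it. The function names are hypothetical.

```python
import math


def peak_activations(num_layers, segment_len):
    """Checkpoints kept for every segment plus one segment recomputed in full."""
    num_segments = math.ceil(num_layers / segment_len)
    return num_segments + segment_len


def best_segment_len(num_layers):
    """Segment length that minimizes peak stored activations in this model."""
    return min(
        range(1, num_layers + 1),
        key=lambda k: peak_activations(num_layers, k),
    )


n = 100
baseline = n                            # store every activation
k = best_segment_len(n)                 # close to sqrt(n)
checkpointed = peak_activations(n, k)   # grows like 2 * sqrt(n)
```

Under these assumptions a 100-layer chain needs about 20 stored activations instead of 100, and the count grows with the square root of the depth rather than linearly. The price is that the activations in each segment are computed a second time during the backward pass.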
MXNet supported automatic differentiation through mx.autograd for the imperative path and through the symbolic graph compiler for the declarative path. Both shared the same set of registered backward functions for operators, so a user could write training code in NumPy-like style and still benefit from optimized gradient computation.
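The idea of a single set of registered backward functions can be sketched with a toy reverse-mode differentiator on scalars. Everything here (`FORWARD`, `BACKWARD`, `Var`, `Tape`) is a hypothetical illustration and not the mx.autograd API: operations are recorded on a tape as they run, then replayed in reverse, with each step looking up its gradient rule in the registry.

```python
# Forward and backward rules registered once per operator (hypothetical names).
FORWARD = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}
# Each backward rule maps (input a, input b, upstream gradient) to the
# gradients with respect to a and b.
BACKWARD = {
    "add": lambda a, b, g: (g, g),
    "mul": lambda a, b, g: (g * b, g * a),
}


class Var:
    """A scalar value with an accumulated gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class Tape:
    """Records each operation as it runs so it can be replayed backward."""

    def __init__(self):
        self.steps = []

    def apply(self, op, a, b):
        out = Var(FORWARD[op](a.value, b.value))
        self.steps.append((op, a, b, out))
        return out

    def backward(self, out):
        out.grad = 1.0
        # Recording order is a valid topological order, so reverse it.
        for op, a, b, res in reversed(self.steps):
            grad_a, grad_b = BACKWARD[op](a.value, b.value, res.grad)
            a.grad += grad_a
            b.grad += grad_b


tape = Tape()
x, w, b = Var(3.0), Var(2.0), Var(1.0)
y = tape.apply("add", tape.apply("mul", x, w), b)  # y = x * w + b
tape.backward(y)
```

After the backward pass, `x.grad` holds the value of `w`, `w.grad` holds the value of `x`, and `b.grad` is 1, as expected for y = x * w + b. A symbolic front end could consult the same `BACKWARD` table while building a gradient graph, which is the kind of sharing the paragraph above describes.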
Gluon was a high-level, imperative deep learning API for MXNet, announced jointly by AWS and Microsoft on October 12, 2017 and shipped first in Apache MXNet version 0.11.[6] It resembled PyTorch in feel: users built neural networks using object-oriented Python with eager execution by default, then called hybridize() to compile the model into a static graph for deployment. The API became the recommended entry point for new MXNet users from late 2017 onward.
Gluon was positioned as an accessibility play. Swami Sivasubramanian, then VP of Amazon AI, framed the goal in the launch announcement: "The potential of machine learning can only be realized if it is accessible to all developers."[6] AWS and Microsoft also published Gluon's reference specification so that other engines could implement the interface, with support for Microsoft's Cognitive Toolkit (CNTK) promised in a future release.[6]
A family of toolkits built on Gluon followed:
| Toolkit | Domain | First release |
|---|---|---|
| GluonCV | Computer vision (classification, detection, segmentation, pose) | 2018 |
| GluonNLP | Natural language processing (BERT, transformers, embeddings) | 2018 |
| Gluon-TS | Probabilistic time-series forecasting | 2019 |
| GluonFR | Face recognition | 2019 |
GluonCV shipped pretrained ResNet, MobileNet, YOLOv3, Mask R-CNN, and SimplePose models that were widely used in research benchmarks. Gluon-TS, developed at Amazon, included implementations of DeepAR, DeepState, and Transformer forecasters, and it later survived MXNet's retirement by adding a PyTorch backend.
On January 23, 2017, MXNet entered the Apache Incubator.[13] The proposal listed AWS as the primary sponsor, with Carnegie Mellon University, the University of Washington, NYU Shanghai, Intel, and other groups as supporting champions. The codebase moved from dmlc/mxnet to apache/incubator-mxnet, governance shifted to an Apache-style Project Management Committee, and contributions began to flow under the Apache Individual Contributor License Agreement.
During incubation the project added a formal release process, voting procedures for binary artifacts, and a podling community that included contributors from Amazon, Microsoft, Intel, NVIDIA, Samsung, and a long tail of academic institutions.
On September 21, 2022, the Apache Board approved the resolution establishing Apache MXNet as a Top-Level Project, graduating it from the Incubator after more than five years of podling status.[13] The repository moved to apache/mxnet, and the project dropped the "incubating" qualifier. By that point, however, momentum had already shifted toward PyTorch and TensorFlow, and graduation came only about a year before the project was retired.
Amazon was the most visible MXNet user. In November 2016, Werner Vogels published a blog post titled "MXNet: Deep Learning Framework of Choice at AWS," announcing internal adoption and laying out a multi-year investment plan.[5] AWS shipped MXNet as a default option in Amazon SageMaker and the AWS Deep Learning AMIs.
Microsoft co-announced Gluon in October 2017 and contributed integration work to bring MXNet into Azure Machine Learning.[6] NVIDIA maintained CUDA and cuDNN integration and shipped MXNet binaries inside the NGC container catalog. Intel contributed the MKL-DNN (later renamed oneDNN) backend, which accelerated MXNet on Xeon and Xeon Phi CPUs.
Other organizations that adopted or integrated MXNet included Wolfram Research (the Mathematica neural network framework used MXNet as a backend), Cisco, Apple, Indeed, and the Vector Institute. The TuSimple autonomous trucking team published several models trained in MXNet during 2017 and 2018.
MXNet ran on a wider range of hardware than most contemporary frameworks. The maintained backends included:
| Hardware | Library | Notes |
|---|---|---|
| NVIDIA GPUs | CUDA, cuDNN, NCCL | Primary GPU target |
| AMD GPUs | ROCm, MIOpen | Community port, partial coverage |
| Intel CPUs | MKL-DNN / oneDNN | Default for x86 inference |
| ARM CPUs | NNPACK, ARM Compute Library | Mobile and edge targets |
| Apple hardware | Accelerate framework (limited) | Pre-M1 support |
| Custom accelerators | TVM, Treelite | Through compilation rather than direct kernels |
The ARM and edge support, combined with the small size of MXNet's runtime, made it an attractive option for on-device inference between roughly 2017 and 2020. AWS used MXNet inside TVM compilation pipelines for inference at the edge through DeepLens and SageMaker Neo.
A number of well-known model implementations were released first or primarily in MXNet:
MXNet was retired because its contributor base shrank to the point where the project could no longer assemble the quorum needed to ship releases, while PyTorch and TensorFlow captured the research and production markets it had once competed for. From 2018 onward, PyTorch absorbed most academic users, and TensorFlow 2.x with Keras absorbed most production users who wanted Google Cloud integration. MXNet's user base, anchored at AWS, did not grow at the same rate.
Several factors contributed to the decline:
The last stable release, version 1.9.1, was tagged in May 2022. After that, commit activity in apache/mxnet slowed to a trickle, and several attempts to start a 2.x release line stalled in the release-vote stage.
On September 26, 2023, the Apache Software Foundation announced that MXNet had been retired and was being moved to the Apache Attic, the foundation's repository for inactive projects.[4] The retirement was driven by insufficient maintainer activity and the inability to assemble the quorum needed to cut new releases. The Apache Attic, created in November 2008, exists to provide oversight and a clear end-of-life process once an Apache project can no longer be actively maintained. The codebase remains read-only at github.com/apache/mxnet, and the mxnet.apache.org documentation is now hosted under the Apache Attic site; the move to the Attic was completed in early 2024.[4][13]
| Feature | Apache MXNet | PyTorch | TensorFlow |
|---|---|---|---|
| Initial release | 2015 | September 2016 | November 2015 |
| Primary sponsor | Amazon Web Services | Meta AI (formerly FAIR) | Google |
| Default execution | Imperative (Gluon) and symbolic (Symbol) | Eager, with torch.compile since 2.0 | Eager (2.x), graph (1.x) |
| Multi-language bindings | Python, C++, R, Scala, Julia, Perl, Java, Clojure, JavaScript, Go | Python, C++ (limited), Java (deprecated), Mobile | Python, C++, JavaScript (TF.js), Java, Go, Swift (retired) |
| Distributed training | KVStore parameter server | DDP, FSDP, RPC | tf.distribute, MultiWorkerMirrored |
| Mobile deployment | MXNet model server, ONNX export | PyTorch Mobile, ExecuTorch | TensorFlow Lite |
| Status (2026) | Retired (Apache Attic, Sept 2023) | Active (Linux Foundation since 2022) | Active (Google) |
| License | Apache 2.0 | Modified BSD | Apache 2.0 |
The core lesson of the comparison is convergence. MXNet bet early on supporting both eager and graph execution in one framework, and by 2023 both surviving rivals had adopted the same hybrid stance (PyTorch through torch.compile, TensorFlow through its 2.x eager-by-default model). MXNet pioneered the design but lost the ecosystem.
Although the framework itself was retired, several MXNet-related projects continued under different stacks:
Several DMLC alumni went on to found or join companies that shaped subsequent waves of AI infrastructure. Tianqi Chen co-founded OctoML (later OctoAI), which NVIDIA acquired in September 2024, and joined the faculty at Carnegie Mellon. Mu Li worked at Amazon on AutoGluon and later moved to Boson AI. Bing Xu and others moved to OpenAI, Anthropic, and other model labs.
The MXNet codebase is preserved in archived form at github.com/apache/mxnet, and the project's documentation remains accessible through the Apache Attic. Researchers studying the early symbolic-versus-eager debate in deep learning frameworks still cite the original 2015 paper as a clear early articulation of the hybrid approach that PyTorch and TensorFlow eventually converged on.[1]
MXNet was a toolbox for building and training neural networks. Its trick was that you could write code in two ways: a quick, line-by-line style that is easy to debug (like playing with LEGO bricks one at a time), and a blueprint style that the computer can compile and run very fast (like handing a finished LEGO instruction booklet to a robot). Amazon liked MXNet and used it for years, but most people ended up choosing PyTorch and TensorFlow instead. With almost no one left to fix bugs and ship updates, the project was officially shut down and frozen in 2023.