# Apache MXNet

> Source: https://aiwiki.ai/wiki/mxnet
> Updated: 2026-06-25
> Categories: AI Infrastructure, Artificial Intelligence, Developer Tools, Open Source AI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Apache MXNet** (pronounced "mix-net") was an open-source [deep learning](/wiki/deep_learning) framework that combined imperative and symbolic execution in one runtime, created around 2015 by the [DMLC](/wiki/dmlc) (Distributed Machine Learning Community) group and adopted by [Amazon Web Services](/wiki/aws) as its deep learning framework of choice. It is now retired: the [Apache Software Foundation](/wiki/apache_software_foundation) moved MXNet to the [Apache Attic](/wiki/apache_attic) in September 2023 after maintainer activity collapsed, leaving the project read-only. MXNet is best remembered for its hybrid programming model, its high-level [Gluon](/wiki/gluon) API (co-developed with Microsoft in 2017), and for being the framework that [PyTorch](/wiki/pytorch) and [TensorFlow](/wiki/tensorflow) ultimately displaced.

MXNet was originally developed in 2015 by researchers in the DMLC group at [CMU](https://www.cmu.edu/), [NYU](https://www.nyu.edu/), [MIT](https://www.mit.edu/), and the University of Washington. The project was donated to the Apache Software Foundation in January 2017 (entering the Apache Incubator), graduated to an Apache Top-Level Project on September 21, 2022, and was retired to the Apache Attic on September 26, 2023.[1][4][13]

From roughly 2016 through 2022, MXNet was the deep learning framework most actively promoted by Amazon Web Services. Amazon CTO Werner Vogels announced on November 22, 2016 that "MXNet will be our deep learning framework of choice."[5] The framework was used internally at Amazon for product recommendations, search, and computer vision workloads, and shipped as a default option in [Amazon SageMaker](/wiki/sagemaker) and the AWS Deep Learning AMIs. Despite that backing, PyTorch and TensorFlow consolidated the research and production market over the late 2010s, and contributor activity to MXNet fell sharply after 2021.

## What is Apache MXNet?

Apache MXNet was a deep learning framework designed for both flexibility and efficiency on heterogeneous, distributed hardware. Its defining idea was that the same library exposed two programming models that shared one execution backend: an imperative, NumPy-like interface (NDArray) for eager experimentation, and a symbolic graph interface (Symbol) for optimized, compiled deployment. A model written imperatively could be "hybridized" into a static graph, an approach later echoed by TorchScript and `torch.compile` in PyTorch.

The name "MXNet" stood for "mix net," a reference to the mix of imperative (mutable) and declarative (symbolic) programming styles the system supported. The original 2015 paper described MXNet as "a multi-language machine learning (ML) library to ease the development of ML algorithms, especially for deep neural networks," emphasizing portability across CPUs, GPUs, and clusters.[1]

## Facts

| Field | Detail |
| --- | --- |
| Original authors | Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang |
| Developer | DMLC group; Apache Software Foundation (from 2017) |
| Initial public release | 2015 (first public beta) |
| First paper | December 3, 2015 (arXiv:1512.01274) |
| Apache Incubator entry | January 23, 2017 |
| Apache Top-Level Project | September 21, 2022 |
| Final stable release | 1.9.1 (May 2022) |
| Retirement | September 26, 2023 (moved to Apache Attic) |
| License | Apache License 2.0 |
| Written in | C++, Python, CUDA |
| Repository | github.com/apache/mxnet (archived) |
| Website | mxnet.apache.org (archived) |

## When was MXNet created, and who built it?

MXNet grew out of the work of the [DMLC](/wiki/dmlc) collective, an informal group of graduate students and researchers spread across several universities who collaborated on machine learning systems starting around 2014. The members included [Tianqi Chen](/wiki/tianqi_chen) (then at the University of Washington), [Mu Li](/wiki/mu_li) (Carnegie Mellon University), Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang (then at NYU Shanghai).

Before MXNet, several DMLC members had been working on separate deep learning systems with overlapping aims:

- **CXXNet**, a C++ deep learning library written largely by Bing Xu and Tianqi Chen, focused on configuration-driven training of convolutional networks.
- **Minerva**, a system from NYU, led by Zheng Zhang and Minjie Wang, that explored dynamic dataflow execution and lazy evaluation of N-dimensional arrays.
- **Purine**, an early deep learning framework that experimented with bipartite computation graphs.

In 2015 the authors agreed to merge the lessons from these projects into a single library that combined a NumPy-style imperative NDArray API with a symbolic graph compiler. The first commit to the public `dmlc/mxnet` GitHub repository was made in mid-2015, and a beta release appeared in September of that year.

The original technical paper, "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems," was posted to arXiv on December 3, 2015 (arXiv:1512.01274).[1] The paper described MXNet's mixed programming interface, its dependency engine that scheduled operations across CPU and GPU devices, and its parameter server design for distributed training. It listed Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang as co-authors.

## How did MXNet work?

MXNet's design centered on two complementary programming models that shared the same underlying runtime.

### NDArray and Symbol

The **NDArray** API was an imperative interface similar to NumPy, with operations dispatched eagerly to CPU or GPU devices. NDArrays could be allocated on specific contexts (`mx.cpu()` or `mx.gpu(i)`), and arithmetic was executed asynchronously by an internal dependency engine: a call returned immediately, and the engine ordered the actual work according to which arrays each operation read and wrote.
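
The scheduling idea behind the dependency engine can be sketched in plain Python. This is a hypothetical toy, not MXNet's engine: `Op` and `schedule` are invented names, and instead of dispatching work to thread pools it only groups operations into "waves" whose members have no read-after-write, write-after-write, or write-after-read conflict, so they could run concurrently. In MXNet itself, reading a result back (for example with `NDArray.asnumpy()`) blocked until pending writes to that array had finished.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    reads: tuple = ()
    writes: tuple = ()


def schedule(ops):
    """Group ops (given in program order) into waves of mutually independent ops."""
    last_write = {}  # variable -> wave of its most recent writer
    last_read = {}   # variable -> latest wave that read it since that write
    waves = []
    for op in ops:
        # Read-after-write and write-after-write: wait for the last writer.
        deps = [last_write[v] for v in op.reads + op.writes if v in last_write]
        # Write-after-read: a writer must also wait for earlier readers.
        deps += [last_read[v] for v in op.writes if v in last_read]
        wave = 1 + max(deps, default=-1)
        if wave == len(waves):
            waves.append([])
        waves[wave].append(op.name)
        for v in op.reads:
            last_read[v] = max(last_read.get(v, -1), wave)
        for v in op.writes:
            last_write[v] = wave
            last_read.pop(v, None)
    return waves


ops = [
    Op("init_a", writes=("a",)),
    Op("init_b", writes=("b",)),
    Op("c=a+1", reads=("a",), writes=("c",)),
    Op("d=b*2", reads=("b",), writes=("d",)),
    Op("e=c+d", reads=("c", "d"), writes=("e",)),
    Op("a+=1", reads=("a",), writes=("a",)),  # must wait for the read in c=a+1
]
for i, wave in enumerate(schedule(ops)):
    print(i, wave)
```

The in-place update `a+=1` ends up in the same wave as `e=c+d`: it cannot overlap `c=a+1`, which still reads the old `a`, but nothing stops it from running alongside the unrelated addition.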

The **Symbol** API, in contrast, let users construct a symbolic computation graph, bind it to data, and then optimize and execute it as a single compiled unit. Symbol was the substrate that made MXNet competitive on memory and throughput, because the executor could fuse operators, allocate buffers in advance, and exploit sublinear memory tricks during backpropagation.
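
Because the whole graph is known before anything runs, an executor can decide ahead of time which intermediate results may share memory. The following is a hypothetical, much-simplified sketch of that planning step: `plan_memory` and the `(name, inputs)` graph encoding are assumptions, not MXNet internals. It finds when each value is last used and hands its buffer to a later node. It models a forward-only pass. A training graph must keep activations alive for backpropagation, so the savings there are smaller.

```python
def plan_memory(nodes, outputs):
    """Assign buffer ids to graph nodes, reusing a buffer once its value is dead.

    nodes: list of (name, input_names) in topological order.
    outputs: names that must stay alive until the end of the graph.
    """
    last_use = {}
    for step, (_, inputs) in enumerate(nodes):
        for x in inputs:
            last_use[x] = step
    for name in outputs:
        last_use[name] = len(nodes)  # graph outputs are never released

    free, assignment, next_id = [], {}, 0
    for step, (name, inputs) in enumerate(nodes):
        # Reuse a released buffer if one is available, otherwise allocate.
        if free:
            assignment[name] = free.pop()
        else:
            assignment[name] = next_id
            next_id += 1
        # Release inputs whose last consumer is this node. The output was
        # allocated first, so no node writes into one of its own inputs.
        for x in set(inputs):
            if last_use[x] == step:
                free.append(assignment[x])
    return assignment


# A small multilayer perceptron as a chain of six nodes.
mlp = [
    ("data", []),
    ("fc1", ["data"]), ("relu1", ["fc1"]),
    ("fc2", ["relu1"]), ("relu2", ["fc2"]),
    ("out", ["relu2"]),
]
plan = plan_memory(mlp, outputs=["out"])
print(plan)
print("buffers:", len(set(plan.values())), "for", len(mlp), "nodes")
```

On this chain, two buffers alternate for all six nodes. A production planner would also weigh tensor sizes and safe in-place reuse, which this sketch ignores.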

A distinctive feature was that NDArray and Symbol shared the same operator registry and execution backend. A model built imperatively with NDArrays could be "hybridized" into a symbolic graph for deployment, an idea that later influenced TorchScript and `torch.compile` in [PyTorch](/wiki/pytorch).
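
One way to turn imperative code into a graph is to run it once on placeholder objects that record each operation instead of computing it. The toy below illustrates that tracing idea in plain Python. `Sym`, `OPS`, `apply`, and `run` are hypothetical names, and MXNet's real hybridization machinery was more involved. The sketch shows a single operator table serving both paths: eager calls compute directly, and the recorded graph is later executed with the same operator definitions.

```python
import math

# One operator table shared by the eager path and the recorded (symbolic) path.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "tanh": math.tanh,
}


class Sym:
    """Placeholder node: records an operation instead of computing it."""

    def __init__(self, op, args=(), name=None):
        self.op, self.args, self.name = op, args, name


def apply(op, *args):
    # Any symbolic input makes the call record a node; otherwise compute eagerly.
    if any(isinstance(a, Sym) for a in args):
        return Sym(op, tuple(a if isinstance(a, Sym) else Sym("const", (a,)) for a in args))
    return OPS[op](*args)


def add(a, b):
    return apply("add", a, b)


def mul(a, b):
    return apply("mul", a, b)


def tanh(a):
    return apply("tanh", a)


def run(node, feed):
    """Execute a recorded graph with concrete inputs, using the same OPS table."""
    if node.op == "var":
        return feed[node.name]
    if node.op == "const":
        return node.args[0]
    return OPS[node.op](*(run(a, feed) for a in node.args))


def model(x, w):
    # Written once, imperatively.
    return tanh(add(mul(x, w), 1.0))


eager_result = model(0.5, 2.0)                             # computed immediately
graph = model(Sym("var", name="x"), Sym("var", name="w"))  # traced into a graph
graph_result = run(graph, {"x": 0.5, "w": 2.0})            # replayed later
print(eager_result, graph_result)
```

Tracing records only the path actually taken, so a Python `if` on a data value would be frozen into the graph. This limitation applies to tracing-based graph capture in general.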

### Language bindings

MXNet aimed for unusually broad language coverage. Official front ends existed for:

| Language | Notes |
| --- | --- |
| Python | Primary interface, included Gluon |
| C++ | Native API and the `cpp-package` higher-level wrapper |
| Scala | Used in JVM-based production systems |
| Java | Wrapper over the Scala API |
| R | Maintained for statisticians and academic users |
| Julia | `MXNet.jl` package |
| Perl | `AI::MXNet` on CPAN |
| JavaScript | Browser inference via `mxnet.js` |
| Clojure | `org.apache.clojure-mxnet` |
| Go | Community-maintained Go bindings |

The breadth of bindings matched the "multi-language" goal stated in the original paper, but it also fragmented documentation effort, and few of these bindings reached the maturity of the Python interface.

### Distributed training

Distributed training was built on a key-value store abstraction called **KVStore**, which combined a parameter server architecture with synchronous and asynchronous gradient updates. KVStore could shard parameters across multiple servers and supported `local`, `device`, `dist_sync`, and `dist_async` modes. This design drew directly on Mu Li's earlier OSDI 2014 paper on parameter servers, and it allowed MXNet to scale to clusters with hundreds of GPUs.[10]
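
The push/pull pattern that KVStore was built around can be sketched without any networking. Below, `ToyKVStore` is a hypothetical in-process stand-in: "servers" are dictionaries, each key is assigned to one of them by a stable hash, and the server applies a plain SGD step when gradients arrive. MXNet's own store, created with `mx.kv.create(...)`, exposed `init`, `push`, and `pull` methods. Its actual update logic, sharding scheme, and networking are not reproduced here.

```python
import zlib


class ToyKVStore:
    """In-process sketch of a sharded parameter-server key-value store."""

    def __init__(self, num_servers=2, num_workers=2, lr=0.1, sync=True):
        self.servers = [{} for _ in range(num_servers)]
        self.pending = {}  # key -> gradients received in the current round (sync only)
        self.num_workers, self.lr, self.sync = num_workers, lr, sync

    def _server(self, key):
        # Each key lives on exactly one server, chosen by a stable hash.
        return self.servers[zlib.crc32(key.encode()) % len(self.servers)]

    def init(self, key, value):
        self._server(key)[key] = value

    def push(self, key, grad):
        if not self.sync:
            # dist_async style: apply each worker's gradient as soon as it arrives.
            self._server(key)[key] -= self.lr * grad
            return
        # dist_sync style: buffer until every worker has pushed, then apply the sum.
        grads = self.pending.setdefault(key, [])
        grads.append(grad)
        if len(grads) == self.num_workers:
            self._server(key)[key] -= self.lr * sum(grads)
            del self.pending[key]

    def pull(self, key):
        return self._server(key)[key]


sync_kv = ToyKVStore(sync=True)
sync_kv.init("w", 1.0)
sync_kv.push("w", 0.5)        # worker 0
before = sync_kv.pull("w")    # still 1.0: the round is incomplete
sync_kv.push("w", 1.5)        # worker 1 completes the round
after = sync_kv.pull("w")     # 1.0 - 0.1 * (0.5 + 1.5)

async_kv = ToyKVStore(sync=False)
async_kv.init("w", 1.0)
async_kv.push("w", 0.5)       # applied immediately
async_kv.push("w", 1.5)
print(before, after, async_kv.pull("w"))
```

In sync mode the weight moves only after every worker's gradient for the round has arrived. In async mode each gradient is applied on arrival, so workers may compute on slightly stale weights. Both runs end at 0.8 here only because these fixed gradients do not depend on the weights.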

### Memory optimization

MXNet was known for aggressive memory optimization. When it was safe to do so, the symbolic executor performed operations in place and let values whose lifetimes did not overlap share the same memory. It also implemented a sublinear memory technique for training deep networks (described by Chen, Xu, Zhang, and Guestrin in arXiv:1604.06174, April 2016) that traded extra recomputation during the backward pass for reduced peak memory: the paper reports O(√n) memory for an n-layer network at the cost of roughly one extra forward pass per mini-batch.[2] This made it possible to train networks with hundreds of layers on commodity GPUs.
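
The trade-off can be made concrete with a simplified cost model. The model is an assumption for illustration, not the paper's exact allocation algorithm: store only the input of every k-th layer as a checkpoint, then during the backward pass recompute one k-layer segment at a time. Peak activation memory is then roughly the number of checkpoints plus one segment, ⌈n/k⌉ + k, which is smallest near k = √n. The recomputation adds about one extra forward pass in total. The function names are hypothetical.

```python
import math


def peak_activations(n_layers, segment):
    """Activations held at peak: stored checkpoints plus one recomputed segment."""
    return math.ceil(n_layers / segment) + segment


def best_segment(n_layers):
    """Segment length that minimizes the simplified peak-memory estimate."""
    return min(range(1, n_layers + 1), key=lambda k: peak_activations(n_layers, k))


n = 100
k = best_segment(n)
print(f"store everything: {n} activations")
print(f"checkpoint every {k} layers: {peak_activations(n, k)} activations")
```

For a 100-layer network the estimate falls from 100 stored activations to 20, with k = 10 = √100.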

### Automatic differentiation

MXNet supported automatic differentiation through `mx.autograd` for the imperative path and through the symbolic graph compiler for the declarative path. Both shared the same set of registered backward functions for operators, so a user could write training code in NumPy-like style and still benefit from optimized gradient computation.
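
The record-then-backward pattern of the imperative path can be imitated with a scalar toy. In MXNet the code followed the shape `x.attach_grad()`, then `with autograd.record(): ...`, then `y.backward()` and a read of `x.grad`. The `Var` class and `record` function below are hypothetical plain-Python stand-ins that only show the mechanism: operations run inside `record()` are linked into a graph, and `backward()` walks that graph in reverse topological order, applying the chain rule.

```python
from contextlib import contextmanager

_RECORDING = False


@contextmanager
def record():
    """Only operations created inside this block keep links to their inputs."""
    global _RECORDING
    prev, _RECORDING = _RECORDING, True
    try:
        yield
    finally:
        _RECORDING = prev


class Var:
    def __init__(self, value, parents=()):
        self.value, self.grad = value, 0.0
        # (parent, local_gradient) pairs, kept only while recording.
        self.parents = parents if _RECORDING else ()

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def backward(self):
        # Order the recorded graph so each node follows its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Reverse pass: push each node's gradient to its inputs (chain rule).
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


x, w = Var(3.0), Var(2.0)
with record():
    y = x * w + x      # dy/dx = w + 1, dy/dw = x
y.backward()
print(x.grad, w.grad)

z = x * w              # outside record(): no links are kept
print(z.parents)
```

Here `x` receives gradient from both of its uses, which is why `backward()` accumulates with `+=` rather than overwriting.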

## What was the Gluon API?

Gluon was a high-level, imperative deep learning API for MXNet, announced jointly by AWS and Microsoft on October 12, 2017 and shipped first in Apache MXNet version 0.11.[6] It resembled PyTorch in feel: users built neural networks using object-oriented Python with eager execution by default, then called `hybridize()` to compile the model into a static graph for deployment. The API became the recommended entry point for new MXNet users from late 2017 onward.
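
In Gluon 1.x, a `HybridBlock` subclass implemented `hybrid_forward(self, F, x, ...)`, where `F` was the `mx.nd` module before `hybridize()` and `mx.sym` after it, so the same method body either computed values or built a graph. The toy below imitates that design in plain Python. `ToyHybridBlock`, `ToyDense`, `Node`, `eager`, and `symbolic` are hypothetical stand-ins, with the two namespaces playing the roles of `mx.nd` and `mx.sym`.

```python
from types import SimpleNamespace


class Node:
    """A recorded operation in a symbolic graph."""

    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def run(self, feed):
        if self.op == "input":
            return feed[self.value]
        if self.op == "const":
            return self.value
        # Execute recorded ops with the eager implementations.
        return getattr(eager, self.op)(*(n.run(feed) for n in self.inputs))


def _node(v):
    return v if isinstance(v, Node) else Node("const", value=v)


# Two interchangeable operator namespaces, standing in for mx.nd and mx.sym.
eager = SimpleNamespace(
    add=lambda a, b: a + b,
    mul=lambda a, b: a * b,
    relu=lambda a: max(a, 0.0),
)
symbolic = SimpleNamespace(
    add=lambda a, b: Node("add", (_node(a), _node(b))),
    mul=lambda a, b: Node("mul", (_node(a), _node(b))),
    relu=lambda a: Node("relu", (_node(a),)),
)


class ToyHybridBlock:
    def __init__(self):
        self._hybridized = False
        self._graph = None

    def hybridize(self):
        self._hybridized = True

    def __call__(self, x):
        if not self._hybridized:
            return self.hybrid_forward(eager, x)
        if self._graph is None:
            # First call after hybridize(): trace once with a symbolic input.
            self._graph = self.hybrid_forward(symbolic, Node("input", value="x"))
        return self._graph.run({"x": x})


class ToyDense(ToyHybridBlock):
    def __init__(self, weight, bias):
        super().__init__()
        self.weight, self.bias = weight, bias

    def hybrid_forward(self, F, x):
        # Identical code for both modes; only F differs.
        return F.relu(F.add(F.mul(x, self.weight), self.bias))


layer = ToyDense(weight=2.0, bias=-1.0)
eager_out = layer(3.0)   # relu(3*2 - 1), computed directly
layer.hybridize()
graph_out = layer(3.0)   # traced on first call, then executed from the cached graph
print(eager_out, graph_out, layer(0.2))
```

This toy bakes `weight` and `bias` into the cached graph as constants. Gluon instead passed a block's parameters to `hybrid_forward` as extra keyword arguments, so they stayed inputs of the compiled graph and could continue to be trained.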

Gluon was positioned as an accessibility play. Swami Sivasubramanian, then VP of Amazon AI, framed the goal in the launch announcement: "The potential of machine learning can only be realized if it is accessible to all developers."[6] AWS and Microsoft also published Gluon's reference specification so that other engines could implement the interface, with support for Microsoft's Cognitive Toolkit (CNTK) promised in a future release.[6]

A family of toolkits built on Gluon followed:

| Toolkit | Domain | First release |
| --- | --- | --- |
| GluonCV | Computer vision (classification, detection, segmentation, pose) | 2018 |
| GluonNLP | Natural language processing (BERT, transformers, embeddings) | 2018 |
| Gluon-TS | Probabilistic time-series forecasting | 2019 |
| GluonFR | Face recognition | 2019 |

[GluonCV](/wiki/gluoncv) shipped pretrained ResNet, MobileNet, YOLOv3, Mask R-CNN, and SimplePose models that were widely used in research benchmarks. [Gluon-TS](/wiki/gluon_ts), developed at Amazon, included implementations of DeepAR, DeepState, and Transformer forecasters, and it later survived MXNet's retirement by adding a [PyTorch](/wiki/pytorch) backend.

## How did MXNet join the Apache Software Foundation?

On January 23, 2017, MXNet entered the Apache Incubator.[13] The proposal listed AWS as the primary sponsor, with Carnegie Mellon University, the University of Washington, NYU Shanghai, Intel, and other groups as supporting champions. The codebase moved from `dmlc/mxnet` to `apache/incubator-mxnet`, governance shifted to an Apache-style Project Management Committee, and contributions began to flow under the Apache License 2.0, with committers signing Apache's Individual Contributor License Agreement.

During incubation the project added a formal release process, voting procedures for binary artifacts, and a podling community that included contributors from Amazon, Microsoft, Intel, NVIDIA, Samsung, and a long tail of academic institutions.

On September 21, 2022, the Apache Board approved the resolution establishing Apache MXNet as a Top-Level Project, graduating it from the Incubator after more than five years of podling status.[13] The repository moved to `apache/mxnet`, and the project dropped the "incubating" qualifier. By that point, however, momentum had already shifted toward PyTorch and TensorFlow, and graduation came only about a year before the project was retired.

## Which companies and products used MXNet?

Amazon was the most visible MXNet user. In November 2016, Werner Vogels published a blog post titled "MXNet: Deep Learning Framework of Choice at AWS," announcing internal adoption and laying out a multi-year investment plan.[5] AWS shipped MXNet as a default in:

- The **AWS Deep Learning AMIs** (Amazon Machine Images for EC2), where it sat alongside TensorFlow, PyTorch, and Caffe.
- **Amazon SageMaker**, where MXNet was one of the first frameworks with a managed estimator class when the service launched in November 2017.
- **AWS Greengrass ML Inference** for edge devices.
- **Amazon Elastic Inference**, where MXNet was supported alongside TensorFlow.

Microsoft co-announced [Gluon](/wiki/gluon) in October 2017 and contributed integration work to bring MXNet into Azure Machine Learning.[6] NVIDIA maintained CUDA and cuDNN integration and shipped MXNet binaries inside the NGC container catalog. Intel contributed the **MKL-DNN** (later renamed [oneDNN](/wiki/onednn)) backend, which accelerated MXNet on Xeon and Xeon Phi CPUs.

Other organizations that adopted or integrated MXNet included Wolfram Research (the Mathematica neural network framework used MXNet as a backend), Cisco, Apple, Indeed, and the Vector Institute. The TuSimple autonomous trucking team published several models trained in MXNet during 2017 and 2018.

## What hardware did MXNet support?

MXNet ran on a wider range of hardware than most contemporary frameworks. The maintained backends included:

| Hardware | Library | Notes |
| --- | --- | --- |
| NVIDIA GPUs | [CUDA](/wiki/cuda), cuDNN, NCCL | Primary GPU target |
| AMD GPUs | [ROCm](/wiki/rocm), MIOpen | Community port, partial coverage |
| Intel CPUs | MKL-DNN / [oneDNN](/wiki/onednn) | Default for x86 inference |
| ARM CPUs | NNPACK, ARM Compute Library | Mobile and edge targets |
| Apple Macs | Accelerate framework (limited) | Intel-based Macs; predates Apple Silicon (M1) |
| Custom accelerators | TVM | Through compilation rather than direct kernels |

The ARM and edge support, combined with the small size of MXNet's runtime, made it an attractive option for on-device inference between roughly 2017 and 2020. AWS also compiled MXNet models through [TVM](/wiki/tvm)-based pipelines for edge inference with DeepLens and SageMaker Neo.

## Notable models and projects

A number of well-known model implementations were released first or primarily in MXNet:

- **GluonCV pretrained models**, including ResNet-50, ResNet-152, MobileNetV2, YOLOv3, and Faster R-CNN, were widely used as baselines for transfer learning.
- **DeepAR**, a probabilistic recurrent forecaster developed at Amazon and described in Salinas et al. (2017), shipped as part of [Gluon-TS](/wiki/gluon_ts) and was used in production for Amazon Forecast.[9]
- **Sockeye**, a sequence-to-sequence framework for neural machine translation maintained by AWS, used MXNet as its compute backend through version 2.
- **MXBoard**, a TensorBoard-compatible logging library for MXNet, allowed users to visualize scalars, images, embeddings, and graphs.
- **insightface**, the open-source face recognition library by Jia Guo and Jiankang Deng, started life as an MXNet codebase and only later added a PyTorch port.
- **MXNet implementations of BERT and ALBERT** in GluonNLP, contributed by Amazon and academic collaborators in 2018 and 2019.

## Why was MXNet retired?

MXNet was retired because its contributor base shrank to the point where the project could no longer assemble the quorum needed to ship releases, while PyTorch and TensorFlow captured the research and production markets it had once competed for. From 2018 onward, [PyTorch](/wiki/pytorch) absorbed most academic users, and [TensorFlow](/wiki/tensorflow) 2.x with Keras absorbed most production users who wanted Google Cloud integration. MXNet's user base, anchored at AWS, did not grow at the same rate.

Several factors contributed to the decline:

- **Documentation drift**, where Gluon, Module, and the older Symbol API each had separate tutorials and the recommended path changed over time.
- **Smaller research community**, since most academic papers in the late 2010s shipped with PyTorch reference code.
- **Slower release cadence** after 2020, as AWS maintainers shifted to PyTorch work once that framework gained Amazon support through SageMaker.
- **Departure of core contributors**, several of whom moved to companies that built around different stacks. [Tianqi Chen](/wiki/tianqi_chen) had already shifted his attention to [TVM](/wiki/tvm) and [XGBoost](/wiki/xgboost), and Mu Li and other founders moved to projects such as AutoGluon and D2L (the Dive into Deep Learning textbook).

The last stable release, version **1.9.1**, was tagged in May 2022. After that, commit activity in `apache/mxnet` slowed to a trickle, and several attempts to start a 2.x release line stalled in the release-vote stage.

On September 26, 2023, the Apache Software Foundation announced that MXNet had been retired and moved to the [Apache Attic](/wiki/apache_attic), the foundation's repository for inactive projects.[4] The retirement was driven by insufficient maintainer activity and the inability to assemble the quorum needed to cut new releases. The Apache Attic, created in November 2008, exists to provide oversight and a clear end-of-life process once an Apache project can no longer be actively maintained. The codebase remains read-only at `github.com/apache/mxnet`, and the `mxnet.apache.org` documentation is now hosted under the Apache Attic site; the move to the Attic was completed in early 2024.[4][13]

## How did MXNet compare with PyTorch and TensorFlow?

| Feature | Apache MXNet | PyTorch | TensorFlow |
| --- | --- | --- | --- |
| Initial release | 2015 | September 2016 | November 2015 |
| Primary sponsor | Amazon Web Services | Meta AI (formerly FAIR) | Google |
| Default execution | Imperative (Gluon) and symbolic (Symbol) | Eager, with `torch.compile` since 2.0 | Eager (2.x), graph (1.x) |
| Multi-language bindings | Python, C++, R, Scala, Julia, Perl, Java, Clojure, JavaScript, Go | Python, C++ (limited), Java (deprecated), Mobile | Python, C++, JavaScript (TF.js), Java, Go, Swift (retired) |
| Distributed training | KVStore parameter server | DDP, FSDP, RPC | tf.distribute, MultiWorkerMirrored |
| Mobile deployment | MXNet model server, ONNX export | PyTorch Mobile, ExecuTorch | TensorFlow Lite |
| Status (2026) | Retired (Apache Attic, Sept 2023) | Active (Linux Foundation since 2022) | Active (Google) |
| License | Apache 2.0 | Modified BSD | Apache 2.0 |

The core lesson of the comparison is convergence. MXNet bet early on supporting both eager and graph execution in one framework, and by 2023 both surviving rivals had adopted the same hybrid stance (PyTorch through `torch.compile`, TensorFlow through eager-by-default execution in 2.x combined with `tf.function` graph compilation). MXNet pioneered the design but lost the ecosystem.

## What is MXNet's legacy?

Although the framework itself was retired, several MXNet-related projects continued under different stacks:

- [Gluon-TS](/wiki/gluon_ts) added a PyTorch backend in 2021 and remains a widely used time-series forecasting library at AWS.
- [AutoGluon](/wiki/autogluon), the AutoML toolkit started by Mu Li and colleagues at Amazon in 2019, dropped its MXNet dependency in versions after 0.7 and now runs on PyTorch and scikit-learn.
- The **D2L textbook** ("Dive into Deep Learning"), written by Aston Zhang, Zachary Lipton, Mu Li, and Alex Smola, originally used MXNet for code examples and later added PyTorch, TensorFlow, and JAX versions.
- [TVM](/wiki/tvm), originally created by Tianqi Chen as a compilation layer for MXNet, became an Apache Top-Level Project in its own right and now compiles models from PyTorch, TensorFlow, and ONNX.
- [XGBoost](/wiki/xgboost), also originated by Tianqi Chen and other DMLC members, predates MXNet and continues as one of the most widely used gradient boosting libraries.

Several DMLC alumni went on to found or join companies that shaped subsequent waves of AI infrastructure. Tianqi Chen co-founded OctoML (later OctoAI), which NVIDIA acquired in September 2024, and joined the faculty at Carnegie Mellon. Mu Li worked at Amazon on AutoGluon and later moved to Boson AI. Bing Xu and others moved to OpenAI, Anthropic, and other model labs.

The MXNet codebase is preserved in archived form at `github.com/apache/mxnet`, and the project's documentation remains accessible through the Apache Attic. Researchers studying the early symbolic-versus-eager debate in deep learning frameworks still cite the original 2015 paper as a clear early articulation of the hybrid approach that PyTorch and TensorFlow eventually converged on.[1]

## ELI5

MXNet was a toolbox for building and training neural networks. Its trick was that you could write code in two ways: a quick, line-by-line style that is easy to debug (like playing with LEGO bricks one at a time), and a blueprint style that the computer can compile and run very fast (like handing a finished LEGO instruction booklet to a robot). Amazon liked MXNet and used it for years, but most people ended up choosing PyTorch and TensorFlow instead. With almost no one left to fix bugs and ship updates, the project was officially shut down and frozen in 2023.

## See also

- [PyTorch](/wiki/pytorch)
- [TensorFlow](/wiki/tensorflow)
- [TVM](/wiki/tvm)
- [XGBoost](/wiki/xgboost)
- [AutoGluon](/wiki/autogluon)
- [Amazon Web Services](/wiki/aws)
- [deep learning](/wiki/deep_learning)

## References

1. Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., and Zhang, Z. (2015). "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems." arXiv:1512.01274.
2. Chen, T., Xu, B., Zhang, C., and Guestrin, C. (2016). "Training Deep Nets with Sublinear Memory Cost." arXiv:1604.06174.
3. Apache Software Foundation. "Apache MXNet" project information page, projects.apache.org.
4. Apache Software Foundation. "Apache MXNet (incubating) Joins the Apache Attic." September 26, 2023; attic.apache.org/projects/mxnet.html.
5. Vogels, W. "MXNet: Deep Learning Framework of Choice at AWS." All Things Distributed (blog), November 22, 2016.
6. AWS and Microsoft. "AWS and Microsoft Announce Gluon, Making Deep Learning Accessible to All Developers." October 12, 2017; Smola, A. and Sivasubramanian, S. "Introducing Gluon: A New Library for Machine Learning from AWS and Microsoft." AWS Machine Learning Blog, October 12, 2017.
7. Apache MXNet documentation, mxnet.apache.org (now archived under Apache Attic).
8. Apache MXNet source repository, github.com/apache/mxnet (archived).
9. Salinas, D., Flunkert, V., and Gasthaus, J. (2017). "DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks." arXiv:1704.04110.
10. Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., Long, J., Shekita, E. J., and Su, B. (2014). "Scaling Distributed Machine Learning with the Parameter Server." OSDI 2014.
11. Sockeye release notes and changelog, github.com/awslabs/sockeye.
12. GluonCV release notes, gluon-cv.mxnet.io.
13. Apache Incubator. "MXNet Incubation Status" (entered incubation 2017-01-23; Board approved graduation 2022-09-21), incubator.apache.org/projects/mxnet.html.
14. Gluon-TS documentation, ts.gluon.ai.

