Apache MXNet

Updated Jun 25, 2026

Apache MXNet (pronounced "mix-net") was an open-source deep learning framework that combined imperative and symbolic execution in one runtime, created around 2015 by the DMLC (Distributed Machine Learning Community) group and adopted by Amazon Web Services as its deep learning framework of choice. It is now retired: the Apache Software Foundation moved MXNet to the Apache Attic in September 2023 after maintainer activity collapsed, leaving the project read-only. MXNet is best remembered for its hybrid programming model, its high-level Gluon API (co-developed with Microsoft in 2017), and for being the framework that PyTorch and TensorFlow ultimately displaced.

MXNet was originally developed in 2015 by researchers in the DMLC group at CMU, NYU, MIT, and the University of Washington. The project was donated to the Apache Software Foundation in January 2017 (entering the Apache Incubator), graduated to an Apache Top-Level Project on September 21, 2022, and was retired to the Apache Attic on September 26, 2023.^[1]^[4]^[13]

From roughly 2016 through 2022, MXNet was the deep learning framework most actively promoted by Amazon Web Services. Amazon CTO Werner Vogels announced on November 22, 2016 that "MXNet will be our deep learning framework of choice."^[5] The framework was used internally at Amazon for product recommendations, search, and computer vision workloads, and shipped as a default option in Amazon SageMaker and the AWS Deep Learning AMIs. Despite that backing, PyTorch and TensorFlow consolidated the research and production market over the late 2010s, and contributor activity to MXNet fell sharply after 2021.

What is Apache MXNet?

Apache MXNet was a deep learning framework designed for both flexibility and efficiency on heterogeneous, distributed hardware. Its defining idea was that the same library exposed two programming models that shared one execution backend: an imperative, NumPy-like interface (NDArray) for eager experimentation, and a symbolic graph interface (Symbol) for optimized, compiled deployment. A model written imperatively could be "hybridized" into a static graph, an approach later echoed by TorchScript and torch.compile in PyTorch.

The name "MXNet" stood for "mix net," a reference to the mix of imperative (mutable) and declarative (symbolic) programming styles the system supported. The original 2015 paper described MXNet as "a multi-language machine learning (ML) library to ease the development of ML algorithms, especially for deep neural networks," emphasizing portability across CPUs, GPUs, and clusters.^[1]

Facts

Field	Detail
Original authors	Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang
Developer	DMLC group; Apache Software Foundation (from 2017)
Initial public release	2015 (first public beta)
First paper	December 3, 2015 (arXiv:1512.01274)
Apache Incubator entry	January 23, 2017
Apache Top-Level Project	September 21, 2022
Final stable release	1.9.1 (May 2022)
Retirement	September 26, 2023 (moved to Apache Attic)
License	Apache License 2.0
Written in	C++, Python, CUDA
Repository	github.com/apache/mxnet (archived)
Website	mxnet.apache.org (archived)

When was MXNet created, and who built it?

MXNet grew out of the work of the DMLC collective, an informal group of graduate students and researchers spread across several universities who collaborated on machine learning systems starting around 2014. The members included Tianqi Chen (then at the University of Washington), Mu Li (Carnegie Mellon University), Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang (then at NYU Shanghai).

Before MXNet, several DMLC members had been working on separate deep learning systems with overlapping aims:

CXXNet, a C++ deep learning library written largely by Bing Xu and Tianqi Chen, focused on configuration-driven training of convolutional networks.
Minerva, a system from NYU led by Zheng Zhang and Minjie Wang that explored dynamic dataflow execution and lazy evaluation of N-dimensional arrays.
Purine, an early deep learning framework that experimented with bipartite computation graphs.

In 2015 the authors agreed to merge the lessons from these projects into a single library that combined a Python-style imperative NDArray API with a symbolic graph compiler. The first commit to the public dmlc/mxnet GitHub repository was made in mid-2015, and a beta release appeared in September of that year.

The original technical paper, "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems," was posted to arXiv on December 3, 2015 (arXiv:1512.01274).^[1] The paper described MXNet's mixed programming interface, its dependency engine that scheduled operations across CPU and GPU devices, and its parameter server design for distributed training. It listed Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang as co-authors.

How did MXNet work?

MXNet's design centered on two complementary programming models that shared the same underlying runtime.

NDArray and Symbol

The NDArray API was an imperative interface similar to NumPy, with operations dispatched eagerly across CPU or GPU devices. NDArrays could be allocated on specific contexts (mx.cpu() or mx.gpu(i)), and arithmetic was executed asynchronously by an internal dependency engine.

The Symbol API, in contrast, let users construct a symbolic computation graph, bind it to data, and then optimize and execute it as a single compiled unit. Symbol was the substrate that made MXNet competitive on memory and throughput, because the executor could fuse operators, allocate buffers in advance, and exploit sublinear memory tricks during backpropagation.

NDArray and Symbol shared one operator registry and one execution backend, and that shared foundation is what made hybridization possible: a model built imperatively with NDArrays could be converted into a symbolic graph for deployment without being rewritten.
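
The relationship between the two styles can be sketched in a few lines of plain Python. This is a toy illustration, not MXNet code: the names `OPS`, `eager`, `Node`, and `variable` are hypothetical and exist only to show how one operator registry can serve both an eager call path and a deferred graph.

```python
import operator

# One operator registry serves both execution styles (toy stand-in, not MXNet's registry).
OPS = {"add": operator.add, "mul": operator.mul}


def eager(op, a, b):
    """Imperative style: look up the operator and run it immediately."""
    return OPS[op](a, b)


class Node:
    """Symbolic style: record the operator and its inputs, computing nothing yet."""

    def __init__(self, op=None, inputs=(), name=None):
        self.op, self.inputs, self.name = op, inputs, name

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))

    def eval(self, **bindings):
        """Bind placeholder names to values and execute the whole graph."""
        if self.op is None:
            return bindings[self.name]
        args = [node.eval(**bindings) for node in self.inputs]
        return OPS[self.op](*args)


def variable(name):
    """Create a named placeholder with no value attached."""
    return Node(name=name)


# Imperative: every call produces a concrete value right away.
eager_result = eager("add", eager("mul", 2.0, 3.0), 1.0)

# Symbolic: build the graph first, then bind data and run it as one unit.
x, w, b = variable("x"), variable("w"), variable("b")
graph = x * w + b
symbolic_result = graph.eval(x=2.0, w=3.0, b=1.0)
```

Because the graph exists as data before any value is computed, an executor has the chance to inspect and rearrange it, which is where optimizations such as operator fusion and ahead-of-time buffer planning come from.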

Language bindings

MXNet aimed for unusually broad language coverage. Official front ends existed for:

Language	Notes
Python	Primary interface, included Gluon
C++	Native API and the `cpp-package` higher-level wrapper
Scala	Used in JVM-based production systems
Java	Wrapper over the Scala API
R	Maintained for statisticians and academic users
Julia	`MXNet.jl` package
Perl	`AI::MXNet` on CPAN
JavaScript	Browser inference via `mxnet.js`
Clojure	`org.apache.clojure-mxnet`
Go	Community-maintained Go bindings

The number of bindings reflected the DMLC group's roots in machine learning research, but it also spread documentation effort thin, and few of these bindings reached the maturity of the Python interface.

Distributed training

Distributed training was built on a key-value store abstraction called KVStore, which combined a parameter server architecture with synchronous and asynchronous gradient updates. KVStore could shard parameters across multiple servers and supported local, device, dist_sync, and dist_async modes. This design drew directly on Mu Li's earlier OSDI 2014 paper on parameter servers, and it allowed MXNet to scale to clusters with hundreds of GPUs.^[10]
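
The push/pull pattern behind a synchronous parameter server can be sketched as follows. This is a single-process toy under stated assumptions, not the KVStore API: the class `ToyKVStore`, its method signatures, and the built-in SGD step are hypothetical simplifications.

```python
class ToyKVStore:
    """Hypothetical single-process stand-in for a parameter server."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.store = {}

    def init(self, key, value):
        """Register a parameter under a key."""
        self.store[key] = list(value)

    def push(self, key, worker_grads):
        """Synchronous update: sum the gradients from all workers, then take one SGD step."""
        total = [sum(parts) for parts in zip(*worker_grads)]
        self.store[key] = [
            w - self.learning_rate * g for w, g in zip(self.store[key], total)
        ]

    def pull(self, key):
        """Workers fetch the latest copy of the parameter."""
        return list(self.store[key])


kv = ToyKVStore(learning_rate=0.5)
kv.init("weight", [1.0, 2.0])
# Two workers computed gradients on different data shards.
kv.push("weight", [[0.5, 1.0], [1.5, 1.0]])
weights = kv.pull("weight")
```

In this sketch the update waits for every worker's gradient before it is applied, which corresponds to the synchronous style. An asynchronous variant would instead apply each worker's gradient as soon as it arrives, trading consistency for throughput.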

Memory optimization

MXNet was known for aggressive memory optimization. The symbolic executor reused buffers in place and fused operators when it was safe to do so, and it implemented a sublinear memory technique for training deep networks (described by Chen, Xu, Zhang, and Guestrin in arXiv:1604.06174, April 2016) that traded extra recomputation during the backward pass for reduced peak memory.^[2] This made it possible to train networks with hundreds of layers on commodity GPUs.
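
The recomputation trade-off can be illustrated with a toy chain of scalar layers. This is a minimal sketch of the general checkpointing idea, not MXNet's implementation: the functions `layer`, `backward_full`, and `backward_checkpointed`, the tanh layers, and the fixed segment length are all assumptions chosen for illustration.

```python
import math


def layer(x, w):
    """One toy layer: a scaled tanh."""
    return math.tanh(w * x)


def layer_grad(x, w):
    """Derivative of the layer with respect to x; it needs the layer's input."""
    return w * (1.0 - math.tanh(w * x) ** 2)


def backward_full(x0, weights):
    """Baseline: keep every layer input, so memory grows linearly with depth."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = layer(x, w)
    grad = 1.0
    for i in reversed(range(len(weights))):
        grad *= layer_grad(inputs[i], weights[i])
    return grad, len(inputs)


def backward_checkpointed(x0, weights, segment):
    """Keep only each segment's input, then recompute the rest during backward."""
    n = len(weights)
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer(x, w)
    grad, peak = 1.0, 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n)
        inputs = [checkpoints[start]]
        for i in range(start, end - 1):
            inputs.append(layer(inputs[-1], weights[i]))
        # Live values: all checkpoints plus this segment's recomputed inputs
        # (the segment's first input is itself a checkpoint, hence the -1).
        peak = max(peak, len(checkpoints) + len(inputs) - 1)
        for i in reversed(range(start, end)):
            grad *= layer_grad(inputs[i - start], weights[i])
    return grad, peak


weights = [0.9 + 0.01 * i for i in range(16)]
grad_full, stored_full = backward_full(0.5, weights)
grad_ckpt, stored_ckpt = backward_checkpointed(0.5, weights, segment=4)
```

With 16 layers and segments of 4, the toy keeps at most 7 activations live instead of 16 while producing the same gradient, at the cost of roughly one extra forward pass. Choosing a segment length near the square root of the depth is what yields memory that grows with the square root of the number of layers.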

Automatic differentiation

MXNet supported automatic differentiation through mx.autograd for the imperative path and through the symbolic graph compiler for the declarative path. Both shared the same set of registered backward functions for operators, so a user could write training code in NumPy-like style and still benefit from optimized gradient computation.
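
The idea of recording eager operations and replaying registered backward functions in reverse can be sketched with a small tape. This is a toy reverse-mode example, not the `mx.autograd` interface: `REGISTRY`, `Tape`, and its methods are hypothetical names.

```python
# Each operator registers a forward and a backward function once (toy registry).
REGISTRY = {
    "add": (lambda a, b: a + b, lambda a, b, g: (g, g)),
    "mul": (lambda a, b: a * b, lambda a, b, g: (g * b, g * a)),
}


class Tape:
    """Records operations as they run eagerly, then replays them in reverse."""

    def __init__(self):
        self.values = []   # value of every variable created so far
        self.records = []  # (op name, input ids, output id)

    def leaf(self, value):
        """Store a value and return its id."""
        self.values.append(value)
        return len(self.values) - 1

    def apply(self, op, a, b):
        """Run the forward function now and remember what was done."""
        forward_fn, _ = REGISTRY[op]
        out = self.leaf(forward_fn(self.values[a], self.values[b]))
        self.records.append((op, (a, b), out))
        return out

    def backward(self, out):
        """Accumulate d(out)/d(variable) for every variable on the tape."""
        grads = [0.0] * len(self.values)
        grads[out] = 1.0
        for op, (a, b), o in reversed(self.records):
            _, backward_fn = REGISTRY[op]
            ga, gb = backward_fn(self.values[a], self.values[b], grads[o])
            grads[a] += ga
            grads[b] += gb
        return grads


tape = Tape()
x = tape.leaf(3.0)
w = tape.leaf(2.0)
y = tape.apply("add", tape.apply("mul", x, w), x)  # y = x * w + x
grads = tape.backward(y)
dy_dx, dy_dw = grads[x], grads[w]
```

Here `dy_dx` equals `w + 1` and `dy_dw` equals `x`. A symbolic path could reuse the same backward entries in `REGISTRY` to build a gradient graph ahead of time instead of walking a tape at run time.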

What was the Gluon API?

Gluon was a high-level, imperative deep learning API for MXNet, announced jointly by AWS and Microsoft on October 12, 2017 and shipped first in Apache MXNet version 0.11.^[6] It resembled PyTorch in feel: users built neural networks using object-oriented Python with eager execution by default, then called hybridize() to compile the model into a static graph for deployment. The API became the recommended entry point for new MXNet users from late 2017 onward.
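
The "write eagerly, then compile" workflow can be mimicked with a trace-and-replay toy. It is a sketch of the general idea only and does not reproduce Gluon's mechanism or API: `Proxy`, `ToyHybridBlock`, and the `forward_calls` counter are hypothetical, and the toy supports a single input combined with Python constants or other traced values.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


class Proxy:
    """Placeholder passed through forward() once to record which ops it performs."""

    def __init__(self, trace, ref):
        self.trace, self.ref = trace, ref

    def _record(self, op, other):
        self.trace.append((op, self.ref, other))
        # Slot 0 holds the input; the k-th recorded step writes slot k.
        return Proxy(self.trace, len(self.trace))

    def __mul__(self, other):
        return self._record("mul", other)

    def __add__(self, other):
        return self._record("add", other)


class ToyHybridBlock:
    def __init__(self):
        self._hybrid = False
        self._cached = None      # (trace, output slot) once traced
        self.forward_calls = 0   # how often the Python forward() actually ran

    def forward(self, x):
        self.forward_calls += 1
        return x * 2.0 + 1.0

    def hybridize(self):
        self._hybrid = True

    def __call__(self, x):
        if not self._hybrid:
            return self.forward(x)  # eager: run the Python code on every call
        if self._cached is None:
            trace = []
            out = self.forward(Proxy(trace, 0))  # run once with a placeholder
            self._cached = (trace, out.ref)
        trace, out_ref = self._cached
        slots = [x]
        for op, ref, other in trace:  # replay the recorded graph
            rhs = slots[other.ref] if isinstance(other, Proxy) else other
            slots.append(OPS[op](slots[ref], rhs))
        return slots[out_ref]


net = ToyHybridBlock()
eager_out = net(3.0)   # runs forward() directly
net.hybridize()
first = net(3.0)       # traces forward() once, then replays
second = net(4.0)      # replays the cached graph without calling forward()
```

After hybridizing, the Python `forward` runs only once more, during tracing, and later calls execute the recorded steps. One consequence in this toy is that any Python branching on input values would be frozen at trace time, which is the usual price of trading eager flexibility for a static graph.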

Gluon was positioned as an accessibility play. Swami Sivasubramanian, then VP of Amazon AI, framed the goal in the launch announcement: "The potential of machine learning can only be realized if it is accessible to all developers."^[6] AWS and Microsoft also published Gluon's reference specification so that other engines could implement the interface, with support for Microsoft's Cognitive Toolkit (CNTK) promised in a future release.^[6]

A family of toolkits built on Gluon followed:

Toolkit	Domain	First release
GluonCV	Computer vision (classification, detection, segmentation, pose)	2018
GluonNLP	Natural language processing (BERT, transformers, embeddings)	2018
Gluon-TS	Probabilistic time-series forecasting	2019
GluonFR	Face recognition	2019

GluonCV shipped pretrained ResNet, MobileNet, YOLOv3, Mask R-CNN, and SimplePose models that were widely used in research benchmarks. Gluon-TS, developed at Amazon, included implementations of DeepAR, DeepState, and Transformer forecasters, and it later survived MXNet's retirement by adding a PyTorch backend.

How did MXNet join the Apache Software Foundation?

On January 23, 2017, MXNet entered the Apache Incubator.^[13] The proposal listed AWS as the primary sponsor, with Carnegie Mellon University, the University of Washington, NYU Shanghai, Intel, and other groups as supporting champions. The codebase moved from dmlc/mxnet to apache/incubator-mxnet, governance shifted to an Apache-style Project Management Committee, and contributions were accepted under the Apache Individual Contributor License Agreement.

During incubation the project added a formal release process, voting procedures for binary artifacts, and a podling community that included contributors from Amazon, Microsoft, Intel, NVIDIA, Samsung, and a long tail of academic institutions.

On September 21, 2022, the Apache Board approved the resolution establishing Apache MXNet as a Top-Level Project, graduating it from the Incubator after more than five years of podling status.^[13] The repository moved to apache/mxnet, and the project dropped the "incubating" qualifier. By that point, however, momentum had already shifted toward PyTorch and TensorFlow, and graduation came only about a year before the project was retired.

Which companies and products used MXNet?

Amazon was the most visible MXNet user. In November 2016, Werner Vogels published a blog post titled "MXNet: Deep Learning Framework of Choice at AWS," announcing internal adoption and laying out a multi-year investment plan.^[5] AWS shipped MXNet as a default in:

The AWS Deep Learning AMIs (Amazon Machine Images for EC2), where it sat alongside TensorFlow, PyTorch, and Caffe.
Amazon SageMaker, where MXNet was the first framework with a managed estimator class when the service launched in November 2017.
AWS Greengrass ML Inference for edge devices.
Amazon Elastic Inference, where MXNet was supported alongside TensorFlow.

Microsoft co-announced Gluon in October 2017 and contributed integration work to bring MXNet into Azure Machine Learning.^[6] NVIDIA maintained CUDA and cuDNN integration and shipped MXNet binaries inside the NGC container catalog. Intel contributed the MKL-DNN (later renamed oneDNN) backend, which accelerated MXNet on Xeon and Xeon Phi CPUs.

Other organizations that adopted or integrated MXNet included Wolfram Research (the Mathematica neural network framework used MXNet as a backend), Cisco, Apple, Indeed, and the Vector Institute. The TuSimple autonomous trucking team published several models trained in MXNet during 2017 and 2018.

What hardware did MXNet support?

MXNet ran on a wider range of hardware than most contemporary frameworks. The maintained backends included:

Hardware	Library	Notes
NVIDIA GPUs	CUDA, cuDNN, NCCL	Primary GPU target
AMD GPUs	ROCm, MIOpen	Community port, partial coverage
Intel CPUs	MKL-DNN / oneDNN	Default for x86 inference
ARM CPUs	NNPACK, ARM Compute Library	Mobile and edge targets
Apple Macs (macOS)	Accelerate framework (limited)	Support predates the M1
Custom accelerators	TVM, Treelite	Through compilation rather than direct kernels

The ARM and edge support, combined with the small size of MXNet's runtime, made it an attractive option for on-device inference between roughly 2017 and 2020. AWS used MXNet inside TVM compilation pipelines for inference at the edge through DeepLens and SageMaker Neo.

Notable models and projects

A number of well-known model implementations were released first or primarily in MXNet:

GluonCV pretrained models, including ResNet-50, ResNet-152, MobileNetV2, YOLOv3, and Faster R-CNN, were widely used as baselines for transfer learning.
DeepAR, a probabilistic recurrent forecaster developed at Amazon and described in Salinas et al. (2017), shipped as part of Gluon-TS and was used in production for AWS Forecast.^[9]
Sockeye, a sequence-to-sequence framework for neural machine translation maintained by AWS, used MXNet as its compute backend through version 2.
MXBoard, a TensorBoard-compatible logging library for MXNet, allowed users to visualize scalars, images, embeddings, and graphs.
insightface, the open-source face recognition library by Jia Guo and Jiankang Deng, started life as an MXNet codebase and only later added a PyTorch port.
MXNet implementations of BERT and ALBERT in GluonNLP, contributed by Amazon and academic collaborators in 2018 and 2019.

Why was MXNet retired?

MXNet was retired because its contributor base shrank until the project could no longer assemble the quorum needed to ship releases, while PyTorch and TensorFlow captured the research and production markets it had once competed for. From 2018 onward, PyTorch absorbed most academic users, and TensorFlow 2.x with Keras absorbed most production users who wanted Google Cloud integration. MXNet's user base, anchored at AWS, did not grow at the same rate.

Several factors contributed to the decline:

Documentation drift, where Gluon, Module, and the older Symbol API each had separate tutorials and the recommended path changed over time.
Smaller research community, since most academic papers in the late 2010s shipped with PyTorch reference code.
Slower release cadence after 2020, as AWS maintainers were reassigned to PyTorch once that framework gained Amazon support through SageMaker.
Departure of core contributors, several of whom moved to companies that built around different stacks. Tianqi Chen had already shifted his attention to TVM and XGBoost, and Mu Li and other founders moved to projects such as AutoGluon and D2L (the Dive into Deep Learning textbook).

The last stable release, version 1.9.1, was tagged in May 2022. After that, commit activity in apache/mxnet slowed to a trickle, and several attempts to start a 2.x release line stalled in the release-vote stage.

On September 26, 2023, the Apache Software Foundation announced MXNet's retirement to the Apache Attic, the foundation's home for inactive projects.^[4] The Attic, created in November 2008, provides oversight and a clear end-of-life process for Apache projects that can no longer be actively maintained. The codebase remains read-only at github.com/apache/mxnet, and the mxnet.apache.org documentation is now hosted under the Apache Attic site; the remaining steps of the move were completed in early 2024.^[4]^[13]

How did MXNet compare with PyTorch and TensorFlow?

Feature	Apache MXNet	PyTorch	TensorFlow
Initial release	2015	September 2016	November 2015
Primary sponsor	Amazon Web Services	Meta AI (formerly FAIR)	Google
Default execution	Imperative (Gluon) and symbolic (Symbol)	Eager, with `torch.compile` since 2.0	Eager (2.x), graph (1.x)
Multi-language bindings	Python, C++, R, Scala, Julia, Perl, Java, Clojure, JavaScript, Go	Python, C++ (limited), Java (deprecated), Mobile	Python, C++, JavaScript (TF.js), Java, Go, Swift (retired)
Distributed training	KVStore parameter server	DDP, FSDP, RPC	tf.distribute (for example, MultiWorkerMirroredStrategy)
Mobile deployment	MXNet model server, ONNX export	PyTorch Mobile, ExecuTorch	TensorFlow Lite
Status (2026)	Retired (Apache Attic, Sept 2023)	Active (Linux Foundation since 2022)	Active (Google)
License	Apache 2.0	Modified BSD	Apache 2.0

The core lesson of the comparison is convergence. MXNet bet early on supporting both eager and graph execution in one framework, and by 2023 both surviving rivals had adopted the same hybrid stance (PyTorch through torch.compile, TensorFlow through its 2.x eager-by-default model). MXNet pioneered the design but lost the ecosystem.

What is MXNet's legacy?

Although the framework itself was retired, several MXNet-related projects continued under different stacks:

Gluon-TS added a PyTorch backend in 2021 and remains a widely used time-series forecasting library at AWS.
AutoGluon, the AutoML toolkit started by Mu Li and colleagues at Amazon in 2019, dropped its MXNet dependency in versions after 0.7 and now runs on PyTorch and scikit-learn.
The D2L textbook ("Dive into Deep Learning"), written by Aston Zhang, Zachary Lipton, Mu Li, and Alex Smola, originally used MXNet for code examples and later added PyTorch, TensorFlow, and JAX versions.
TVM, originally created by Tianqi Chen as a compilation layer for MXNet, became an Apache Top-Level Project in its own right and now serves models from PyTorch, TensorFlow, and ONNX.
XGBoost, also originated by Tianqi Chen and other DMLC members, predates MXNet and continues as one of the most widely used gradient boosting libraries.

Several DMLC alumni went on to found or join companies that shaped subsequent waves of AI infrastructure. Tianqi Chen co-founded OctoML (later OctoAI), which NVIDIA acquired in September 2024, and joined the faculty at Carnegie Mellon. Mu Li worked at Amazon on AutoGluon and later moved to Boson AI. Bing Xu and others moved to OpenAI, Anthropic, and other model labs.

The MXNet codebase is preserved in archived form at github.com/apache/mxnet, and the project's documentation remains accessible through the Apache Attic. Researchers studying the early symbolic-versus-eager debate in deep learning frameworks still cite the original 2015 paper as a clear early articulation of the hybrid approach that PyTorch and TensorFlow eventually converged on.^[1]

ELI5

MXNet was a toolbox for building and training neural networks. Its trick was that you could write code in two ways: a quick, line-by-line style that is easy to debug (like playing with LEGO bricks one at a time), and a blueprint style that the computer can compile and run very fast (like handing a finished LEGO instruction booklet to a robot). Amazon liked MXNet and used it for years, but most people ended up choosing PyTorch and TensorFlow instead. With almost no one left to fix bugs and ship updates, the project was officially shut down and frozen in 2023.

References

1. Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., and Zhang, Z. (2015). "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems." arXiv:1512.01274.
2. Chen, T., Xu, B., Zhang, C., and Guestrin, C. (2016). "Training Deep Nets with Sublinear Memory Cost." arXiv:1604.06174.
3. Apache Software Foundation. "Apache MXNet" project information page, projects.apache.org.
4. Apache Software Foundation. "Apache MXNet (incubating) Joins the Apache Attic." September 26, 2023; attic.apache.org/projects/mxnet.html.
5. Vogels, W. "MXNet: Deep Learning Framework of Choice at AWS." All Things Distributed (blog), November 22, 2016.
6. AWS and Microsoft. "AWS and Microsoft Announce Gluon, Making Deep Learning Accessible to All Developers." October 12, 2017; Smola, A. and Sivasubramanian, S. "Introducing Gluon: A New Library for Machine Learning from AWS and Microsoft." AWS Machine Learning Blog, October 12, 2017.
7. Apache MXNet documentation, mxnet.apache.org (now archived under Apache Attic).
8. Apache MXNet source repository, github.com/apache/mxnet (archived).
9. Salinas, D., Flunkert, V., and Gasthaus, J. (2017). "DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks." arXiv:1704.04110.
10. Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., Long, J., Shekita, E. J., and Su, B. (2014). "Scaling Distributed Machine Learning with the Parameter Server." OSDI 2014.
11. Sockeye release notes and changelog, github.com/awslabs/sockeye.
12. GluonCV release notes, gluon-cv.mxnet.io.
13. Apache Incubator. "MXNet Incubation Status" (entered incubation 2017-01-23; Board approved graduation 2022-09-21), incubator.apache.org/projects/mxnet.html.
14. Gluon-TS documentation, ts.gluon.ai.
