# MLX

> Source: https://aiwiki.ai/wiki/mlx
> Updated: 2026-06-23
> Categories: AI Companies, AI Hardware, Developer Tools, Open Source AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**MLX** is an open-source array and machine-learning framework built by [Apple Inc.](/wiki/apple_inc)'s machine-learning research team specifically for [Apple silicon](/wiki/apple_silicon), the M1, M2, M3, M4, and M5 series of system-on-chip processors. First released on GitHub on December 5, 2023 under the permissive MIT license, MLX pairs a [NumPy](/wiki/numpy)-like Python API for arrays with a [PyTorch](/wiki/pytorch)-like API for neural networks, and it exploits the unified-memory architecture of Apple silicon so that arrays are shared between CPU and GPU without explicit data copies.[1][20] The framework's official description is that "MLX is an array framework for machine learning on Apple silicon, brought to you by Apple machine learning research."[1] By June 2026 the project had passed 27,000 stars on GitHub, making it Apple's second most starred open-source project after the Swift programming language.[1]

MLX was conceived as a research-friendly framework that closes the gap between CPU and GPU programming on the Mac. On most other platforms, an ML developer has to think about which device a tensor lives on and explicitly move it across an interconnect such as PCIe. On Apple silicon, CPU and GPU share the same physical memory, so the framework can avoid those copies entirely. MLX bakes that assumption into its core, which is what makes it noticeably different from cross-platform frameworks like PyTorch with the Metal Performance Shaders backend or [JAX](/wiki/jax) with its experimental Metal backend.[1]

Since the initial release, MLX has grown into a small ecosystem of related projects: the core `mlx` array library, the `mlx-lm` package for large language model inference and fine-tuning, the `mlx-data` library for data loading, the `mlx-swift` package that exposes the API to Swift and iOS, and the `mlx-examples` repository that hosts reference implementations of models such as Llama, Mistral, Mixtral, Stable Diffusion, Whisper, and many others.[2][3] By 2026 the MLX Community organization on Hugging Face hosted thousands of pre-converted model weights.[11]

## When was MLX released?

MLX appeared during a period when running large language models locally on a Mac was becoming popular but was mostly limited to projects like [llama.cpp](/wiki/llama_cpp) that targeted the CPU plus a thin Metal layer. Apple's ML research division wanted a framework that researchers inside and outside the company could actually train and prototype with on consumer hardware, not just deploy to. The result was MLX, released on December 5, 2023 alongside the `mlx-examples` repository so that the framework would launch with working reference code rather than as a bare runtime.[2][20]

| Date | Event |
|------|-------|
| December 5, 2023 | Initial open-source release of MLX and mlx-examples on GitHub by Apple's machine-learning research team; announced by Awni Hannun on X[7] |
| Late December 2023 | Quick wave of community ports: Llama 2, Mistral 7B, Stable Diffusion, and Whisper inference examples appear in mlx-examples |
| January 2024 | Optimized inference paths for Llama 2, Mistral, and Qwen released; LoRA fine-tuning example added |
| February 20, 2024 | Release of MLX Swift, a Swift API to MLX with neural-network and optimizer packages plus Mistral 7B and MNIST examples[8] |
| 2024 (spring and summer) | Quantization support (4-bit and 8-bit), faster Metal kernels, and an `mx.compile` tracing compiler are added |
| May 2024 | Apple ships the M4 in the iPad Pro; MLX runs on M4 immediately because it targets the Apple silicon Metal stack rather than a specific chip generation |
| June 2024 | At WWDC 2024 Apple announces [Apple Intelligence](/wiki/apple_intelligence) and its on-device and server foundation models; Apple highlights MLX as the recommended framework for fine-tuning and experimenting with open models on Mac[14] |
| Late 2024 | Distributed training and inference across multiple Macs over Thunderbolt or Ethernet land in MLX |
| 2024 to 2025 | The `mlx-lm` package matures into the standard CLI for running LLMs on Mac; thousands of pre-quantized models appear under the `mlx-community` organization on Hugging Face[11] |
| 2025 | Apple ships the M5 with dedicated neural accelerators in the GPU; MLX 0.2x and 0.3x releases add Metal 4 tensor-operation support and large speedups for time-to-first-token[6] |
| Mid-2025 | A CUDA back-end for MLX is introduced (`pip install "mlx[cuda]"`), letting code developed on Apple silicon also run on Nvidia GPUs |
| Late 2025 | JACCL distributed back-end uses RDMA over Thunderbolt 5 for low-latency multi-Mac clusters; demonstrations include trillion-parameter models running across four Mac Studios[15][16] |
| April 2026 | MLX 0.31.2 documented as the current release[4] |
| June 2026 | Project crosses about 27,000 stars on GitHub[1] |

## Who created MLX?

MLX was created by a small team inside Apple's machine-learning research division. The four engineers credited with equal contribution on the initial release are Awni Hannun, Jagrit Digani, Angelos Katharopoulos, and Ronan Collobert.[1]

Awni Hannun acted as the public face of the project from its release until early 2026. Before joining Apple, he worked at Facebook AI Research and earned his PhD at Stanford University under Andrew Ng, and he co-authored the Deep Speech speech-recognition papers at Baidu Research. He announced the original release on X on December 5, 2023, writing that "MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!)" and continued posting roadmap updates, demos, and benchmarks until his departure.[7] Hannun left Apple in February 2026 but, in his farewell post on February 27, 2026, named the people continuing to maintain MLX (Angelos Katharopoulos, Cheng Zhao, Jagrit Digani, and others) and said the project would keep moving.[17]

Ronan Collobert is a longtime researcher in deep learning who previously worked at Facebook AI Research and on the Torch and Flashlight frameworks. Angelos Katharopoulos is known for work on linear-attention transformers and contributes heavily to MLX's compiler and distributed back-ends. Jagrit Digani has focused on Metal kernels and operator coverage. Outside contributions come from a wide community of Mac-using ML developers, with the GitHub project showing thousands of merged pull requests since its release.[1]

## How does MLX work? Architecture and design philosophy

MLX is built around a small set of design decisions that, taken together, distinguish it from PyTorch, JAX, and TensorFlow. The most consequential is unified memory: there is no `.to("cuda")` and no separate host and device tensor, because on Apple silicon the CPU, GPU, and Neural Engine all address the same physical RAM. Operations specify the device they should run on, but the data does not move.[1]

| Design decision | Description |
|-----------------|-------------|
| Unified memory | Arrays live in shared memory and are visible to CPU and GPU without copies. Specifying a device chooses where the kernel runs, not where the data lives. |
| Lazy evaluation | Operations record nodes in a computation graph; nothing executes until a value is actually needed (for example via `mx.eval` or by printing or saving an array). This is similar in spirit to JAX. |
| Composable function transformations | `mx.grad`, `mx.value_and_grad`, `mx.vmap`, `mx.compile`, `mx.jvp`, and `mx.vjp` (all in `mlx.core`) can be stacked, so a user can take the gradient of a vmapped, compiled function. |
| Familiar Python API | `mlx.core` mirrors NumPy in array creation, slicing, broadcasting, and reductions; `mlx.nn` mirrors the PyTorch `nn.Module` style of subclassing. |
| Multi-language bindings | Python, C++, C, and Swift APIs all wrap the same C++ core, with the C layer acting as a stable bridge. |
| Dynamic graphs with optional compilation | Graphs are built dynamically as the Python code runs rather than declared ahead of time; `mx.compile` optionally traces and fuses operations to remove Python overhead and improve performance. |
| Native Metal kernels | Many operators ship hand-written Metal shaders rather than relying on a vendor BLAS, which lets MLX exploit details of the Apple GPU like SIMD groups and tile memory. |
| Multi-backend | CPU and Apple GPU are the primary back-ends; later releases added a CUDA back-end for Nvidia hardware and a JACCL distributed back-end. |

The official documentation summarizes the model in one line: "Computations in MLX are lazy. Arrays are only materialized when needed."[4] Lazy evaluation matters because it lets MLX fuse operators and skip work whose results are never read. It also makes function transformations cleaner, since the framework already has a graph to differentiate, vectorize, or compile, and MLX allows arbitrary compositions such as `grad(vmap(grad(fn)))`.[1][4] The trade-off is that the user has to remember to call `mx.eval` (or perform any operation that materializes a value) when they want a side effect like printing or timing.
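
The record-then-materialize behavior described above can be illustrated with a toy model in plain Python. This is a minimal sketch of the general idea, not MLX's implementation: the `LazyArray` class, its `eval` method, and the `evaluations` counter are hypothetical names invented for the example, and it works on scalars rather than arrays.

```python
class LazyArray:
    """Toy lazy scalar: records an operation instead of running it."""

    evaluations = 0  # counts how many graph nodes were actually computed

    def __init__(self, fn=None, inputs=(), value=None):
        self.fn = fn            # operation to run when materialized
        self.inputs = inputs    # parent nodes in the graph
        self.value = value      # cached result once computed

    @staticmethod
    def constant(value):
        return LazyArray(value=value)

    def __add__(self, other):
        return LazyArray(lambda a, b: a + b, (self, other))

    def __mul__(self, other):
        return LazyArray(lambda a, b: a * b, (self, other))

    def eval(self):
        """Materialize this node, computing only the ancestors it needs."""
        if self.value is None:
            args = [node.eval() for node in self.inputs]
            self.value = self.fn(*args)
            LazyArray.evaluations += 1
        return self.value


x = LazyArray.constant(3.0)
y = LazyArray.constant(4.0)

used = x * y + x        # graph of two operations, nothing computed yet
unused = (x + y) * y    # also recorded, but never read below

assert LazyArray.evaluations == 0   # building the graph did no arithmetic
result = used.eval()                # plays the role that mx.eval plays in MLX
print(result, LazyArray.evaluations)  # 15.0 2
```

Only the two operations that feed `used` are ever executed; the graph behind `unused` is recorded but skipped, which is the "skip work whose results are never read" property in miniature.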

The Python package `mlx.nn` defines familiar building blocks: `Linear`, `Embedding`, `Conv1d`, `Conv2d`, `Conv3d`, `LSTM`, `GRU`, `Transformer`, `MultiHeadAttention`, `RMSNorm`, `LayerNorm`, `RoPE`, and the usual activation functions. The `mlx.optimizers` package provides SGD, Adam, AdamW, Lion, Muon, and Adafactor, plus learning-rate schedules such as cosine decay, exponential decay, and linear schedules. Anyone who has used PyTorch can read MLX neural-network code without much trouble.[1]

## What are MLX's key features?

The headline features that come up most often in tutorials and benchmarks are listed below.

| Feature | What it does |
|---------|--------------|
| Lazy execution | Defers computation until a result is needed, enabling fusion and skipping unused work |
| Function transformations | `grad`, `value_and_grad`, `compile`, `vmap`, and `jvp`/`vjp` for forward and reverse mode autodiff |
| Module-style neural networks | `mlx.nn.Module` subclasses with `__call__`, parameter pytrees, and easy state-dict style serialization |
| Optimizers | SGD, Adam, AdamW, Lion, Muon, Adafactor, with weight decay and gradient clipping helpers |
| Quantization | 4-bit and 8-bit weight quantization for inference, with `mx.quantize` and `mx.dequantize` primitives and a CLI in `mlx-lm` |
| LoRA and QLoRA fine-tuning | `mlx_lm.lora` trains low-rank adapters on top of a frozen base model; adapters can be fused back for deployment |
| Distributed training and inference | `mx.distributed` with `all_sum`, `all_gather`, `send`, and `recv` over MPI, Ring, or the JACCL back-end on Thunderbolt 5 |
| Hugging Face integration | `mlx_lm.convert` downloads a Hugging Face model, quantizes it, and saves it locally or uploads to the `mlx-community` organization |
| Streaming generation and prompt caching | Token-by-token generation with reusable KV caches and rotating caches for long contexts |
| Multi-language bindings | Python, C++, C, and Swift; a Go community port also exists |

One practical detail that surprises new users is that MLX does not use the [Apple Neural Engine](/wiki/apple_neural_engine) (ANE) directly. The ANE is exposed through [Core ML](/wiki/core_ml), and MLX runs on the CPU and GPU instead. From the M5 onward, however, the GPU itself contains neural accelerators that provide matrix-multiplication primitives, and MLX uses those through Metal 4's Tensor Operations.[6]
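
The quantization entry in the feature table can be made more concrete with a small sketch of group-wise affine quantization, the general scheme in which each group of weights is stored as low-bit integer codes plus a scale and a bias. The code below is plain Python rather than a call into `mx.quantize`, and it is only an approximation of the idea: the helper names are hypothetical, and the group size of 64 with 16-bit scale and bias values is an assumption made for the storage estimate.

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to integers in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from integer codes."""
    return [c * scale + bias for c in codes]


def bits_per_weight(bits, group_size=64, metadata_bits=16):
    """Storage cost when each group also keeps one scale and one bias (assumed sizes)."""
    return bits + 2 * metadata_bits / group_size


group = [-0.8, -0.1, 0.0, 0.25, 0.4, 0.7, -0.35, 0.55]
codes, scale, bias = quantize_group(group, bits=4)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))

print(codes)
print(f"max error {max_error:.4f} (half a step is {scale / 2:.4f})")
print(bits_per_weight(4))  # 4.5
```

The reconstruction error of each weight is bounded by half a quantization step, and under the stated assumptions a nominally 4-bit model costs about 4.5 bits per weight once the per-group metadata is counted.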

## What is mlx-lm used for?

`mlx-lm` is the package most users actually install. It is a high-level Python library and command-line interface for running, quantizing, and fine-tuning large language models on Apple silicon. When no model is specified, the CLI defaults to Llama-3.2-3B-Instruct in a 4-bit quantized form and downloads it on first use, so a user can type `pip install mlx-lm` followed by `mlx_lm.chat` and get a working chatbot in a few minutes on any modern Mac.[3]

The package supports thousands of open-weight models on the Hugging Face Hub. The most commonly used families include Llama 1, Llama 2, Llama 3, Llama 3.1, Llama 3.2, Llama 4, Mistral 7B, Mixtral 8x7B and 8x22B, Qwen 1.5, Qwen2, Qwen2.5, Qwen3, Phi-2 and Phi-3, Gemma and Gemma 2, DeepSeek (including DeepSeek R1 and V3 in distributed mode), Yi, Stable LM, and Falcon. Vision-language models are covered by the related community package `mlx-vlm`.[3][11]

The `mlx-lm` CLI exposes the main workflows directly:

- `mlx_lm.generate` for one-shot text generation with a prompt
- `mlx_lm.chat` for an interactive REPL-style chat session
- `mlx_lm.server` for a local OpenAI-compatible HTTP server
- `mlx_lm.convert` for downloading a Hugging Face model and quantizing it to 2-bit, 3-bit, 4-bit, 6-bit, or 8-bit
- `mlx_lm.lora` for low-rank adapter fine-tuning, including QLoRA on top of a quantized base model
- `mlx_lm.fuse` for merging a trained LoRA adapter back into the base weights
- `mlx_lm.upload` for pushing a converted model to the Hugging Face Hub
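
What the `lora` and `fuse` steps do mathematically can be sketched with tiny matrices in plain Python. This is an illustration of the general LoRA technique, not the `mlx-lm` code: the function names are hypothetical, and the `alpha / rank` scaling follows the convention of the original LoRA formulation, which is an assumption here rather than something the source specifies.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_lora(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the merged weight matrix."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in the adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Frozen 3x3 base weight and a rank-1 adapter.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [2.0]]          # d_out x rank
A = [[0.5, 0.0, -0.5]]             # rank x d_in
fused = fuse_lora(W, B, A, alpha=2.0, rank=1)

x = [[1.0], [2.0], [3.0]]          # column-vector input
# Applying the adapter separately gives the same output as the fused matrix.
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
separate = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]
assert separate == matmul(fused, x)

print(fused)
print(lora_param_count(4096, 4096, rank=8), "vs", 4096 * 4096)
```

For a 4096 by 4096 layer, a rank-8 adapter trains 65,536 parameters instead of roughly 16.8 million, and fusing removes the extra matrix multiplications at inference time.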

Under the hood, `mlx-lm` uses streaming generation with a key-value cache, supports prompt caching to avoid recomputing repeated prefixes, and offers a rotating KV cache to keep memory bounded for very long contexts. Distributed inference is available via `mx.distributed`, which is how Awni Hannun and others have demonstrated very large mixture-of-experts models, such as the 671-billion-parameter DeepSeek V3 and the trillion-parameter Kimi K2 Thinking, running across clusters of Mac Studios.[15][16]
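
The idea behind a rotating cache, keeping memory bounded by evicting old entries as new tokens arrive, can be sketched in a few lines of plain Python. This is a simplified illustration rather than the `mlx-lm` implementation: the `RotatingCache` class and its `max_size` and `keep` parameters are hypothetical names, and pinning the first few entries is an assumption about what a long-context cache might want to preserve.

```python
from collections import deque


class RotatingCache:
    """Toy bounded cache: keep the first `keep` entries plus the most recent ones."""

    def __init__(self, max_size, keep=0):
        assert 0 <= keep < max_size
        self.pinned = []                              # earliest entries, never evicted
        self.keep = keep
        self.recent = deque(maxlen=max_size - keep)   # oldest entry drops automatically

    def append(self, key_value):
        if len(self.pinned) < self.keep:
            self.pinned.append(key_value)
        else:
            self.recent.append(key_value)

    def contents(self):
        return self.pinned + list(self.recent)


cache = RotatingCache(max_size=4, keep=1)
for token_position in range(10):
    # A real cache would store key and value tensors; positions stand in for them here.
    cache.append(token_position)

print(cache.contents())  # [0, 7, 8, 9]
```

However many tokens are generated, the cache never holds more than `max_size` entries, which is the property that keeps memory flat for very long contexts.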

## The mlx-data subproject

`mlx-data` is a separate library for building data-loading and preprocessing pipelines. It is framework-agnostic, meaning that although it ships from the `ml-explore` organization it can be used with PyTorch or JAX in addition to MLX. The library focuses on streaming, sharding, and on-the-fly transformations such as resizing images, tokenizing text, and decoding audio. It is written in C++ with Python bindings and is designed to keep up with the throughput that Apple silicon GPUs can reach, especially for image and audio workloads.

## The mlx-examples repository

`mlx-examples` is a separate Apple-maintained GitHub repository that ships reference implementations rather than reusable APIs. It is the place most people go first to see how a model is meant to be expressed in MLX.[2]

Notable examples include:

| Domain | Example |
|--------|---------|
| Language | Transformer training, Llama and Mistral inference, Mixtral 8x7B mixture of experts, T5, BERT |
| Vision | ResNet on CIFAR-10, CVAE on MNIST, FLUX and Stable Diffusion (including SDXL) image generation |
| Audio | OpenAI Whisper for speech recognition, Meta EnCodec for audio compression, Meta MusicGen for music generation |
| Video | Wan2.1 for text-to-video and image-to-video |
| Multimodal | CLIP, LLaVA, Segment Anything (SAM) |
| Other | Graph Convolutional Networks, Real NVP normalizing flows, parameter-efficient fine-tuning with LoRA and QLoRA |

These examples are a major reason MLX got traction quickly: a developer who wanted to run Stable Diffusion or Llama on a MacBook could clone the repo and have a working pipeline within an hour of the framework's release.[2]

## How fast is MLX?

MLX is fast on Apple silicon, particularly for inference. The exact numbers depend on chip, model, batch size, and quantization, but a few patterns show up consistently across community benchmarks.

For 7B-class language models in 4-bit quantization, throughput on the GPU runs roughly in the tens of tokens per second on M2 Max and M3 Max systems and passes 100 tokens per second on M3 Ultra and M4 Max systems. Independent benchmarks have measured Llama 2 7B in Q4 at around 26 tokens per second on an M3 MacBook Pro and about 115 tokens per second on an M3 Ultra.[13] On the M5, Apple's own research blog reports that the GPU's neural accelerators yield "up to 4x speedup compared to a M4 baseline for time-to-first-token in language model inference," with subsequent token generation 19 to 27 percent faster, driven mostly by the M5's higher memory bandwidth (153 GB/s versus 120 GB/s, about 28 percent more) and the new Metal 4 Tensor Operations.[6] Apple explains the split this way: prompt processing "takes full advantage of the Neural Accelerators," while "generating subsequent tokens is bounded by memory bandwidth, rather than by compute ability."[6]
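
The claim that token generation is bounded by memory bandwidth can be checked with back-of-the-envelope arithmetic. The sketch below assumes that every weight is read from memory once per generated token and that a 7B model is stored at exactly 4 bits per weight; it ignores KV-cache reads, quantization metadata, and activation traffic, so the figures are rough ceilings rather than predictions. The function name is hypothetical.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s, params_billion, bits_per_weight):
    """Upper bound on tokens/s if every weight is read once per generated token."""
    model_gb = params_billion * bits_per_weight / 8   # billions of params -> gigabytes
    return bandwidth_gb_s / model_gb


m4 = decode_ceiling_tokens_per_s(120, params_billion=7, bits_per_weight=4)
m5 = decode_ceiling_tokens_per_s(153, params_billion=7, bits_per_weight=4)

print(f"M4 ceiling: {m4:.1f} tok/s")   # 34.3
print(f"M5 ceiling: {m5:.1f} tok/s")   # 43.7
print(f"ratio: {m5 / m4:.3f}")         # 1.275
```

The 1.275 ratio matches the bandwidth gain of about 28 percent and sits just above the reported 19 to 27 percent generation speedup, which is what one would expect from a workload limited by bandwidth rather than compute.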

In head-to-head comparisons, MLX tends to beat PyTorch with the MPS backend for inference and many single-array operations, while PyTorch can still be faster for some training workloads. One frequently cited Towards Data Science benchmark found PyTorch MPS at roughly 10 to 14 seconds per training epoch for a small CNN, with the MLX version of the same model at roughly 21 to 27 seconds, so the framework has not entirely closed the gap on every workload.[12] For inference and serving, MLX is generally the fastest option among Apple silicon native frameworks. The clearest signal came in March 2026, when [Ollama](/wiki/ollama) 0.19 swapped its llama.cpp Metal backend for MLX on Apple silicon, roughly doubling decode speed from about 58 to about 112 tokens per second on qualifying hardware (the MLX backend requires 32 GB or more of unified memory and falls back to llama.cpp Metal below that).[18]

Nvidia GPUs are still faster in absolute terms for large models that fit in their VRAM, but MLX's energy efficiency on Apple hardware is competitive: per-iteration energy on Apple silicon is in the same range as an RTX 4090 and well below an A6000 in some benchmarks.[13]

## How does MLX differ from PyTorch and llama.cpp?

MLX sits in a fairly crowded landscape, but its niche of "first-class Apple silicon framework that looks like NumPy and PyTorch" is well defined. The table below compares it with the frameworks it is most often weighed against.

| Framework | Organization | Focus | Eager or lazy | Multi-platform | Hugging Face support | License | First release |
|-----------|--------------|-------|---------------|----------------|----------------------|---------|---------------|
| MLX | Apple ML Research | Research and on-device development on Apple silicon | Lazy, with dynamically built graphs and optional `mx.compile` | Apple silicon primary; CUDA back-end added later | Yes, via `mlx-lm` and the `mlx-community` organization | MIT | December 2023 |
| PyTorch + MPS | PyTorch Foundation (Linux Foundation) | General research and production deep learning | Eager with `torch.compile` | Cross-platform; MPS back-end since PyTorch 1.12 (2022) | Yes, via Transformers | BSD-3 | 2016 (PyTorch); 2022 (MPS) |
| JAX with Metal | Google plus Apple plugin | Functional, transformation-first numerical computing | Lazy, traced | Cross-platform; Metal back-end is experimental | Limited (some Flax models) | Apache 2.0 | 2018; Metal plugin in 2023 |
| Core ML | Apple | On-device deployment and inference | Static graphs (.mlpackage) | Apple platforms only | Indirect, via converters | Proprietary (free SDK) | 2017 |
| llama.cpp | Open source community (ggml-org) | Quantized LLM inference in plain C++ | Static, hand-written | Cross-platform (CPU, Metal, CUDA, ROCm, Vulkan) | Yes, via GGUF format | MIT | 2023 |
| Apple Foundation Models framework | Apple | Third-party access to Apple's on-device LLM (Apple Intelligence) | Closed-source runtime | Apple platforms only | No | Proprietary | 2025 |
| tinygrad | Tiny Corp | Minimal research framework, multiple back-ends | Lazy | Cross-platform (CPU, CUDA, Metal, others) | Some | MIT | 2020 |

A quick summary of how to think about each:

- MLX is the choice when you want to write training or inference code that looks like PyTorch but runs natively on Apple silicon, especially if you want to use unified memory and fine-tune on the same machine you developed on.
- PyTorch with MPS is the choice when you want code that also runs on Linux servers with CUDA. The MPS back-end is mature enough to be the default "PyTorch on Mac" experience.
- [Core ML](/wiki/core_ml) is the choice when you are shipping a model inside an iOS or macOS app. It can use the [Apple Neural Engine](/wiki/apple_neural_engine), which MLX cannot.
- [llama.cpp](/wiki/llama_cpp) is the choice when you want maximum portability for LLM inference, especially across non-Apple hardware. Its GGUF file format has become a de facto standard, and as of Ollama 0.19 it covers far more model architectures than the newer MLX runner.[18]
- The Apple Foundation Models framework is the choice when you want to call Apple's own on-device LLM (the one inside [Apple Intelligence](/wiki/apple_intelligence)) without managing weights or quantization yourself.

## What is MLX used for?

MLX has settled into a few clearly identified use cases over its first years:

- Running open-weight LLMs locally on a Mac, often as a faster alternative to llama.cpp or [Ollama](/wiki/ollama) (Ollama itself now supports MLX as a back-end)[18]
- Local fine-tuning, especially with LoRA and QLoRA, so a developer can adapt a 7B or 13B model to their own data overnight on a MacBook Pro
- Research prototyping on Apple silicon, where the unified-memory model lets researchers handle larger batches than a discrete GPU drawing comparable power would allow
- Distributed inference and training across multiple Mac Studios, particularly for very large MoE models that do not fit on a single machine[15][16]
- Embedded use in Mac and iOS apps via the Swift API, for example for on-device transcription, image generation, or chat[8]
- Teaching and learning, since the API is simple and the lazy-graph model makes it relatively easy to inspect what a transformation is doing

## Strengths

The main reasons people pick MLX over alternatives:

- Native to Apple silicon, so kernels are written for the actual hardware rather than translated through a portable abstraction layer
- Unified memory removes a class of mistakes around device placement and copies
- API is genuinely familiar: a PyTorch user can read MLX code on the first try, and a NumPy user can write array operations with no extra learning
- MIT license, which means companies can use it without legal friction
- Active maintenance from a small but committed Apple team, with frequent releases
- Excellent inference performance for quantized LLMs on consumer Macs
- Tight integration with Hugging Face through the `mlx-community` organization, which had thousands of pre-converted models by 2026[10][11]

## Weaknesses and limitations

The usual caveats:

- Apple silicon only for the primary back-end; the CUDA back-end exists but is newer and not the focus of development
- Smaller community than PyTorch or JAX, so there are fewer tutorials, fewer Stack Overflow answers, and fewer pretrained MLX-native checkpoints (though Hugging Face conversions help)
- Distributed training is younger than the equivalent in PyTorch (FSDP, DeepSpeed) and is still evolving
- Not the choice for production cloud serving, where vLLM, TensorRT-LLM, and PyTorch dominate
- Does not target the Apple Neural Engine; for that you still need Core ML
- Some training workloads remain faster on PyTorch with MPS, so users sometimes write training in PyTorch and inference in MLX[12]

## How widely is MLX adopted?

MLX has been adopted broadly within the Mac-using ML community. The GitHub repository passed roughly 27,000 stars by June 2026, making it Apple's second most starred and forked open-source project after the Swift programming language; the `mlx-community` Hugging Face organization had thousands of converted model checkpoints with thousands of members, and major tools like Ollama added MLX as an optional back-end.[1][11][18] Apple itself uses MLX internally for some of the work on its on-device foundation models, and Apple's WWDC 2024 and 2025 sessions explicitly recommend MLX for developers who want to fine-tune or experiment with open models on Mac.[9][14]

Outside Apple, MLX shows up in academic work on private LLMs (for example, multi-node mixture-of-experts research using Apple silicon clusters) and in many indie developer tools that ship on-device AI features. Demonstrations of trillion-parameter models running across four Mac Studios with MLX's distributed back-end have generated coverage in the trade press and pushed Apple's small-cluster story as a low-cost alternative to Nvidia DGX boxes for some workloads.[15][16]

In February 2026, Awni Hannun, the public face of MLX, announced that he was leaving Apple, writing "Today is my last day at Apple. Building MLX with our amazing team and community has been an absolute pleasure. It's still early days for AI on Apple silicon."[17] Coverage at the time noted that he was one of roughly a dozen senior AI researchers who had departed Apple in the previous year. Hannun named the engineers continuing to maintain MLX, and the project has continued to ship releases since.[17]

## Where MLX fits in Apple's ML stack

Apple has several machine-learning frameworks, and they serve different jobs.

| Framework | Primary purpose |
|-----------|-----------------|
| Core ML | Deployment and inference of trained models inside iOS, macOS, watchOS, and visionOS apps; uses CPU, GPU, and Apple Neural Engine |
| Create ML | A graphical and Swift API for training simple models from labeled data without writing low-level code |
| MLX | Research and on-device development framework, the most PyTorch-like of Apple's offerings, used for training and fine-tuning |
| Apple Foundation Models framework | Third-party access (in Swift) to Apple's own on-device LLM that powers Apple Intelligence |
| Metal Performance Shaders Graph (MPSGraph) | Lower-level graph compiler for Metal that PyTorch's MPS back-end and other frameworks build on |

In practice, a developer might prototype a model in MLX, convert it to Core ML for shipping in a consumer app, and call Apple's Foundation Models framework when they just need general-purpose text from the system LLM rather than their own custom model.

## Is MLX open source? License and governance

MLX is released under the MIT license, which permits commercial use, modification, distribution, and private use without requiring derivative works to be open-sourced.[1][5] The project is maintained on GitHub under the `ml-explore` organization, which is owned by Apple. Releases are cut directly from the main branch, and contributions follow a standard pull-request workflow. The `mlx-examples`, `mlx-lm`, `mlx-data`, `mlx-swift`, and `mlx-c` repositories use the same MIT license and the same governance model.[5]

## Recent context (2024 to 2026)

By 2026 MLX has settled in as the standard Mac-native ML framework for local LLM development and on-device fine-tuning. The Hugging Face community treats it as a first-class target, alongside the PyTorch and GGUF formats.[10] Apple continues to invest in the project: distributed training and inference across multiple Macs has matured, quantization libraries have grown to include 2-bit and 3-bit variants, the Swift bindings are good enough to ship inside iOS and macOS apps, and Metal 4's Tensor Operations let MLX exploit the new neural accelerators in the M5 GPU.[6] The CUDA back-end added in 2025 also widened MLX's appeal: the same code can prototype on a MacBook and then run on Nvidia hardware in a cloud environment.

The departure of Awni Hannun in early 2026 left an open question about Apple's long-term commitment to MLX, but the project has continued to ship and the rest of the team has remained in place.[17] For most practical purposes, if you want to run or train a model on a Mac in 2026, MLX is the framework people reach for first.

## See also

- [Apple Inc.](/wiki/apple_inc)
- [Apple silicon](/wiki/apple_silicon)
- [Apple Intelligence](/wiki/apple_intelligence)
- [Apple Neural Engine](/wiki/apple_neural_engine)
- [PyTorch](/wiki/pytorch)
- [PyTorch Lightning](/wiki/pytorch_lightning)
- [JAX](/wiki/jax)
- [NumPy](/wiki/numpy)
- [Core ML](/wiki/core_ml)
- [llama.cpp](/wiki/llama_cpp)
- [Ollama](/wiki/ollama)
- [tinygrad](/wiki/tinygrad)
- [Ray (framework)](/wiki/ray)
- [Haystack (framework)](/wiki/haystack)

## References

1. ml-explore. "MLX: An array framework for Apple silicon." GitHub repository. https://github.com/ml-explore/mlx
2. ml-explore. "MLX Examples." GitHub repository. https://github.com/ml-explore/mlx-examples
3. ml-explore. "MLX-LM: Run LLMs with MLX." GitHub repository. https://github.com/ml-explore/mlx-lm
4. ml-explore. "MLX Documentation." https://ml-explore.github.io/mlx/build/html/index.html
5. Apple Open Source. "MLX project page." https://opensource.apple.com/projects/mlx/
6. Apple Machine Learning Research. "Exploring LLMs with MLX and the Neural Accelerators in the M5 GPU." https://machinelearning.apple.com/research/exploring-llms-mlx-m5
7. Awni Hannun. "MLX is an efficient machine learning framework specifically designed for Apple silicon." X (Twitter), December 5, 2023. https://x.com/awnihannun/status/1732184443451019431
8. Swift.org. "On-device ML research with MLX and Swift." Blog post, February 20, 2024. https://www.swift.org/blog/mlx-swift/
9. Apple Developer. "Explore large language models on Apple silicon with MLX." WWDC 2025 session 298. https://developer.apple.com/videos/play/wwdc2025/298/
10. Hugging Face. "Using MLX at Hugging Face." Hub documentation. https://huggingface.co/docs/hub/mlx
11. Hugging Face. "MLX Community organization." https://huggingface.co/mlx-community
12. Mike Cvet. "PyTorch and MLX for Apple Silicon." Towards Data Science. https://towardsdatascience.com/pytorch-and-mlx-for-apple-silicon-4f35b9f60e39/
13. Tristan Bilot. "How Fast Is MLX? A Comprehensive Benchmark on Apple Silicon Chips and CUDA GPUs." Towards Data Science. https://medium.com/data-science/how-fast-is-mlx-a-comprehensive-benchmark-on-8-apple-silicon-chips-and-4-cuda-gpus-378a0ae356a0
14. Apple Machine Learning Research. "Introducing Apple's On-Device and Server Foundation Models." https://machinelearning.apple.com/research/introducing-apple-foundation-models
15. Awni Hannun. "The latest MLX is out! And it has a new distributed back-end (JACCL) that uses RDMA over TB5." X (Twitter), 2025. https://x.com/awnihannun/status/2001667839539978580
16. Simon Willison. "Run DeepSeek R1 or V3 with MLX Distributed." Blog post, January 22, 2025. https://simonwillison.net/2025/Jan/22/mlx-distributed/
17. Awni Hannun. "Today is my last day at Apple." X (Twitter), February 27, 2026. https://x.com/awnihannun/status/2027506228143030480
18. Ollama. "Ollama is now powered by MLX on Apple Silicon in preview." Blog post, March 2026. https://ollama.com/blog/mlx
19. Apple Inc. "Apple introduces Apple Intelligence." WWDC 2024 announcement coverage.
20. Computerworld. "Apple launches MLX machine-learning framework for Apple Silicon." December 2023. https://www.computerworld.com/article/1611155/apple-launches-mlx-machine-learning-framework-for-apple-silicon.html