MLX

AI Companies AI Hardware Developer Tools Open Source AI

23 min read

Updated Jun 23, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 23, 2026

Fact-checked

In review queue

Sources

20 citations

Revision

v2 · 4,614 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

MLX is an open-source array and machine-learning framework built by Apple Inc.'s machine-learning research team specifically for Apple silicon, the M1, M2, M3, M4, and M5 series of system-on-chip processors. First released on GitHub on December 5, 2023 under the permissive MIT license, MLX pairs a NumPy-like Python API for arrays with a PyTorch-like API for neural networks, and it exploits the unified-memory architecture of Apple silicon so that arrays are shared between CPU and GPU without explicit data copies.^[1]^[20] The framework's official description is that "MLX is an array framework for machine learning on Apple silicon, brought to you by Apple machine learning research."^[1] By June 2026 the project had passed 27,000 stars on GitHub, making it Apple's second most starred open-source project after the Swift programming language.^[1]

MLX was conceived as a research-friendly framework that closes the gap between CPU and GPU programming on the Mac. On most other platforms, an ML developer has to think about which device a tensor lives on and explicitly move it across an interconnect such as PCIe. On Apple silicon, CPU and GPU share the same physical memory, so the framework can avoid those copies entirely. MLX bakes that assumption into its core, which is what makes it noticeably different from cross-platform frameworks like PyTorch with the Metal Performance Shaders backend or JAX with its experimental Metal backend.^[1]

Since the initial release, MLX has grown into a small ecosystem of related projects: the core mlx array library, the mlx-lm package for large language model inference and fine-tuning, the mlx-data library for data loading, the mlx-swift package that exposes the API to Swift and iOS, and the mlx-examples repository that hosts reference implementations of models such as Llama, Mistral, Mixtral, Stable Diffusion, Whisper, and many others.^[2]^[3] By 2026 the MLX Community organization on Hugging Face hosted thousands of pre-converted model weights.^[11]

When was MLX released?

MLX appeared during a period when running large language models locally on a Mac was becoming popular but was mostly limited to projects like llama.cpp that targeted the CPU plus a thin Metal layer. Apple's ML research division wanted a framework that researchers inside and outside the company could actually train and prototype with on consumer hardware, not just deploy to. The result was MLX, released on December 5, 2023 alongside the mlx-examples repository so that the framework would launch with working reference code rather than as a bare runtime.^[2]^[20]

Date	Event
December 5, 2023	Initial open-source release of MLX and mlx-examples on GitHub by Apple's machine-learning research team; announced by Awni Hannun on X^[7]
Late December 2023	Quick wave of community ports: Llama 2, Mistral 7B, Stable Diffusion, and Whisper inference examples appear in mlx-examples
January 2024	Optimized inference paths for Llama 2, Mistral, and Qwen released; LoRA fine-tuning example added
February 20, 2024	Release of MLX Swift, a Swift API to MLX with neural-network and optimizer packages plus Mistral 7B and MNIST examples^[8]
2024 (spring and summer)	Quantization support (4-bit and 8-bit), faster Metal kernels, and an `mx.compile` tracing compiler are added
May 2024	Apple ships the M4 in the iPad Pro; MLX runs on M4 immediately because it targets the Apple silicon Metal stack rather than a specific chip generation
June 2024	At WWDC 2024 Apple announces Apple Intelligence and the Foundation Models framework; Apple highlights MLX as the recommended framework for fine-tuning and experimenting with open models on Mac^[14]
Late 2024	Distributed training and inference across multiple Macs over Thunderbolt or Ethernet land in MLX
2024 to 2025	The `mlx-lm` package matures into the standard CLI for running LLMs on Mac; thousands of pre-quantized models appear under the `mlx-community` organization on Hugging Face^[11]
2025	Apple ships the M5 with dedicated neural accelerators in the GPU; MLX 0.2x and 0.3x releases add Metal 4 tensor-operation support and large speedups for time-to-first-token^[6]
Mid-2025	A CUDA back-end for MLX is introduced (`pip install "mlx[cuda]"`), letting code developed on Apple silicon also run on Nvidia GPUs
Late 2025	JACCL distributed back-end uses RDMA over Thunderbolt 5 for low-latency multi-Mac clusters; demonstrations include trillion-parameter models running across four Mac Studios^[15]^[16]
April 2026	MLX 0.31.2 documented as the current release^[4]
June 2026	Project crosses about 27,000 stars on GitHub^[1]

Who created MLX?

MLX was created by a small team inside Apple's machine-learning research division. The four engineers credited with equal contribution on the initial release are Awni Hannun, Jagrit Digani, Angelos Katharopoulos, and Ronan Collobert.^[1]

Awni Hannun acted as the public face of the project for most of its first three years. Before joining Apple, he worked at Facebook AI Research and earned his PhD at Stanford University, where he co-authored the Deep Speech speech-recognition papers in Andrew Ng's group. He announced the original release on X on December 5, 2023, writing that "MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!)" and continued posting roadmap updates, demos, and benchmarks for the next several years.^[7] Hannun left Apple in February 2026 but, in his farewell post on February 27, 2026, named the people continuing to maintain MLX (Angelos Katharopoulos, Cheng Zhao, Jagrit Digani, and others) and said the project would keep moving.^[17]

Ronan Collobert is a longtime researcher in deep learning who previously worked at Facebook AI Research and on the Torch and Flashlight frameworks. Angelos Katharopoulos is known for work on linear-attention transformers and contributes heavily to MLX's compiler and distributed back-ends. Jagrit Digani has focused on Metal kernels and operator coverage. Outside contributions come from a wide community of Mac-using ML developers, with the GitHub project showing thousands of merged pull requests across its first three years.^[1]

How does MLX work? Architecture and design philosophy

MLX is built around a small set of design decisions that, taken together, distinguish it from PyTorch, JAX, and TensorFlow. The most consequential is unified memory: there is no .to("cuda") and no separate host and device tensor, because on Apple silicon the CPU, GPU, and Neural Engine all address the same physical RAM. Operations specify the device they should run on, but the data does not move.^[1]

Design decision	Description
Unified memory	Arrays live in shared memory and are visible to CPU and GPU without copies. Specifying a device chooses where the kernel runs, not where the data lives.
Lazy evaluation	Operations record nodes in a computation graph; nothing executes until a value is actually needed (for example via `mx.eval` or by printing or saving an array). This is similar in spirit to JAX.
Composable function transformations	`mlx.grad`, `mlx.value_and_grad`, `mlx.vmap`, `mlx.compile`, `mlx.jvp`, and `mlx.vjp` can be stacked, so a user can take the gradient of a vmapped, JIT-compiled function.
Familiar Python API	`mlx.core` mirrors NumPy in array creation, slicing, broadcasting, and reductions; `mlx.nn` mirrors the PyTorch `nn.Module` style of subclassing.
Multi-language bindings	Python, C++, C, and Swift APIs all wrap the same C++ core, with the C layer acting as a stable bridge.
Dynamic graphs with optional compilation	Code is eager by default; `mx.compile` traces and fuses operations to remove Python overhead and improve performance.
Native Metal kernels	Many operators ship hand-written Metal shaders rather than relying on a vendor BLAS, which lets MLX exploit details of the Apple GPU like SIMD groups and tile memory.
Multi-backend	CPU and Apple GPU are the primary back-ends; later releases added a CUDA back-end for Nvidia hardware and a JACCL distributed back-end.

The official documentation summarizes the model in one line: "Computations in MLX are lazy. Arrays are only materialized when needed."^[4] Lazy evaluation matters because it lets MLX fuse operators and skip work whose results are never read. It also makes function transformations cleaner, since the framework already has a graph to differentiate, vectorize, or compile, and MLX allows arbitrary compositions such as grad(vmap(grad(fn))).^[1]^[4] The trade-off is that the user has to remember to call mx.eval (or perform any operation that materializes a value) when they want a side effect like printing or timing.

The Python package mlx.nn defines familiar building blocks: Linear, Embedding, Conv1d, Conv2d, Conv3d, LSTM, GRU, Transformer, MultiHeadAttention, RMSNorm, LayerNorm, RoPE, and the usual activation functions. The mlx.optimizers package provides SGD, Adam, AdamW, Lion, Muon, and Adafactor, plus learning-rate schedules such as cosine decay, exponential decay, and linear schedules. Anyone who has used PyTorch can read MLX neural-network code without much trouble.^[1]

What are MLX's key features?

The headline features that come up most often in tutorials and benchmarks are listed below.

Feature	What it does
Lazy execution	Defers computation until a result is needed, enabling fusion and skipping unused work
Function transformations	`grad`, `value_and_grad`, `jit` (`compile`), `vmap`, and `jvp`/`vjp` for forward and reverse mode autodiff
Module-style neural networks	`mlx.nn.Module` subclasses with `__call__`, parameter pytrees, and easy state-dict style serialization
Optimizers	SGD, Adam, AdamW, Lion, Muon, Adafactor, with weight decay and gradient clipping helpers
Quantization	4-bit and 8-bit weight quantization for inference, with `mx.quantize` and `mx.dequantize` primitives and a CLI in `mlx-lm`
LoRA and QLoRA fine-tuning	`mlx-lm.lora` trains low-rank adapters on top of a frozen base model; adapters can be fused back for deployment
Distributed training and inference	`mx.distributed` with `all_reduce`, `all_gather`, `send`, and `recv` over MPI, Ring, or the JACCL back-end on Thunderbolt 5
Hugging Face integration	`mlx_lm.convert` downloads a Hugging Face model, quantizes it, and saves it locally or uploads to the `mlx-community` organization
Streaming generation and prompt caching	Token-by-token generation with reusable KV caches and rotating caches for long contexts
Multi-language bindings	Python, C++, C, and Swift; a Go community port also exists

One practical detail that surprises new users is that MLX does not use the Apple Neural Engine (Apple Neural Engine, or ANE) directly. The ANE is exposed through Core ML, and MLX runs on the CPU and GPU instead. From the M5 onward, however, the GPU itself contains neural accelerators that provide matrix-multiplication primitives, and MLX uses those through Metal 4's Tensor Operations.^[6]

What is mlx-lm used for?

mlx-lm is the package most users actually install. It is a high-level Python library and command-line interface for running, quantizing, and fine-tuning large language models on Apple silicon. The default install pulls Llama-3.2-3B-Instruct in a 4-bit quantized form, so a user can type pip install mlx-lm followed by mlx_lm.chat and get a working chatbot in a few minutes on any modern Mac.^[3]

The package supports thousands of open-weight models on the Hugging Face Hub. The most commonly used families include Llama 1, Llama 2, Llama 3, Llama 3.1, Llama 3.2, Llama 4, Mistral 7B, Mixtral 8x7B and 8x22B, Qwen 1.5, Qwen2, Qwen2.5, Qwen3, Phi-2 and Phi-3, Gemma and Gemma 2, DeepSeek (including DeepSeek R1 and V3 in distributed mode), Yi, Stable LM, and Falcon. Vision-language models are covered by the related community package mlx-vlm.^[3]^[11]

The mlx-lm CLI exposes the main workflows directly:

mlx_lm.generate for one-shot text generation with a prompt
mlx_lm.chat for an interactive REPL-style chat session
mlx_lm.server for a local OpenAI-compatible HTTP server
mlx_lm.convert for downloading a Hugging Face model and quantizing it to 2-bit, 3-bit, 4-bit, 6-bit, or 8-bit
mlx_lm.lora for low-rank adapter fine-tuning, including QLoRA on top of a quantized base model
mlx_lm.fuse for merging a trained LoRA adapter back into the base weights
mlx_lm.upload for pushing a converted model to the Hugging Face Hub

Under the hood, mlx-lm uses streaming generation with a key-value cache, supports prompt caching to make repeated prefixes free, and offers a rotating KV cache to keep memory bounded for very long contexts. Distributed inference is available via mx.distributed, which is how Awni Hannun and others have demonstrated trillion-parameter models like DeepSeek V3 and Kimi K2 Thinking running across clusters of Mac Studios.^[15]^[16]

The mlx-data subproject

mlx-data is a separate library for building data-loading and preprocessing pipelines. It is framework-agnostic, meaning that although it ships from the ml-explore organization it can be used with PyTorch or JAX in addition to MLX. The library focuses on streaming, sharding, and on-the-fly transformations such as resizing images, tokenizing text, and decoding audio. It is written in C++ with Python bindings and is designed to keep up with the throughput that Apple silicon GPUs can reach, especially for image and audio workloads.

The mlx-examples repository

mlx-examples is a separate Apple-maintained GitHub repository that ships reference implementations rather than reusable APIs. It is the place most people go first to see how a model is meant to be expressed in MLX.^[2]

Notable examples include:

Domain	Example
Language	Transformer training, Llama and Mistral inference, Mixtral 8x7B mixture of experts, T5, BERT
Vision	ResNet on CIFAR-10, CVAE on MNIST, FLUX and Stable Diffusion (including SDXL) image generation
Audio	OpenAI Whisper for speech recognition, Meta EnCodec for audio compression, Meta MusicGen for music generation
Video	Wan2.1 for text-to-video and image-to-video
Multimodal	CLIP, LLaVA, Segment Anything (SAM)
Other	Graph Convolutional Networks, Real NVP normalizing flows, parameter-efficient fine-tuning with LoRA and QLoRA

These examples are a major reason MLX got traction quickly: a developer who wanted to run Stable Diffusion or Llama on a MacBook could clone the repo and have a working pipeline within an hour of the framework's release.^[2]

How fast is MLX?

MLX is fast on Apple silicon, particularly for inference. The exact numbers depend on chip, model, batch size, and quantization, but a few patterns show up consistently across community benchmarks.

For 7B-class language models in 4-bit quantization, throughput on the GPU runs roughly in the tens of tokens per second on M2 Max and M3 Max systems and climbs into the hundreds on M3 Ultra and M4 Max systems. Independent benchmarks have measured Llama 2 7B in Q4 at around 26 tokens per second on an M3 MacBook Pro and about 115 tokens per second on an M3 Ultra.^[13] On the M5, Apple's own research blog reports that the GPU's neural accelerators yield "up to 4x speedup compared to a M4 baseline for time-to-first-token in language model inference," with subsequent token generation 19 to 27 percent faster, driven mostly by the M5's higher memory bandwidth (153 GB/s versus 120 GB/s, about 28 percent more) and the new Metal 4 Tensor Operations.^[6] Apple explains the split this way: prompt processing "takes full advantage of the Neural Accelerators," while "generating subsequent tokens is bounded by memory bandwidth, rather than by compute ability."^[6]

In head-to-head comparisons, MLX tends to beat PyTorch with the MPS backend for inference and many single-array operations, while PyTorch can still be faster for some training workloads. One frequently cited Towards Data Science benchmark found PyTorch MPS at roughly 10 to 14 seconds per training epoch for a small CNN, with the MLX version of the same model at roughly 21 to 27 seconds, so the framework has not entirely closed the gap on every workload.^[12] For inference and serving, MLX is generally the fastest option among Apple silicon native frameworks. The clearest signal came in March 2026, when Ollama 0.19 swapped its llama.cpp Metal backend for MLX on Apple silicon, roughly doubling decode speed from about 58 to about 112 tokens per second on qualifying hardware (the MLX backend requires 32 GB or more of unified memory and falls back to llama.cpp Metal below that).^[18]

Nvidia GPUs are still faster in absolute terms for large models that fit in their VRAM, but MLX's energy efficiency on Apple hardware is competitive: per-iteration energy on Apple silicon is in the same range as an RTX 4090 and well below an A6000 in some benchmarks.^[13]

How does MLX differ from PyTorch and llama.cpp?

MLX sits in a fairly crowded landscape, but its niche of "first-class Apple silicon framework that looks like NumPy and PyTorch" is well defined. The table below compares it with the frameworks it is most often weighed against.

Framework	Organization	Focus	Eager or lazy	Multi-platform	Hugging Face support	License	First release
MLX	Apple ML Research	Research and on-device development on Apple silicon	Eager with optional `compile` and lazy graph capture	Apple silicon primary; CUDA back-end added later	Yes, via `mlx-lm` and the `mlx-community` organization	MIT	December 2023
PyTorch + MPS	PyTorch Foundation (Linux Foundation)	General research and production deep learning	Eager with `torch.compile`	Cross-platform; MPS back-end since PyTorch 1.12 (2022)	Yes, via Transformers	BSD-3	2016 (PyTorch); 2022 (MPS)
JAX with Metal	Google plus Apple plugin	Functional, transformation-first numerical computing	Lazy, traced	Cross-platform; Metal back-end is experimental	Limited (some Flax models)	Apache 2.0	2018; Metal plugin in 2024
Core ML	Apple	On-device deployment and inference	Static graphs (.mlpackage)	Apple platforms only	Indirect, via converters	Proprietary (free SDK)	2017
llama.cpp	Open source community (ggml-org)	Quantized LLM inference in plain C++	Static, hand-written	Cross-platform (CPU, Metal, CUDA, ROCm, Vulkan)	Yes, via GGUF format	MIT	2023
Apple Foundation Models framework	Apple	Third-party access to Apple's on-device LLM (Apple Intelligence)	Closed-source runtime	Apple platforms only	No	Proprietary	2024
TinyGrad	Tiny Corp	Minimal research framework, multiple back-ends	Lazy	Cross-platform (CPU, CUDA, Metal, others)	Some	MIT	2020

A quick summary of how to think about each:

MLX is the choice when you want to write training or inference code that looks like PyTorch but runs natively on Apple silicon, especially if you want to use unified memory and fine-tune on the same machine you developed on.
PyTorch with MPS is the choice when you want code that also runs on Linux servers with CUDA. The MPS back-end is mature enough to be the default "PyTorch on Mac" experience.
Core ML is the choice when you are shipping a model inside an iOS or macOS app. It can use the Apple Neural Engine, which MLX cannot.
llama.cpp is the choice when you want maximum portability for LLM inference, especially across non-Apple hardware. Its GGUF file format has become a de facto standard, and as of Ollama 0.19 it covers far more model architectures than the newer MLX runner.^[18]
The Apple Foundation Models framework is the choice when you want to call Apple's own on-device LLM (the one inside Apple Intelligence) without managing weights or quantization yourself.

What is MLX used for?

MLX has settled into a few clearly identified use cases over its first years:

Running open-weight LLMs locally on a Mac, often as a faster alternative to llama.cpp or Ollama (Ollama itself now supports MLX as a back-end)^[18]
Local fine-tuning, especially with LoRA and QLoRA, so a developer can adapt a 7B or 13B model to their own data overnight on a MacBook Pro
Research prototyping on Apple silicon, where the unified-memory model lets researchers handle larger batches than the same wattage of a discrete GPU would allow
Distributed inference and training across multiple Mac Studios, particularly for very large MoE models that do not fit on a single machine^[15]^[16]
Embedded use in Mac and iOS apps via the Swift API, for example for on-device transcription, image generation, or chat^[8]
Teaching and learning, since the API is simple and the lazy-graph model makes it relatively easy to inspect what a transformation is doing

Strengths

The main reasons people pick MLX over alternatives:

Native to Apple silicon, so kernels are written for the actual hardware rather than translated through a portable abstraction layer
Unified memory removes a class of mistakes around device placement and copies
API is genuinely familiar: a PyTorch user can read MLX code on the first try, and a NumPy user can write array operations with no extra learning
MIT license, which means companies can use it without legal friction
Active maintenance from a small but committed Apple team, with frequent releases
Excellent inference performance for quantized LLMs on consumer Macs
Tight integration with Hugging Face through the mlx-community organization, which had thousands of pre-converted models by 2026^[10]^[11]

Weaknesses and limitations

The usual caveats:

Apple silicon only for the primary back-end; the CUDA back-end exists but is newer and not the focus of development
Smaller community than PyTorch or JAX, so there are fewer tutorials, fewer Stack Overflow answers, and fewer pretrained MLX-native checkpoints (though Hugging Face conversions help)
Distributed training is younger than the equivalent in PyTorch (FSDP, DeepSpeed) and is still evolving
Not the choice for production cloud serving, where vLLM, TensorRT-LLM, and PyTorch dominate
Does not target the Apple Neural Engine; for that you still need Core ML
Some training workloads remain faster on PyTorch with MPS, so users sometimes write training in PyTorch and inference in MLX^[12]

How widely is MLX adopted?

MLX has been adopted broadly within the Mac-using ML community. The GitHub repository passed roughly 27,000 stars by June 2026, making it Apple's second most starred and forked open-source project after the Swift programming language; the mlx-community Hugging Face organization had thousands of converted model checkpoints with thousands of members, and major tools like Ollama added MLX as an optional back-end.^[1]^[11]^[18] Apple itself uses MLX internally for some of the work on its on-device foundation models, and Apple's WWDC 2024 and 2025 sessions explicitly recommend MLX for developers who want to fine-tune or experiment with open models on Mac.^[9]^[14]

Outside Apple, MLX shows up in academic work on private LLMs (for example, multi-node mixture-of-experts research using Apple silicon clusters) and in many indie developer tools that ship on-device AI features. Demonstrations of trillion-parameter models running across four Mac Studios with MLX's distributed back-end have generated coverage in the trade press and pushed Apple's small-cluster story as a low-cost alternative to Nvidia DGX boxes for some workloads.^[15]^[16]

In February 2026, Awni Hannun, the public face of MLX, announced that he was leaving Apple, writing "Today is my last day at Apple. Building MLX with our amazing team and community has been an absolute pleasure. It's still early days for AI on Apple silicon."^[17] Coverage at the time noted that he was one of roughly a dozen senior AI researchers who had departed Apple in the previous year. Hannun named the engineers continuing to maintain MLX, and the project has continued to ship releases since.^[17]

Where MLX fits in Apple's ML stack

Apple has several machine-learning frameworks, and they serve different jobs.

Framework	Primary purpose
Core ML	Deployment and inference of trained models inside iOS, macOS, watchOS, and visionOS apps; uses CPU, GPU, and Apple Neural Engine
Create ML	A graphical and Swift API for training simple models from labeled data without writing low-level code
MLX	Research and on-device development framework, the most PyTorch-like of Apple's offerings, used for training and fine-tuning
Apple Foundation Models framework	Third-party access (in Swift) to Apple's own on-device LLM that powers Apple Intelligence
Metal Performance Shaders Graph (MPSGraph)	Lower-level graph compiler for Metal that PyTorch's MPS back-end and other frameworks build on

In practice, a developer might prototype a model in MLX, convert it to Core ML for shipping in a consumer app, and call Apple's Foundation Models framework when they just need general-purpose text from the system LLM rather than their own custom model.

Is MLX open source? License and governance

MLX is released under the MIT license, which permits commercial use, modification, distribution, and private use without requiring derivative works to be open-sourced.^[1]^[5] The project is maintained on GitHub under the ml-explore organization, which is owned by Apple. Releases are cut directly from the main branch, and contributions follow a standard pull-request workflow. The mlx-examples, mlx-lm, mlx-data, mlx-swift, and mlx-c repositories use the same MIT license and the same governance model.^[5]

Recent context (2024 to 2026)

By 2026 MLX has settled in as the standard Mac-native ML framework for local LLM development and on-device fine-tuning. The Hugging Face community treats it as a first-class target, alongside the PyTorch and GGUF formats.^[10] Apple continues to invest in the project: distributed training and inference across multiple Macs has matured, quantization libraries have grown to include 2-bit and 3-bit variants, the Swift bindings are good enough to ship inside iOS and macOS apps, and Metal 4's Tensor Operations let MLX exploit the new neural accelerators in the M5 GPU.^[6] The CUDA back-end added in 2025 also widened MLX's appeal: the same code can prototype on a MacBook and then run on Nvidia hardware in a cloud environment.

The departure of Awni Hannun in early 2026 left an open question about Apple's long-term commitment to MLX, but the project has continued to ship and the rest of the team has remained in place.^[17] For most practical purposes, if you want to run or train a model on a Mac in 2026, MLX is the framework people reach for first.

References

ml-explore. "MLX: An array framework for Apple silicon." GitHub repository. https://github.com/ml-explore/mlx ↩
ml-explore. "MLX Examples." GitHub repository. https://github.com/ml-explore/mlx-examples ↩
ml-explore. "MLX-LM: Run LLMs with MLX." GitHub repository. https://github.com/ml-explore/mlx-lm ↩
ml-explore. "MLX Documentation." https://ml-explore.github.io/mlx/build/html/index.html ↩
Apple Open Source. "MLX project page." https://opensource.apple.com/projects/mlx/ ↩
Apple Machine Learning Research. "Exploring LLMs with MLX and the Neural Accelerators in the M5 GPU." https://machinelearning.apple.com/research/exploring-llms-mlx-m5 ↩
Awni Hannun. "MLX is an efficient machine learning framework specifically designed for Apple silicon." X (Twitter), December 5, 2023. https://x.com/awnihannun/status/1732184443451019431 ↩
Swift.org. "On-device ML research with MLX and Swift." Blog post, February 20, 2024. https://www.swift.org/blog/mlx-swift/ ↩
Apple Developer. "Explore large language models on Apple silicon with MLX." WWDC 2025 session 298. https://developer.apple.com/videos/play/wwdc2025/298/ ↩
Hugging Face. "Using MLX at Hugging Face." Hub documentation. https://huggingface.co/docs/hub/mlx ↩
Hugging Face. "MLX Community organization." https://huggingface.co/mlx-community ↩
Mike Cvet. "PyTorch and MLX for Apple Silicon." Towards Data Science. https://towardsdatascience.com/pytorch-and-mlx-for-apple-silicon-4f35b9f60e39/ ↩
Tristan Bilot. "How Fast Is MLX? A Comprehensive Benchmark on Apple Silicon Chips and CUDA GPUs." Towards Data Science. https://medium.com/data-science/how-fast-is-mlx-a-comprehensive-benchmark-on-8-apple-silicon-chips-and-4-cuda-gpus-378a0ae356a0 ↩
Apple Machine Learning Research. "Introducing Apple's On-Device and Server Foundation Models." https://machinelearning.apple.com/research/introducing-apple-foundation-models ↩
Awni Hannun. "The latest MLX is out! And it has a new distributed back-end (JACCL) that uses RDMA over TB5." X (Twitter), 2025. https://x.com/awnihannun/status/2001667839539978580 ↩
Simon Willison. "Run DeepSeek R1 or V3 with MLX Distributed." Blog post, January 22, 2025. https://simonwillison.net/2025/Jan/22/mlx-distributed/ ↩
Awni Hannun. "Today is my last day at Apple." X (Twitter), February 27, 2026. https://x.com/awnihannun/status/2027506228143030480 ↩
Ollama. "Ollama is now powered by MLX on Apple Silicon in preview." Blog post, March 2026. https://ollama.com/blog/mlx ↩
Apple Inc. "Apple introduces Apple Intelligence." WWDC 2024 announcement coverage.
Computerworld. "Apple launches MLX machine-learning framework for Apple Silicon." December 2023. https://www.computerworld.com/article/1611155/apple-launches-mlx-machine-learning-framework-for-apple-silicon.html ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

1 revision by 1 contributors · full history

Suggest edit

What links here

Abbreviations Apple Inc.Apple Silicon Best Local and On-Device LLMs Dev tools Kyutai LLM inference engine LM Studio NVIDIA Digits Outlines (library)ThunderKittens llama.cpp

When was MLX released?

Who created MLX?

How does MLX work? Architecture and design philosophy

What are MLX's key features?

What is mlx-lm used for?

The mlx-data subproject

The mlx-examples repository

How fast is MLX?

How does MLX differ from PyTorch and llama.cpp?

What is MLX used for?

Strengths

Weaknesses and limitations

How widely is MLX adopted?

Where MLX fits in Apple's ML stack

Is MLX open source? License and governance

Recent context (2024 to 2026)

See also

References

Improve this article

Related Articles

Core ML

Hugging Face

Supabase

Mem0

LanceDB

Letta (MemGPT)

What links here

Related Articles

Core ML

Hugging Face

Supabase

Mem0

LanceDB

Letta (MemGPT)

What links here