Apple Neural Engine
Last reviewed
May 1, 2026
Sources
30 citations
Review status
Source-backed
Revision
v1 · 3,759 words
The Apple Neural Engine (ANE), sometimes referred to internally as the Neural Processing Unit (NPU), is the dedicated neural network accelerator that Apple builds into its custom system-on-chip silicon. It first appeared in the A11 Bionic chip, announced on September 12, 2017 for the iPhone 8, iPhone 8 Plus, and iPhone X, and has since shipped in every flagship Apple-designed SoC for iPhone, iPad, Mac, Apple Vision Pro, Apple Watch, and HomePod [1][2]. The ANE accelerates inference for machine learning workloads such as image classification, object detection, natural language processing, speech recognition, and, since 2024, on-device large language model inference. Compared with running the same models on the CPU or GPU alone, the ANE typically delivers an order of magnitude better performance per watt, which is the central reason it exists [3][4].
Developers do not target the ANE directly. Instead, they ship Core ML models built with the Python coremltools library, and the Core ML runtime decides at load time which subgraphs of the model to dispatch to the ANE, the GPU, or the CPU based on the operations supported, the precision required, and the device's current thermal and power state [5]. When Apple Intelligence was announced at WWDC on June 10, 2024, the ANE became the execution target for a roughly three billion parameter on-device foundation model that powers writing tools, notification summaries, the rebuilt Siri, Genmoji, and Image Playground on supported iPhone, iPad, and Mac hardware [6][7].
Apple's neural accelerator program traces back to a small silicon team that began work on a custom inference engine for Face ID in the mid-2010s. The first hardware shipped to consumers in late 2017 and has been iterated every year since. The overall trend is a steady increase in core count, throughput, and supported numerical precisions, plus closer integration with transformer workloads from 2022 onward [3][8].
| Year | Chip | First device | ANE cores | Peak throughput | Notable additions |
|---|---|---|---|---|---|
| 2017 | A11 Bionic | iPhone 8, 8 Plus, iPhone X | 2 | 0.6 TOPS | First Neural Engine; powers Face ID and Animoji [1][9] |
| 2018 | A12 Bionic | iPhone XS, XS Max, XR | 8 | 5 TOPS | First ANE accessible to third-party apps via Core ML 2 [10] |
| 2019 | A13 Bionic | iPhone 11, 11 Pro | 8 | ~6 TOPS | Adds AMX matrix coprocessors on the CPU side; Deep Fusion photography [11] |
| 2020 | A14 Bionic | iPhone 12 series, iPad Air 4 | 16 | 11 TOPS | First 16-core ANE; first 5 nm chip [12] |
| 2020 | M1 | MacBook Air, MacBook Pro 13, Mac mini, iMac, iPad Pro | 16 | 11 TOPS | First Apple silicon Mac chip; same ANE as A14 [13] |
| 2021 | A15 Bionic | iPhone 13 series, iPad mini 6 | 16 | 15.8 TOPS | Roughly 43% faster than A14 [14] |
| 2021 | M1 Pro / M1 Max | MacBook Pro 14, MacBook Pro 16 | 16 | 11 TOPS | Same ANE as M1, paired with much larger CPU/GPU and memory bandwidth [15] |
| 2022 | M1 Ultra | Mac Studio | 32 | ~22 TOPS | Two M1 Max dies fused via UltraFusion, doubling the ANE [15] |
| 2022 | M2 | MacBook Air, 13-inch MacBook Pro, Mac mini, Vision Pro | 16 | 15.8 TOPS | Roughly 40% faster than M1 [16] |
| 2022 | A16 Bionic | iPhone 14 Pro, 14 Pro Max | 16 | ~17 TOPS | Driven hard by computational photography (Photonic Engine) [17] |
| 2023 | M2 Pro / M2 Max | MacBook Pro 14/16, Mac mini Pro | 16 | 15.8 TOPS | Same ANE as M2 [18] |
| 2023 | M2 Ultra | Mac Studio, Mac Pro | 32 | ~31.6 TOPS | UltraFusion variant of M2 [18] |
| 2023 | A17 Pro | iPhone 15 Pro, 15 Pro Max, iPad mini 7 | 16 | 35 TOPS | First 3 nm Apple SoC; ~2x faster ANE; minimum requirement for Apple Intelligence on iPhone [19] |
| 2023 | M3 / M3 Pro / M3 Max | MacBook Pro, iMac (24-inch) | 16 | 18 TOPS | Up to 60% faster ANE than M1; first 3 nm Mac chip [20] |
| 2024 | M4 | iPad Pro (May 2024) | 16 | 38 TOPS | First chip Apple marketed as ready for Apple Intelligence [21] |
| 2024 | A18 / A18 Pro | iPhone 16, 16 Plus, 16 Pro, 16 Pro Max, iPhone 16e | 16 | 35 TOPS | Apple Intelligence baked into the launch software [22] |
| 2024 | M4 Pro / M4 Max | MacBook Pro 14/16, Mac mini, iMac | 16 | 38 TOPS | Same ANE as M4 [23] |
| 2025 | M3 Ultra | Mac Studio (March 2025) | 32 | ~36 TOPS | UltraFusion variant of M3, paired with up to 512 GB unified memory [24] |
The ANE has roughly doubled in throughput at each major step. Apple itself frames the M4 as a 60x increase over the original A11 ANE, and the A18's 35 TOPS as around 58x the A11 figure [21][22]. Because TOPS is reported at INT8 on most modern Apple SoCs, the table above is broadly comparable across generations from A14 onward, but the early A11 figure is sometimes quoted in mixed precision and should be read as approximate [3].
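The multiples quoted here can be checked against the peak-throughput column of the table. The following is a minimal sketch in Python that assumes only the TOPS values listed above; the dictionary and the `speedup` function are illustrative names, not part of any Apple API.

```python
# Peak throughput in TOPS, copied from the generation table above.
PEAK_TOPS = {
    "A11 Bionic": 0.6,
    "A14 Bionic": 11.0,
    "A15 Bionic": 15.8,
    "A17 Pro": 35.0,
    "A18": 35.0,
    "M4": 38.0,
}


def speedup(chip: str, baseline: str = "A11 Bionic") -> float:
    """Ratio of a chip's peak TOPS to the baseline chip's peak TOPS."""
    return PEAK_TOPS[chip] / PEAK_TOPS[baseline]


if __name__ == "__main__":
    for chip in ("A18", "M4"):
        print(f"{chip}: {speedup(chip):.1f}x the A11 figure")
```

On these inputs the A18 ratio lands near 58x and the M4 ratio slightly above 60x, which is consistent with the rounded figures Apple quotes. The comparison inherits the caveat about the A11 figure's precision.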
The ANE is a custom Apple design built into the same SoC die as the CPU, GPU, image signal processor, and Secure Enclave. It is fabricated by TSMC on the same leading-edge process node as the rest of the chip: 10 nm for A11, 7 nm for A12 and A13, 5 nm-class processes for A14 through A16 and for M1 and M2, and 3 nm for A17 Pro, A18, M3, and M4 [1][12][19].
While Apple has never published a full instruction set or microarchitecture diagram, third-party reverse engineering, an internal Apple paper on transformer optimisation, and the open apple/ml-ane-transformers repository together describe a fairly consistent picture [4][8]:
In practice, the ANE is a feed-forward inference accelerator. It is not designed for training. Backpropagation, gradient computation, and large activation buffers fall back to the GPU or CPU when developers experiment with on-device training, and Apple has so far positioned all the ANE-optimised foundation models as inference-only with adapter layers updated separately [6].
The public API for the ANE is Core ML, Apple's machine learning framework, introduced at WWDC 2017 a few months before the A11 Bionic shipped. Developers convert models from PyTorch, TensorFlow, JAX, ONNX, or scikit-learn into the .mlpackage (or older .mlmodel) format using the Python coremltools package [5]. At build time, Xcode compiles .mlpackage files into .mlmodelc bundles that are shipped inside an app or downloaded on demand.
When the model is loaded, the Core ML runtime profiles the operation graph, partitions it across the available compute units, and dispatches each subgraph to the ANE, the GPU, or the CPU based on three factors: which ops are supported by the ANE on that device, how the precision and shape constraints are met, and the current power and thermal state. Developers can hint a preferred compute unit (computeUnits = .all, .cpuAndNeuralEngine, .cpuAndGPU, or .cpuOnly) but cannot otherwise force the runtime's hand [5]. This is by design: the same model binary can run efficiently across iPhone, iPad, Mac, Vision Pro, and Apple Watch without recompilation.
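The partitioning step can be pictured with a toy model. The sketch below is not Core ML's actual algorithm, which is not public. It assumes a purely linear list of ops, a hypothetical set of ANE-supported op names, and a CPU fallback, and it ignores the GPU, precision constraints, and thermal state entirely. It only shows how a single unsupported op splits a graph into separate segments.

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical op names and support set, for illustration only; the real
# per-device ANE operator coverage is not published in this form.
ANE_SUPPORTED = {"conv2d", "matmul", "relu", "softmax", "layer_norm"}


@dataclass(frozen=True)
class Segment:
    unit: str   # "ANE" or "CPU"
    ops: tuple  # consecutive ops assigned to that unit


def partition(ops, allow_ane=True):
    """Split a linear op sequence into maximal runs that share a compute unit."""
    def unit_for(op):
        return "ANE" if allow_ane and op in ANE_SUPPORTED else "CPU"

    # groupby merges adjacent ops that map to the same unit into one run.
    return [Segment(unit, tuple(run)) for unit, run in groupby(ops, key=unit_for)]


if __name__ == "__main__":
    graph = ["conv2d", "relu", "custom_nms", "matmul", "softmax"]
    for seg in partition(graph):
        print(seg.unit, list(seg.ops))
```

In this toy version the `allow_ane` flag stands in loosely for a compute-unit hint that excludes the Neural Engine: with it set to `False`, every op falls into a single CPU segment.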
For transformer workloads, Apple released ane_transformers in 2022, an open reference PyTorch implementation that uses the data layout and chunking tricks described above. On an iPhone 13, a DistilBERT model rewritten with ane_transformers runs about 10x faster and uses about 14x less peak memory than a stock implementation, and lands inference at roughly 3.47 ms per 128-token sequence [4][8]. Other research projects, including the Orion framework from Cornell University, characterise the ANE in more detail and explore using it for training as well as inference [4].
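The latency figure translates into throughput with simple arithmetic. The sketch below assumes sequences are processed back to back with no batching or scheduling overhead, which real workloads will not match exactly. The function names are illustrative, and the implied baseline is only as precise as the "about 10x" claim.

```python
LATENCY_MS = 3.47          # per 128-token sequence, as quoted above
TOKENS_PER_SEQUENCE = 128
SPEEDUP = 10               # "about 10x" versus the stock implementation


def sequences_per_second(latency_ms: float) -> float:
    """Sequences processed per second if run back to back."""
    return 1000.0 / latency_ms


def tokens_per_second(latency_ms: float, tokens_per_sequence: int) -> float:
    """Token throughput implied by the per-sequence latency."""
    return sequences_per_second(latency_ms) * tokens_per_sequence


def implied_baseline_ms(latency_ms: float, speedup: float) -> float:
    """Latency the unoptimised model would need for the stated speed-up to hold."""
    return latency_ms * speedup


if __name__ == "__main__":
    print(f"{sequences_per_second(LATENCY_MS):.0f} sequences/s")
    print(f"{tokens_per_second(LATENCY_MS, TOKENS_PER_SEQUENCE):.0f} tokens/s")
    print(f"{implied_baseline_ms(LATENCY_MS, SPEEDUP):.1f} ms implied baseline")
```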
Raw TOPS captures only one slice of ANE performance, since real workloads include memory accesses, host-device synchronisation, and operator coverage. Apple's per-generation marketing numbers, which describe relative speed-ups on internal benchmarks, line up with what third-party tests have reproduced:
Energy efficiency is the lever Apple emphasises most. Internal Apple slides for M4 claim equivalent performance to other thin-and-light laptop processors at roughly one-quarter the power draw, and equivalent performance to the M2 at half the power [21]. For an iPhone-class workload such as a continuous Visual Look Up scan or live captioning a meeting, ANE execution typically costs single-digit milliwatts per inference, low enough that battery drain is negligible compared with the screen and radios [4].
The original justification for the ANE was Face ID, a per-unlock face-recognition pipeline that needed to run entirely on-device, stay within the Secure Enclave's security perimeter, and beat the Touch ID latency budget. Once that hardware existed, Apple gradually expanded the workloads it carried.
| Feature | First shipped | Notes |
|---|---|---|
| Face ID | 2017 (iPhone X) | Original motivating workload; runs in the Secure Enclave perimeter [1][9] |
| Animoji and Memoji | 2017 / 2018 | Real-time facial mesh tracking driving stylised avatars [1] |
| Photo Memories and on-device search | 2018 onward | Object, scene, and face clustering across the user's library [3] |
| Smart HDR and Deep Fusion | 2019 (iPhone 11) | Per-pixel ML fusion of bracketed exposures [11] |
| Photonic Engine | 2022 (iPhone 14) | ML pipeline applied earlier in the image processing stack [17] |
| Live Text | 2021 (iOS 15) | OCR across photos, screenshots, and the camera viewfinder [3] |
| Visual Look Up | 2021 (iOS 15) | Identifies plants, pets, landmarks, and art on-device |
| Translate (offline) | 2020 (iOS 14) | Downloadable language packs run on the ANE |
| Voice Isolation in FaceTime | 2021 (iOS 15) | Real-time speech enhancement |
| Live Captions | 2022 (iOS 16) | On-device speech-to-text overlay for any audio |
| Personal Voice | 2023 (iOS 17) | User-trainable synthetic voice for accessibility |
| Apple Intelligence | 2024 (iOS 18.1) | Writing tools, notification summaries, Siri rewrite, Genmoji, Image Playground [6][7] |
Most of these features run silently. Users do not pick the ANE; the operating system does. The system-wide effect is that on-device ML has become the default. Cloud calls are reserved for cases where a model is too large to fit in unified memory or where the user explicitly invokes ChatGPT through Siri.
Apple announced Apple Intelligence at WWDC on June 10, 2024 and shipped its first features in iOS 18.1, iPadOS 18.1, and macOS Sequoia 15.1 in October 2024 [6][7]. The system has two layers:
Apple Intelligence requires either an A17 Pro or A18 / A18 Pro iPhone, or any Mac, iPad, or Vision Pro with an M-series chip and at least 8 GB of unified memory, plus iOS 18.1 / iPadOS 18.1 / macOS Sequoia 15.1 / visionOS 2.4 or later [25]. The 8 GB floor is the binding constraint: the on-device foundation model and its adapter caches need to be paged in alongside the rest of the operating system, which is why the otherwise capable iPhone 15 (A16, 6 GB) is excluded.
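The hardware rule in this paragraph can be written down as a small predicate. The sketch below encodes only what the text states (the eligible iPhone chips and the 8 GB floor for M-series devices) and does not check the OS version. The `Device` type and the function name are hypothetical, not an Apple API.

```python
from dataclasses import dataclass

# Chip names and the 8 GB floor are taken from the paragraph above.
ELIGIBLE_IPHONE_CHIPS = {"A17 Pro", "A18", "A18 Pro"}
MIN_MEMORY_GB = 8


@dataclass(frozen=True)
class Device:
    name: str
    chip: str
    memory_gb: int


def supports_apple_intelligence(device: Device) -> bool:
    """Apply the hardware rule as stated in the text (OS version not checked)."""
    if device.chip in ELIGIBLE_IPHONE_CHIPS:
        return True
    # Treat "M" followed by a digit (M1, M2 Pro, M4 Max, ...) as M-series.
    is_m_series = device.chip.startswith("M") and device.chip[1:2].isdigit()
    return is_m_series and device.memory_gb >= MIN_MEMORY_GB


if __name__ == "__main__":
    for d in (
        Device("iPhone 15", "A16 Bionic", 6),
        Device("iPhone 15 Pro", "A17 Pro", 8),
        Device("MacBook Air (M1)", "M1", 8),
    ):
        print(d.name, supports_apple_intelligence(d))
```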
The on-device model is not a general chatbot. Apple ships it with adapters for specific tasks: text rewriting and proofreading, summarisation, prioritisation, response suggestions, notification summaries, and Genmoji generation [6]. Each adapter is on the order of tens of megabytes and is loaded into the ANE alongside the base model when the relevant feature runs. The full technical detail is laid out in Apple's "Apple Intelligence Foundation Language Models" report and its 2025 update, both published on the Apple Machine Learning Research site [6][26].
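To put the adapter size in proportion, the weight storage of the base model can be estimated from the parameter count. The sketch below takes the "roughly three billion" figure from this article. The bits-per-weight values and the 50 MB adapter size are assumptions chosen for comparison only, not published Apple figures, and the estimate ignores activations and caches.

```python
PARAMS = 3_000_000_000   # "roughly three billion", per the article
ADAPTER_MB = 50          # assumed stand-in for "tens of megabytes"


def weights_gb(params: int, bits_per_weight: float) -> float:
    """Storage for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4):  # assumed precisions, for comparison only
        base = weights_gb(PARAMS, bits)
        share = ADAPTER_MB / 1000 / base
        print(f"{bits:>2}-bit weights: {base:.1f} GB base, adapter is {share:.1%} of that")
```

Under these assumptions the base weights occupy 6 GB, 3 GB, or 1.5 GB at 16, 8, or 4 bits per weight, so an adapter of this size adds only a few percent at most.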
For reasoning, image generation, and other workloads beyond the on-device model's scope, the request is encrypted, attested, and sent to Private Cloud Compute (PCC). Apple publishes the operating system images for those servers and runs them on hardened Apple silicon, so that researchers can verify what is and is not running. The on-device ANE remains the default execution target; PCC only runs when needed and never persists user data after the request is served [27].
The ANE was one of the first NPUs to ship at scale in a mass-market device. By 2024 nearly every silicon vendor offered a comparable accelerator. The numbers below report each vendor's headline INT8 figure for its most recent laptop or smartphone-class chip available in 2024 and early 2025; data-centre accelerators are excluded.
| Vendor | Latest NPU | TOPS (INT8) | Key devices | Year | Software stack |
|---|---|---|---|---|---|
| Apple | Apple Neural Engine (M4 / A18) | 38 / 35 | iPad Pro, MacBook Pro, iPhone 16, iPhone 15 Pro | 2024 | Core ML, MLX [21][22] |
| Qualcomm | Hexagon NPU (Snapdragon X Elite) | 45 | Copilot+ PCs (Surface, Galaxy Book4 Edge) | 2024 | QNN, AI Hub, ONNX Runtime [28] |
| Qualcomm | Hexagon NPU (Snapdragon 8 Gen 3) | ~45 | Samsung Galaxy S24, Xiaomi 14 | 2023 | QNN, AI Hub [28] |
| Google | Edge TPU / Tensor TPU (Tensor G4) | ~40 | Pixel 9, Pixel 9 Pro | 2024 | LiteRT (formerly TensorFlow Lite), AICore [29] |
| Samsung | NPU (Exynos 2400) | ~17 | Galaxy S24 (non-US) | 2024 | One UI AI APIs, ONNX Runtime |
| MediaTek | APU (Dimensity 9400) | ~50 | Vivo X200, Oppo Find X8 | 2024 | NeuroPilot SDK |
| Intel | NPU 4 (Lunar Lake Core Ultra 200V) | 48 | Copilot+ Intel laptops | 2024 | OpenVINO, DirectML, ONNX Runtime [30] |
| AMD | XDNA 2 (Ryzen AI 300, Ryzen AI Max+ 395) | 50 | ASUS ProArt, Framework Desktop | 2024 | Ryzen AI Software, ONNX Runtime [30] |
| NVIDIA | Tegra T239 / Jetson Orin | varies (40 to 275) | Switch 2, robotics dev kits | 2023 | TensorRT, CUDA |
The headline TOPS numbers are easy to misread. Vendors report at different precisions and pick the most flattering metric. What sets the ANE apart in practice is less raw throughput and more vertical integration: the same Core ML model runs on every shipping Apple device, the runtime handles fallback automatically, and feature work like Apple Intelligence is wired through the same stack that third-party app developers use [3][5]. Qualcomm's Hexagon and Intel's Lunar Lake NPU are competitive on TOPS but require developers to deal with separate SDKs and a less consistent operator coverage story across SoC generations.
The ANE is a closed, single-vendor accelerator. Some practical consequences:
None of these are fatal for normal app development, but they explain why projects like the Whisper and Stable Diffusion ports for iOS often spend most of their effort on layout-friendly model rewrites rather than on inference logic.
The ANE was one of the first NPUs to ship in a mass-market consumer device, predating most competitors by two to four years and arriving on more than a billion iPhones over its first half-decade [3]. It made on-device AI a default rather than a research demo. Face ID, computational photography, Live Text, and Apple Intelligence all rely on it. It also shifted the privacy conversation around mobile AI: because the inference happens on the user's device, the underlying images, voice samples, or text never need to leave it, and Apple has built a substantial part of its marketing around that fact [27].
For Apple itself, the ANE underpins the company's ability to compete in the generative AI cycle without relying on third-party data centres for every interaction. The on-device foundation model in Apple Intelligence runs on the same silicon that handles Face ID and photo Memories, which gives Apple a useful structural advantage: a captive distribution channel of hundreds of millions of devices already shipped with hardware capable of running its models. Whether that translates into a long-term lead over Google's Tensor and Qualcomm's Snapdragon NPUs is still being decided, but as of 2026 the ANE is the most widely deployed NPU in the world, and the most directly responsible for the popularisation of on-device AI [3][6].