Core ML
Last reviewed
Sources
29 citations
Review status
Source-backed
Revision
v2 · 5,021 words
Core ML is Apple's framework for running trained machine learning models on-device across iOS, iPadOS, macOS, watchOS, tvOS, and visionOS. Introduced at WWDC 2017, it lets developers add a model to an app and execute inference with hardware acceleration that is dispatched automatically across the CPU, the GPU (through Metal), and the Apple Neural Engine (ANE), so that no part of the work has to leave the device. Models ship in the .mlmodel or .mlpackage format, are converted from PyTorch, TensorFlow, and other frameworks with the open-source coremltools toolkit, and can be trained on a Mac with Create ML. Core ML is the runtime layer beneath most of Apple's machine learning features, from Face ID and Live Text to Apple Intelligence [1][2][3].
Core ML shipped with iOS 11 in September 2017 alongside the first Apple Neural Engine on the A11 Bionic chip in the iPhone 8 and iPhone X [1][20][21]. It was the first machine-learning runtime to ship as a first-class system framework on a major mobile operating system, predating the developer preview of Google's TensorFlow Lite (released November 2017) and arguably defining the category of on-device inference engines. As of iOS 18 and macOS 15 it underpins Apple Intelligence, and the Foundation Models framework, opened to third-party developers in 2025, exposes Apple's roughly 3 billion parameter on-device language model [14][15]. At WWDC 2026 Apple introduced a companion framework, Core AI, for generative models, while keeping Core ML as the runtime for classical and smaller neural networks [26][27].
Core ML is a single inference framework that exposes a unified API across many model families. A developer drops a compiled model into Xcode, Xcode auto-generates a strongly typed Swift or Objective-C class for the model's inputs and outputs, and the runtime decides at load time which combination of CPU, GPU, and Neural Engine should execute each operation [1][3]. The same .mlpackage file runs unmodified on an iPhone, an iPad, an Apple Watch, an Apple TV, an Apple silicon Mac, or an Apple Vision Pro, with the runtime selecting the appropriate compute backend.
Apple positions privacy as the central design goal. In its developer documentation Apple states that Core ML "optimizes on-device performance by leveraging the CPU, GPU, and Neural Engine while minimizing its memory footprint and power consumption," adding that "running a model strictly on the user's device removes any need for a network connection, which helps keep the user's data private and your app responsive" [2].
Core ML is closed-source Apple software, but it sits on top of and beside other Apple frameworks that are partly open: Metal Performance Shaders (MPS), MPSGraph, the Basic Neural Network Subroutines (BNNS) in Accelerate, and the coremltools Python package, which is BSD-3-Clause licensed [10][11].
| Property | Value |
|---|---|
| Developer | Apple Inc. |
| Initial release | June 2017 (WWDC), iOS 11 in September 2017 |
| Latest model version | Core ML model version 9 (introduced with iOS 18) |
| File formats | .mlmodel, .mlpackage, compiled .mlmodelc bundle |
| Conversion toolkit | coremltools (BSD-3-Clause, Python) |
| Languages | Swift, Objective-C, with Python tooling |
| Platforms | iOS, iPadOS, macOS, watchOS, tvOS, visionOS |
| Hardware backends | Apple Neural Engine, GPU (Metal), CPU (BNNS / Accelerate) |
| Source | Closed source runtime; coremltools is open source |
| Cost | Free, built into the operating system |
Imagine you trained a robot brain on a big computer to tell cats from dogs. Core ML is the part of an iPhone that takes that brain, shrinks it so it fits in the phone, and runs it quickly using the phone's special AI chip (the Neural Engine) instead of asking a server on the internet. Because the thinking happens right on the phone, it is fast, it works without a connection, and your photos never have to be sent anywhere. App makers do not have to know how the chip works: they hand Core ML the brain, and Core ML figures out the fastest way to run it.
Core ML has shipped a new major version with almost every iOS release. Apple does not always brand each version with a marketing number, but the developer community refers to them by their introduction year and feature set. The model file format itself is independently versioned, and a model file declares the minimum runtime version it requires.
| Year | Event | What shipped |
|---|---|---|
| 2017 (WWDC, June) | Core ML 1 introduced | The .mlmodel format, support for neural networks, tree ensembles, support vector machines, generalised linear models, and pipelines. iOS 11 launched in September with the iPhone 8 and iPhone X, which included the first Apple Neural Engine. |
| 2018 (WWDC) | Core ML 2 + Create ML | Smaller models through 16-bit and 8-bit quantisation, batch prediction, and faster execution. Create ML, a new Swift-and-Xcode-based tool for training image, text, and tabular models on a Mac, was introduced as a complement. |
| 2019 (WWDC) | Core ML 3 | On-device personalisation (fine-tuning the last layers of a neural network or k-Nearest Neighbours classifier on the device using user data), more than 100 new layer types including transformer-friendly ops, and support for linked models inside pipelines. |
| 2020 (WWDC) | Core ML 4 + ML Compute | Model encryption built into Xcode and CloudKit-hosted Model Deployment so apps can update models without an App Store release. coremltools 4 shipped a Unified Conversion API for TensorFlow 1.x, TensorFlow 2.x with tf.keras, and PyTorch, all routed through a new Model Intermediate Language (MIL). The separate ML Compute framework abstracted CPU and GPU training on macOS. |
| 2021 (WWDC) | Core ML 5 + ML Program | The .mlpackage container format and the new ML Program model type, which decouples weights from program graph and supports Float16 as a first-class precision. Xcode Cloud allowed continuous deployment of model packages. |
| 2022 (WWDC) | Core ML 6 | Faster inference on M-series Macs, ANE-friendly transformer layouts, the public release of Apple's transformer optimisation guide, weight palettization at conversion time, and the new Activity Classifier in Create ML. |
| 2023 (WWDC) | Core ML 7 | First-class visionOS support for the Apple Vision Pro, advanced model compression APIs in coremltools 7 (8-bit activation quantisation, magnitude pruning, k-means palettization to as few as 2 bits per weight) and joint compression schemes. |
| 2024 (WWDC) | Core ML 8 + Apple Intelligence | New Stateful API and stateful buffers for KV-cache support in transformer models, the Swift MLTensor type for ergonomic tensor work, multifunction models for packaging LoRA adapters, and 4-bit block-wise linear quantisation. Apple Intelligence and its on-device foundation model were introduced on top. |
| 2025 (WWDC) | Core ML 9 + Foundation Models framework | Continued improvements in foundation model integration, more aggressive low-bit quantisation, and the Foundation Models framework released to third-party developers with guided generation (the @Generable macro) for working directly with Swift data types [25]. |
| 2026 (WWDC) | Core AI introduced alongside Core ML | Apple announced Core AI, an Apple-silicon-only framework for on-device large language models and generative AI, positioned as the successor for transformer and generative workloads while Core ML continues to serve classical and smaller neural models [26][27]. |
Apple has avoided marketing the framework by version number since 2020, instead announcing features under generic Machine Learning umbrellas at WWDC. The version number that matters in practice is the model specification version baked into a .mlpackage, which determines the minimum operating system that can load it.
Core ML uses two on-disk formats and a compiled runtime form.
The original .mlmodel is a single Protocol Buffers file that bundles the model graph, the weights, and the metadata [1]. It supports the original neural network type (with a fixed list of layer kinds), tree ensembles, support vector machines, linear and logistic regressors, generalised linear models, scalers, imputers, one-hot encoders, word embeddings, and pipelines that chain several models together. A .mlmodel declares a specification version (1 through about 6 for legacy models) so the runtime can refuse to load a file that uses operations it does not understand.
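The version gate described above can be sketched as a simple lookup. The pairing of specification versions with minimum iOS releases below is an assumption taken from the constants coremltools publishes, and should be checked against the release in use; `can_load` is a hypothetical helper, not a Core ML API.

```python
# Minimum iOS release for each Core ML specification version.
# Assumed from coremltools' published constants; verify before relying on it.
MIN_IOS_FOR_SPEC = {
    1: (11, 0),
    2: (11, 2),
    3: (12, 0),
    4: (13, 0),
    5: (14, 0),
    6: (15, 0),
    7: (16, 0),
    8: (17, 0),
    9: (18, 0),
}


def can_load(spec_version: int, ios_version: tuple) -> bool:
    """Return True if a runtime at `ios_version` should accept the model."""
    required = MIN_IOS_FOR_SPEC.get(spec_version)
    if required is None:
        return False  # unknown (newer) specification: refuse to load
    return ios_version >= required
```

For example, `can_load(9, (17, 5))` is false under this table, which mirrors a runtime refusing a model that needs operations it does not understand.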
The newer .mlpackage is a directory rather than a single file. It separates the model architecture from the weights and the metadata, which makes it easier to edit, diff, and store in version control, and it lets Apple add new model types without invalidating older tooling [11]. The .mlpackage container is required to use the ML Program model type, the Stateful API introduced in 2024, and Float16 as a default precision. Because an ML Program decouples the weights from the program architecture, it cannot be saved as a single .mlmodel file; it is always saved with the .mlpackage extension [11].
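Because an .mlpackage is an ordinary directory, standard file tools can inspect it. The sketch below only walks the directory and reports file sizes; it makes no assumption about the names Apple uses inside the package, and `list_package_files` is a hypothetical helper.

```python
import os


def list_package_files(package_dir: str) -> list:
    """Return sorted (relative_path, size_in_bytes) pairs for a package."""
    entries = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, package_dir)
            entries.append((rel, os.path.getsize(full)))
    return sorted(entries)
```

Running this on a package before and after a compression pass is a quick way to see which files changed size, which is harder to do with a single monolithic .mlmodel file.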
When Xcode builds an app, it compiles every model into a .mlmodelc bundle. This is an opaque, layout-optimised version of the model for the target operating system. On a user's device, only the compiled bundle exists; the original .mlmodel or .mlpackage is not shipped. Apple's model encryption feature (introduced in Core ML 4) operates on the compiled bundle, with decryption keys retrieved from Apple's servers on first use and held only in memory [9].
Core ML is structured as a stack. The high-level Swift and Objective-C API exposes one class per model. Below that, Core ML manages a graph of operations and dispatches each to one of three compute backends [1][3].
Developers can hint at which backend they prefer through the MLComputeUnits setting on MLModelConfiguration. Common values are cpuOnly (forces CPU execution, useful for debugging and reproducibility), cpuAndGPU (avoids the Neural Engine), cpuAndNeuralEngine (avoids the GPU), and all (the default, which lets Core ML pick). Core ML still routes individual ops; setting cpuAndNeuralEngine does not guarantee that every layer runs on the ANE because some ops are not ANE-supported.
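The same hint is available from Python when loading a model with coremltools, which can help when comparing backends before shipping. The sketch below maps the Swift case names from the paragraph above to the `coremltools.ComputeUnit` members; `load_with_units` is a hypothetical wrapper, and it is assumed to run on a Mac with coremltools installed.

```python
# Swift MLComputeUnits case -> coremltools.ComputeUnit member name.
SWIFT_TO_COREMLTOOLS = {
    "cpuOnly": "CPU_ONLY",
    "cpuAndGPU": "CPU_AND_GPU",
    "cpuAndNeuralEngine": "CPU_AND_NE",
    "all": "ALL",
}


def load_with_units(path: str, swift_name: str = "all"):
    """Load a model with a compute-unit hint (hypothetical convenience wrapper)."""
    # Imported lazily so the mapping above is usable without coremltools.
    import coremltools as ct

    unit = getattr(ct.ComputeUnit, SWIFT_TO_COREMLTOOLS[swift_name])
    return ct.models.MLModel(path, compute_units=unit)
```

As with the Swift API, the value is a constraint rather than a guarantee: a model loaded with the `cpuAndNeuralEngine` equivalent can still run individual ops on the CPU.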
Apple ships several vertical frameworks that wrap Core ML for specific data types:
| Framework | Purpose | Year |
|---|---|---|
| Vision | Image and video analysis: face detection, text recognition, image classification, object tracking | 2017 |
| Natural Language | Tokenisation, language identification, named entity recognition, sentiment analysis, word embeddings | 2017 (Foundation), 2018 (NL framework) |
| Speech | Speech-to-text recognition with on-device or server modes | 2016 (pre-Core ML), with Core ML integration from 2019 |
| SoundAnalysis | Sound classification (laughter, applause, custom sounds) | 2019 |
| Translation | On-device language translation | 2020 |
| ML Compute | Hardware-accelerated training graph for CPU and GPU | 2020 |
| Foundation Models | Access to Apple's on-device 3B language model with adapters | 2025 |
Most apps use Core ML through one of these higher-level frameworks rather than calling the model directly.
Core ML supports a range of classical and deep-learning model types, with the catalogue growing each year.
| Family | Examples | Supported since |
|---|---|---|
| Convolutional neural networks | ResNet, MobileNet, EfficientNet, image classifiers, object detectors | Core ML 1 (2017) |
| Recurrent neural networks | LSTM, GRU for sequence tasks | Core ML 1 (2017) |
| Transformer networks | BERT, ViT, encoder-decoder models, Mistral 7B | Improved each release; first-class with stateful KV-cache in 2024 |
| Tree ensembles | Random Forest, gradient-boosted trees, XGBoost via conversion | Core ML 1 (2017) |
| Linear and logistic regression | Generalised linear models | Core ML 1 (2017) |
| Support vector machines | Linear and kernel SVMs | Core ML 1 (2017) |
| Pipelines | Chains of feature extractors, scalers, and predictors | Core ML 1 (2017) |
| Word embeddings | Word2Vec-style and custom embeddings | Core ML 2 (2018) |
| Sound and activity classifiers | Audio frame classifiers, accelerometer activity classifiers | Core ML 3 (2019) |
| k-Nearest Neighbours classifiers | Updatable on-device classifiers for personalisation | Core ML 3 (2019) |
| Custom layers and ops | Developer-defined layers in Swift / Metal / Python | Core ML 1 (custom layers); custom Torch ops in coremltools |
The coremltools package is the open-source Python toolchain that converts trained models from popular frameworks into Core ML format. It is BSD-3-Clause licensed and developed in the open on GitHub [10].
| Source framework | Path | Notes |
|---|---|---|
| PyTorch | TorchScript or torch.export to MIL to ML Program | The most modern path; supports custom ops via @register_torch_op. |
| TensorFlow 2.x | SavedModel or tf.keras to MIL | Replaces the legacy tfcoreml tool. |
| TensorFlow 1.x | Frozen GraphDef to MIL | Still supported; mostly used for legacy models. |
| ONNX | Deprecated; was supported via onnx-coreml | The recommended path is now ONNX to PyTorch to Core ML, or to use ONNX Runtime instead. |
| scikit-learn | coremltools.converters.sklearn | Decision trees, linear models, SVMs, pipelines. |
| XGBoost | coremltools.converters.xgboost | Gradient-boosted tree models. |
| LightGBM | Convert to ONNX or scikit-learn first | No native converter. |
| LibSVM | coremltools.converters.libsvm | Classic SVM models. |
| Hugging Face | exporters package | Apple maintains a Hugging Face exporters extension that handles Transformers models. |
The central conversion call is coremltools.convert(model, ...). From coremltools 4 onward this is the Unified Conversion API: it auto-detects the source framework, lowers the graph to the Model Intermediate Language (MIL) for optimisation passes (constant folding, dead-code elimination, layer fusion), and lowers MIL down to either the legacy NeuralNetwork model type or the modern ML Program type. From coremltools 7 the default target is ML Program, produced by passing convert_to="mlprogram" or simply relying on the default [11].
A minimal PyTorch conversion looks like this:
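```python
import coremltools as ct
import torch

# `model` is a trained torch.nn.Module and `example_input` a sample tensor
# of the shape it expects; both are assumed to exist already.
model.eval()  # switch to inference mode before tracing
traced = torch.jit.trace(model, example_input)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=example_input.shape)],
    convert_to="mlprogram",  # the default target from coremltools 7 onward
)
mlmodel.save("MyModel.mlpackage")
```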
The coremltools package also includes utilities for inspecting models, comparing predictions against the source framework, validating input and output shapes, and applying post-training optimisations.
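Comparing predictions against the source framework usually comes down to a numerical distance between two output vectors. The helpers below are a minimal sketch of that check in plain Python, assuming both outputs have already been flattened to lists of floats; the function names are hypothetical and are not part of coremltools.

```python
import math


def max_abs_error(reference, candidate):
    """Largest element-wise difference between the two outputs."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


def snr_db(reference, candidate):
    """Signal-to-noise ratio in decibels; higher means closer agreement."""
    signal = sum(r * r for r in reference)
    noise = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    if noise == 0:
        return math.inf  # outputs are identical
    return 10 * math.log10(signal / noise)
```

A Float16 or quantised conversion will rarely match the source bit for bit, so a threshold on a metric like this is a more useful acceptance test than exact equality.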
Fitting a competitive model into a phone's RAM and within the Neural Engine's compute budget is the central engineering problem on Apple platforms. Core ML and coremltools support a stack of quantisation and compression techniques [12].
| Technique | What it does | Bit widths |
|---|---|---|
| Float16 | Half-precision weights and activations | 16-bit; default for ML Programs since coremltools 5 |
| Linear quantisation | Affine mapping of weights to a smaller integer range | INT8 weights and activations; INT4 weights from coremltools 7 |
| Block-wise linear quantisation | Different scale per block of weights, configurable block size | INT4 (and INT8) per-block, introduced 2024 |
| Palettization | k-means clustering of weights into a lookup table indexed per weight | 1, 2, 3, 4, 6, or 8 bits per index [13] |
| Magnitude pruning | Sets near-zero weights to exactly zero, then stores them sparsely | Variable sparsity ratio, Core ML 7+ |
| Joint compression | Combines pruning + quantisation or pruning + palettization | coremltools 8+ |
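The three basic techniques in the table can be illustrated on a flat list of weights. This is a toy sketch in plain Python, not what coremltools does internally: real implementations work on tensors, usually per channel or per block, and all function names here are hypothetical.

```python
def quantize_linear(weights, bits=8):
    """Affine quantisation: map floats to integers in [0, 2**bits - 1]."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for constant weights
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo


def dequantize_linear(q, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [v * scale + lo for v in q]


def palettize(weights, bits=2, iterations=10):
    """1-D k-means: returns a lookup table and one small index per weight."""
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    # Initialise the centroids evenly across the weight range.
    lut = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        idx = [min(range(k), key=lambda j: abs(w - lut[j])) for w in weights]
        for j in range(k):
            members = [w for w, i in zip(weights, idx) if i == j]
            if members:
                lut[j] = sum(members) / len(members)
    idx = [min(range(k), key=lambda j: abs(w - lut[j])) for w in weights]
    return lut, idx


def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of the weights."""
    n_zero = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    zeroed = set(order[:n_zero])
    return [0.0 if i in zeroed else w for i, w in enumerate(weights)]
```

Linear quantisation stores one integer per weight plus a scale and offset, palettization stores a small index per weight plus a shared lookup table, and pruning relies on a sparse storage format to turn the zeros into savings.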
A practical example from Apple's own work: the on-device foundation model for Apple Intelligence uses mixed 2-bit and 4-bit palettization that averages 3.7 bits per weight while preserving accuracy comparable to the uncompressed model [15]. The Mistral 7B example Apple shipped at WWDC 2024 uses 4-bit block-wise linear quantisation, which shrinks the model from about 14 GB at Float16 to about 3.8 GB on disk and keeps total memory under 4 GB at runtime [16].
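The figures above follow from simple bits-per-weight arithmetic. The sketch below assumes a round 7 billion parameters for the "7B" model and counts weight storage only; it also assumes the 3.7-bit average counts index bits alone, ignoring lookup tables. Both assumptions are illustrative rather than taken from the source.

```python
def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # assumed round parameter count for a "7B" model

fp16_gb = weights_size_gb(N_PARAMS, 16)  # 14.0
int4_gb = weights_size_gb(N_PARAMS, 4)   # 3.5

# Share of weights stored at 4 bits if a 2-bit / 4-bit mix averages 3.7 bits:
# 2 * (1 - x) + 4 * x = 3.7, so x = (3.7 - 2) / (4 - 2), about 0.85.
fraction_4bit = (3.7 - 2) / (4 - 2)
```

The 3.5 GB estimate sits a little below the roughly 3.8 GB reported. That gap is consistent with extra stored data such as per-block scales and with a parameter count slightly above 7 billion, though the source does not break the number down.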
In a separate 2024 case study, Apple ran Llama-3.1-8B-Instruct entirely on-device with Core ML at roughly 33 decoding tokens per second on a Mac with an M1 Max, using Int4 weight quantisation together with the new stateful key-value cache to avoid repeatedly copying the KV cache, which can reach about 1 GB at an 8,192-token context [28].
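The 1 GB figure can be reproduced from the model's shape. The sketch below assumes the layer count, number of key-value heads, and head dimension from the publicly released Llama-3.1-8B configuration, with the cache held in Float16; those values are not stated in the text above.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value=2):
    """Keys and values for every layer, KV head, and token position."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


# Assumed configuration values for Llama-3.1-8B (from its public model config).
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=8192)
size_gib = size / 2**30  # 1.0
```

Without a stateful cache, a buffer of this size would have to be passed in and out of the model on every decoding step, which is the copying cost the Stateful API avoids.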
Core ML targets three backends and dispatches operations to whichever can run them fastest within the user-selected compute units.
| Backend | Strengths | Trade-offs |
|---|---|---|
| Apple Neural Engine (ANE) | Very low power per operation, very high throughput on supported ops, leaves the GPU and CPU free for the rest of the app | Only a subset of operations are ANE-friendly; weights have layout constraints; some shapes force a fall-back |
| GPU (Metal Performance Shaders, MPSGraph) | Flexible; supports almost any tensor op; large memory bandwidth; good for big batch sizes and unusual shapes | Higher power than ANE; competes with rendering work for GPU time |
| CPU (BNNS / Accelerate) | Universally available; deterministic; required for some ops not implemented elsewhere; benefits from the AMX matrix coprocessor on Apple silicon | Lowest throughput per watt for large neural networks |
The Apple Neural Engine started in 2017 as a 2-core unit on the A11 Bionic delivering 0.6 TOPS, sufficient to power Face ID and Animoji [20][21]. Its rated throughput has since grown more than sixtyfold: 5 TOPS on the A12 (2018), 11 TOPS on the A14 / M1 (2020), 15.8 TOPS on the A15 / M2, 17 TOPS on the A16, 35 TOPS on the A17 Pro (2023), 38 TOPS on the M4 (2024) [20]. Every A-series and M-series chip Apple has introduced since the A11 includes a Neural Engine, which means there is an installed base of well over a billion ANE-capable devices.
Core ML has shipped on every iPhone, iPad, Apple Watch, and Mac running a current Apple operating system since 2017. It is built into the OS and free for developers to use, which has driven broad uptake. Apple-internal users include Face ID, Live Text, Personal Voice, and Apple Intelligence.
Third-party adoption is broad. Apps such as Pinterest, Etsy, eBay, and Snapchat use Vision-backed Core ML for image search and AR effects. Healthcare apps use Core ML for skin lesion screening, retinal image classification, and pose estimation. Photography apps such as Halide and Photomator embed custom Core ML models for noise reduction and subject masking. Game developers use Core ML through MPSGraph for upscaling and gesture recognition.
At WWDC 2024 Apple introduced Apple Intelligence, the user-facing brand for the company's generative-AI features. Apple Intelligence is built on top of Core ML in two ways. First, the on-device roughly 3 billion parameter foundation language model runs as a Core ML model on the Apple Neural Engine, using the new Stateful API for its KV-cache. Apple reported a time-to-first-token latency of about 0.6 milliseconds per prompt token and a generation rate of roughly 30 tokens per second on an iPhone 15 Pro, before token-speculation optimisations [15]. Second, in 2025 Apple released a separate Foundation Models framework that lets third-party developers call the same on-device 3B model through a high-level Swift API, optionally with task-specific adapters [14].
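The two reported figures translate into a rough end-to-end estimate. The sketch below assumes latency is linear in prompt length and output length, and ignores model loading, thermal throttling, and speculative decoding; the function is a hypothetical back-of-the-envelope helper, not a measurement.

```python
PROMPT_MS_PER_TOKEN = 0.6       # reported cost per prompt token
GENERATION_TOKENS_PER_S = 30.0  # reported generation rate


def estimated_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Rough response time: prompt processing plus token-by-token decoding."""
    prefill = prompt_tokens * PROMPT_MS_PER_TOKEN / 1000
    decode = output_tokens / GENERATION_TOKENS_PER_S
    return prefill + decode
```

Under these assumptions a 1,000-token prompt costs about 0.6 seconds before the first token appears, and a 90-token reply adds about 3 seconds, so generation rather than prompt processing dominates short interactions.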
Apple's adapter strategy keeps the base model fixed and swaps in small LoRA-style modules (typically tens of megabytes for a rank-16 adapter) for each task: summarisation, proofreading, mail reply suggestions, and so on. The base model itself uses mixed 2-bit and 4-bit palettization for an average of 3.7 bits per weight, with small adapters held in 16-bit precision [15].
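The reason adapters stay small is that a low-rank update adds far fewer parameters than the matrix it modifies. The sketch below works through the count for a single projection; the 2048 x 2048 shape is hypothetical and chosen only for illustration, since the source does not give the model's layer dimensions.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one low-rank pair A (rank x d_in), B (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_size_mb(param_count: int, bits: int = 16) -> float:
    """Storage for the adapter parameters in decimal megabytes."""
    return param_count * bits / 8 / 1e6


# Hypothetical shapes: a 2048 x 2048 projection adapted at rank 16.
per_matrix = lora_params(2048, 2048, rank=16)  # 65,536 parameters
full_matrix = 2048 * 2048                      # 4,194,304 parameters
```

In this example the adapter for one matrix is 64 times smaller than the matrix itself and occupies about 0.13 MB at 16-bit precision; summing over many adapted matrices across all layers is what brings a full adapter into the tens-of-megabytes range described above.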
In June 2026 Apple introduced its third generation of foundation models. The on-device tier now spans an AFM 3 Core model at about 3 billion parameters and a larger AFM 3 Core Advanced model at about 20 billion parameters that uses sparse activation (only 1 to 4 billion parameters active at a time), with multilingual support across 28 languages [29]. Apple emphasises the privacy stance of its training: "We do not use our users' private personal data or user interactions when training our foundation models" [29].
The Foundation Models framework is technically a separate framework from Core ML, but it is tightly coupled in practice: it loads the model through the Core ML runtime, runs it on the Neural Engine, and benefits from every optimisation Apple has shipped in coremltools.
In March 2026, Bloomberg's Mark Gurman reported that Apple would introduce a modernised framework called Core AI at WWDC, reflecting a shift in terminology from "machine learning" to "AI" [26]. At WWDC 2026 Apple confirmed Core AI as a dedicated framework for on-device large language models and generative AI. According to coverage of the announcement, Core AI runs only on Apple silicon, supports models from compact 3 billion parameter vision models up to roughly 70 billion parameter reasoning models, and promises "zero server dependencies" and "zero per-token cloud costs" [27].
Core AI does not retire Core ML. The two coexist: Apple's guidance is to use Core ML for classical and smaller neural models (classification, object detection, decision trees) and Core AI for transformers and generative workloads, with MLX as a third option for researchers working with custom weights [27]. Existing .mlmodel and .mlpackage files continue to work, and Apple's long deprecation cycles mean Core ML is not going away in a single release [26].
Core ML is one of several on-device inference engines. The main competitors target Android, embedded Linux, or cross-platform deployments.
| Framework | Vendor | Platforms | Accelerator support | Model formats | Year | License |
|---|---|---|---|---|---|---|
| Core ML | Apple | iOS, iPadOS, macOS, watchOS, tvOS, visionOS | Apple Neural Engine, Metal GPU, CPU/AMX | .mlmodel, .mlpackage | 2017 | Closed (runtime); BSD-3-Clause (coremltools) |
| TensorFlow Lite (LiteRT) | Google | Android, iOS, Linux, microcontrollers | NNAPI, GPU delegate, Hexagon, Edge TPU, Core ML delegate on iOS | .tflite (FlatBuffers) | 2017 (Nov) | Apache 2.0 |
| ONNX Runtime Mobile | Microsoft and contributors | Android, iOS, Linux, Windows | NNAPI, Core ML, XNNPACK, QNN | .ort, .onnx | 2018 | MIT |
| PyTorch Mobile / ExecuTorch | Meta | Android, iOS, embedded | XNNPACK, Vulkan, Core ML, MPS, QNN | .pte (ExecuTorch), .ptl (legacy mobile) | 2019 (PyTorch Mobile), 2023 (ExecuTorch) | BSD-style |
| MediaPipe | Google | Android, iOS, web, Linux | TFLite delegates, GPU, NNAPI | TFLite under the hood | 2019 | Apache 2.0 |
| NCNN | Tencent | Android, iOS, Linux, Windows, embedded | Vulkan, OpenCL | .param + .bin | 2017 | BSD-3-Clause |
| MNN | Alibaba | Android, iOS, Linux, Windows, embedded | OpenCL, Vulkan, Metal, CUDA | .mnn | 2019 | Apache 2.0 |
The most common direct comparison is with TensorFlow Lite, which Google rebranded as LiteRT in late 2024. Core ML has the deeper Apple-platform integration, automatic Neural Engine routing, and a richer set of high-level vertical frameworks. TensorFlow Lite covers Android plus a much wider range of embedded targets (microcontrollers, single-board computers) and is the obvious choice for cross-platform mobile apps. Most large iOS/Android codebases end up shipping the same model in both formats, with conversion through ONNX or PyTorch as the lingua franca.
Core ML is locked to Apple platforms, which makes cross-platform deployment a duplication of effort. The runtime itself is closed source, which makes debugging op-routing decisions hard; the Neural Engine in particular is a black box, and developers often resort to reading Console logs and using Xcode's Performance Reports to figure out why a model fell back to GPU or CPU [19].
Conversion from PyTorch can require workarounds for ops that have no MIL equivalent, custom Torch op registrations, or graph rewrites. Some research-grade architectures (sparse mixture of experts, very large vocabulary models, models with dynamic control flow) take real effort to convert.
On-device training is limited. Core ML offers personalisation (fine-tuning a small number of layers or a k-NN head), not full training; Create ML covers training on a Mac but is opinionated about model architecture; ML Compute provides a lower-level training graph but is rarely used outside Apple.
The Neural Engine has shape and op constraints that change between chip generations, so a model that runs on the ANE on an A17 Pro might run on the GPU on an A14. Apple's documentation often lags the rapid pace of model architecture changes, and the community fills the gap through blogs (Pete Warden, Matthijs Hollemans / machinethink.net, Hugging Face), forums, and reverse-engineering projects.
Core ML itself is closed-source Apple software. The conversion side is open. The coremltools repository on GitHub is BSD-3-Clause licensed, accepts external pull requests, and ships frequent releases (8.x series in 2025, with 9.x in beta) [10]. Apple also maintains an apple/ml-stable-diffusion repository that demonstrates ANE-friendly Stable Diffusion in Core ML, and an apple/coreml-stable-diffusion-xl model on Hugging Face. The Hugging Face team maintains an exporters tool that converts Transformers models directly to Core ML, plus the swift-transformers package for running them in Swift apps.
Third-party reverse-engineering projects (notably Matthijs Hollemans' hollance/neural-engine on GitHub) document which ops the ANE supports on each chip, which has become a de facto reference for engineers trying to keep models on the Neural Engine [19].
Core ML defined on-device machine learning as a first-class platform capability rather than a research curiosity. It shipped before the developer preview of Google's TensorFlow Lite (November 2017), before PyTorch Mobile (2019), and well before ExecuTorch (2023). The combination of a dedicated neural accelerator (the Apple Neural Engine) and an OS-level inference framework, both released the same week, was a step change for the mobile industry.
Its long-term effect is twofold. First, it pushed almost every other smartphone vendor to ship a neural processing unit and a corresponding inference framework: Google's NNAPI and now AICore, Qualcomm's Hexagon NPU and SNPE, Samsung's Neural Processing Unit. Second, it created the runtime layer that Apple now uses to ship Apple Intelligence to every supported iPhone, iPad, and Mac, without sending user data to a server. The on-device 3B foundation model that powers Apple Intelligence runs through the same Core ML runtime that has been on every iPhone since 2017; the difference is mostly that the model and the silicon have grown by several orders of magnitude.
For a developer, Core ML is the path of least resistance for shipping a machine-learning model on an Apple device. For Apple, it is the substrate for almost every machine-learning feature the company ships, from Face ID and Live Text to Apple Intelligence and Personal Voice.