Core ML
Last reviewed
May 1, 2026
Sources
25 citations
Review status
Source-backed
Revision
v1 · 4,161 words
Core ML is Apple's foundation framework for on-device machine learning across iOS, iPadOS, macOS, watchOS, tvOS, and visionOS. It lets developers integrate trained machine-learning models into their apps and run inference with hardware acceleration on the Apple Neural Engine (ANE), the GPU through Metal, and the CPU through the Accelerate / BNNS libraries. Core ML provides a unified runtime, two related on-disk model formats (.mlmodel and .mlpackage), and an open-source conversion toolkit (coremltools), and it underpins Apple's higher-level domain frameworks such as Vision, Natural Language, Speech, and Sound Analysis.
Core ML was introduced at WWDC 2017 and shipped with iOS 11 in September 2017, alongside the Apple Neural Engine on the A11 Bionic chip in the iPhone 8 and iPhone X. It was the first machine-learning runtime to ship as a first-class system framework on a major mobile operating system, predating Google's TensorFlow Lite (announced November 2017) and arguably defining the category of on-device inference engines. As of iOS 18 and macOS 15 it underpins Apple Intelligence, and the Foundation Models framework that followed in 2025 exposes Apple's ~3 billion parameter on-device language model to third-party developers.
Core ML is a single inference framework that exposes a unified API across many model families. A developer adds a .mlmodel or .mlpackage file to an Xcode project, Xcode compiles it and auto-generates a strongly typed Swift or Objective-C class for the model's inputs and outputs, and the runtime decides at load time which combination of CPU, GPU, and Neural Engine should execute each operation. The same .mlpackage file runs unmodified on an iPhone, an iPad, an Apple Watch, an Apple TV, an Apple Silicon Mac, or an Apple Vision Pro, with the runtime selecting the appropriate compute backend.
Core ML is closed-source Apple software, but it sits on top of and beside other Apple frameworks that are partly open: Metal Performance Shaders (MPS), MPSGraph, the Basic Neural Network Subroutines (BNNS) in Accelerate, and the coremltools Python package, which is BSD-3-Clause licensed.
| Property | Value |
|---|---|
| Developer | Apple Inc. |
| Initial release | Announced June 2017 (WWDC); shipped with iOS 11 in September 2017 |
| Latest model version | Core ML model version 9 (introduced with iOS 18) |
| File formats | .mlmodel, .mlpackage, compiled .mlmodelc bundle |
| Conversion toolkit | coremltools (BSD-3-Clause, Python) |
| Languages | Swift, Objective-C, with Python tooling |
| Platforms | iOS, iPadOS, macOS, watchOS, tvOS, visionOS |
| Hardware backends | Apple Neural Engine, GPU (Metal), CPU (BNNS / Accelerate) |
| Source | Closed source runtime; coremltools is open source |
| Cost | Free, built into the operating system |
Core ML has shipped a new major version with almost every iOS release. Apple does not always brand each version with a marketing number, but the developer community refers to them by their introduction year and feature set. The model file format itself is independently versioned, and a model file declares the minimum runtime version it requires.
| Year | Event | What shipped |
|---|---|---|
| 2017 (WWDC, June) | Core ML 1 introduced | The .mlmodel format, support for neural networks, tree ensembles, support vector machines, generalised linear models, and pipelines. iOS 11 launched in September with the iPhone 8 and iPhone X, which included the first Apple Neural Engine. |
| 2018 (WWDC) | Core ML 2 + Create ML | Smaller models through 16-bit and 8-bit quantisation, batch prediction, and faster execution. Create ML, a new Swift-and-Xcode-based tool for training image, text, and tabular models on a Mac, was introduced as a complement. |
| 2019 (WWDC) | Core ML 3 | On-device personalisation (fine-tuning the last layers of a neural network or k-Nearest Neighbours classifier on the device using user data), more than 100 new layer types including transformer-friendly ops, and support for linked models inside pipelines. |
| 2020 (WWDC) | Core ML 4 + ML Compute | Model encryption built into Xcode and CloudKit-hosted Model Deployment so apps can update models without an App Store release. coremltools 4 shipped a Unified Conversion API for TensorFlow 1.x, TensorFlow 2.x with tf.keras, and PyTorch, all routed through a new Model Intermediate Language (MIL). The separate ML Compute framework abstracted CPU and GPU training on macOS. |
| 2021 (WWDC) | Core ML 5 + ML Program | The .mlpackage container format and the new ML Program model type, which decouples weights from program graph and supports Float16 as a first-class precision. Xcode Cloud allowed continuous deployment of model packages. |
| 2022 (WWDC) | Core ML 6 | Faster inference on M-series Macs, ANE-friendly transformer layouts, the public release of Apple's transformer optimisation guide, weight palettization at conversion time, and the new Activity Classifier in Create ML. |
| 2023 (WWDC) | Core ML 7 | First-class visionOS support for the Apple Vision Pro, advanced model compression APIs in coremltools 7 (8-bit activation quantisation, magnitude pruning, k-means palettization to as few as 2 bits per weight) and joint compression schemes. |
| 2024 (WWDC) | Core ML 8 + Apple Intelligence | New Stateful API and stateful buffers for KV-cache support in transformer models, the Swift MLTensor type for ergonomic tensor work, multifunction models for packaging LoRA adapters, and 4-bit block-wise linear quantisation. Apple Intelligence and Apple's on-device foundation model were introduced on top. |
| 2025 (WWDC) | Core ML 9 + continued evolution | Continued improvements in foundation model integration, more aggressive low-bit quantisation, and tighter integration between Core ML, MPSGraph, and the Foundation Models framework released to third-party developers. |
Apple has avoided marketing the framework by version number since 2020, instead announcing features under generic Machine Learning umbrellas at WWDC. The version number that matters in practice is the model specification version baked into a .mlpackage, which determines the minimum operating system that can load it.
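The link between specification version and minimum operating system can be sketched as a lookup table. The mapping below follows the deployment targets used by coremltools as best understood here, and should be treated as an assumption to check against the current coremltools documentation. The helper name `can_load` is hypothetical and is not part of any Apple API.

```python
# Minimum iOS release assumed for each Core ML model specification version.
# Verify against the coremltools documentation before relying on it.
MIN_IOS_FOR_SPEC = {
    1: (11, 0),
    2: (11, 2),
    3: (12, 0),
    4: (13, 0),
    5: (14, 0),
    6: (15, 0),
    7: (16, 0),
    8: (17, 0),
    9: (18, 0),
}


def can_load(spec_version, ios_version):
    """Return True if an OS at ios_version (major, minor) can load the model."""
    required = MIN_IOS_FOR_SPEC.get(spec_version)
    if required is None:
        return False  # unknown, probably newer, specification version
    return tuple(ios_version) >= required


print(can_load(9, (17, 4)))  # False: a version 9 model needs iOS 18
print(can_load(6, (17, 4)))  # True: a version 6 model needs only iOS 15
```

A model converted for a newer specification version than the device supports is refused at load time, so the deployment target chosen at conversion sets the floor for the app.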
Core ML uses two on-disk formats and a compiled runtime form.
The original .mlmodel is a single Protocol Buffers file that bundles the model graph, the weights, and the metadata. It supports the original neural network type (with a fixed list of layer kinds), tree ensembles, support vector machines, linear and logistic regressors, generalised linear models, scalers, imputers, one-hot encoders, word embeddings, and pipelines that chain several models together. A .mlmodel declares a specification version (1 through about 6 for legacy models) so the runtime can refuse to load a file that uses operations it does not understand.
The newer .mlpackage is a directory rather than a single file. It separates the model architecture from the weights and the metadata, which makes it easier to edit, diff, and store in version control, and it lets Apple add new model types without invalidating older tooling. The .mlpackage container is required to use the ML Program model type, the Stateful API introduced in 2024, and Float16 as a default precision.
When Xcode builds an app, it compiles every model into a .mlmodelc bundle. This is an opaque, layout-optimised version of the model for the target operating system. On a user's device, only the compiled bundle exists; the original .mlmodel or .mlpackage is not shipped. Apple's model encryption feature (introduced in Core ML 4) operates on the compiled bundle, with decryption keys retrieved from Apple's servers on first use and held only in memory.
Core ML is structured as a stack. The high-level Swift and Objective-C API exposes one class per model. Below that, Core ML manages a graph of operations and dispatches each to one of three compute backends.
Developers can hint at which backend they prefer through the MLComputeUnits setting on MLModelConfiguration. Common values are cpuOnly (forces CPU execution, useful for debugging and reproducibility), cpuAndGPU (avoids the Neural Engine), cpuAndNeuralEngine (avoids the GPU), and all (the default, which lets Core ML pick). Core ML still routes individual ops; setting cpuAndNeuralEngine does not guarantee that every layer runs on the ANE because some ops are not ANE-supported.
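The per-op routing described above can be illustrated with a toy model. This is not Core ML's actual scheduler, which is closed source. The op names, the support table, and the preference order are all invented for illustration. It shows only why cpuAndNeuralEngine does not guarantee full ANE residency.

```python
# Backends each MLComputeUnits value permits, in an assumed preference order.
ALLOWED_BACKENDS = {
    "cpuOnly": ["cpu"],
    "cpuAndGPU": ["gpu", "cpu"],
    "cpuAndNeuralEngine": ["ane", "cpu"],
    "all": ["ane", "gpu", "cpu"],
}

# Hypothetical op-support table, invented for this example.
OP_SUPPORT = {
    "conv2d": {"ane", "gpu", "cpu"},
    "matmul": {"ane", "gpu", "cpu"},
    "fft": {"gpu", "cpu"},
    "custom_sort": {"cpu"},
}


def route(ops, compute_units):
    """Assign each op to the first permitted backend that supports it."""
    plan = []
    for op in ops:
        backend = next(
            b for b in ALLOWED_BACKENDS[compute_units] if b in OP_SUPPORT[op]
        )
        plan.append((op, backend))
    return plan


graph = ["conv2d", "fft", "matmul", "custom_sort"]
print(route(graph, "cpuAndNeuralEngine"))
# [('conv2d', 'ane'), ('fft', 'cpu'), ('matmul', 'ane'), ('custom_sort', 'cpu')]
```

In this toy graph two of the four ops fall back to the CPU even though the Neural Engine was requested, and each fall-back implies a hand-off between backends.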
Apple ships several vertical frameworks that wrap Core ML for specific data types:
| Framework | Purpose | Year |
|---|---|---|
| Vision | Image and video analysis: face detection, text recognition, image classification, object tracking | 2017 |
| Natural Language | Tokenisation, language identification, named entity recognition, sentiment analysis, word embeddings | 2017 (Foundation), 2018 (NL framework) |
| Speech | Speech-to-text recognition with on-device or server modes | 2016 (pre-Core ML), with Core ML integration from 2019 |
| SoundAnalysis | Sound classification (laughter, applause, custom sounds) | 2019 |
| Translation | On-device language translation | 2020 |
| ML Compute | Hardware-accelerated training graph for CPU and GPU | 2020 |
| Foundation Models | Access to Apple's on-device 3B language model with adapters | 2025 |
Most apps use Core ML through one of these higher-level frameworks rather than calling the model directly.
Core ML supports a range of classical and deep-learning model types, with the catalogue growing each year.
| Family | Examples | Supported since |
|---|---|---|
| Convolutional neural networks | ResNet, MobileNet, EfficientNet, image classifiers, object detectors | Core ML 1 (2017) |
| Recurrent neural networks | LSTM, GRU for sequence tasks | Core ML 1 (2017) |
| Transformer networks | BERT, ViT, encoder-decoder models, Mistral 7B | Improved each release; first-class with stateful KV-cache in 2024 |
| Tree ensembles | Random Forest, gradient-boosted trees, XGBoost via conversion | Core ML 1 (2017) |
| Linear and logistic regression | Generalised linear models | Core ML 1 (2017) |
| Support vector machines | Linear and kernel SVMs | Core ML 1 (2017) |
| Pipelines | Chains of feature extractors, scalers, and predictors | Core ML 1 (2017) |
| Word embeddings | Word2Vec-style and custom embeddings | Core ML 2 (2018) |
| Sound and activity classifiers | Audio frame classifiers, accelerometer activity classifiers | Core ML 3 (2019) |
| k-Nearest Neighbours classifiers | Updatable on-device classifiers for personalisation | Core ML 3 (2019) |
| Custom layers and ops | Developer-defined layers in Swift / Metal / Python | Core ML 1 (custom layers); custom Torch ops in coremltools |
The coremltools package is the open-source Python toolchain that converts trained models from popular frameworks into Core ML format. It is BSD-3-Clause licensed and developed in the open on GitHub.
| Source framework | Path | Notes |
|---|---|---|
| PyTorch | TorchScript or torch.export to MIL to ML Program | The most modern path; supports custom ops via @register_torch_op. |
| TensorFlow 2.x | SavedModel or tf.keras to MIL | Replaces the legacy tfcoreml tool. |
| TensorFlow 1.x | Frozen GraphDef to MIL | Still supported; mostly used for legacy models. |
| ONNX | Deprecated; was supported via onnx-coreml | The recommended path is now ONNX to PyTorch to Core ML, or to use ONNX Runtime instead. |
| scikit-learn | coremltools.converters.sklearn | Decision trees, linear models, SVMs, pipelines. |
| XGBoost | coremltools.converters.xgboost | Gradient-boosted tree models. |
| LightGBM | Convert to ONNX or scikit-learn first | No native converter. |
| LibSVM | coremltools.converters.libsvm | Classic SVM models. |
| Hugging Face | exporters package | Apple maintains a Hugging Face exporters extension that handles Transformers models. |
The central conversion call is coremltools.convert(model, ...). From coremltools 4 onward this is the Unified Conversion API: it auto-detects the source framework, lowers the graph to the Model Intermediate Language (MIL) for optimisation passes (constant folding, dead-code elimination, layer fusion), and lowers MIL down to either the legacy NeuralNetwork model type or the modern ML Program type. From coremltools 7 the default target is ML Program.
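As a hedged illustration of the Unified Conversion API, the sketch below wraps `coremltools.convert` for a traced PyTorch model and targets the ML Program type. The wrapper name `convert_to_ml_program`, the input name, and the iOS 17 deployment target are illustrative choices rather than anything the framework prescribes. coremltools is imported lazily, so the function can be defined without the package installed.

```python
def convert_to_ml_program(traced_model, input_shape, output_path):
    """Convert a TorchScript-traced model to an ML Program and save it.

    traced_model: result of torch.jit.trace on a model in eval mode
    input_shape:  tuple such as (1, 3, 224, 224)
    output_path:  destination ending in .mlpackage
    """
    # Lazy import: coremltools is a third-party package.
    import coremltools as ct

    mlmodel = ct.convert(
        traced_model,
        inputs=[ct.TensorType(name="input", shape=input_shape)],
        convert_to="mlprogram",                    # modern ML Program type
        minimum_deployment_target=ct.target.iOS17,
        compute_units=ct.ComputeUnit.ALL,          # let Core ML pick backends
    )
    mlmodel.save(output_path)  # ML Programs are saved as .mlpackage directories
    return mlmodel


# Example usage (requires torch and coremltools):
#   traced = torch.jit.trace(model.eval(), torch.rand(1, 3, 224, 224))
#   convert_to_ml_program(traced, (1, 3, 224, 224), "Model.mlpackage")
```

The deployment target matters because it fixes the specification version written into the package, and therefore the oldest operating system that can load the result.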
Coremltools also includes utilities for inspecting models, comparing predictions against the source framework, validating input and output shapes, and applying post-training optimisations.
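Comparing a converted model against its source usually comes down to a numerical error metric over matching outputs. The helper below is a generic sketch of that idea in plain Python, not the coremltools utility itself. The name `compare_outputs` and the choice of maximum absolute error plus signal-to-noise ratio are assumptions made for illustration.

```python
import math


def compare_outputs(reference, converted):
    """Return (max absolute error, signal-to-noise ratio in dB) for two flat lists."""
    if len(reference) != len(converted):
        raise ValueError("outputs must have the same number of elements")
    max_abs = max(abs(r - c) for r, c in zip(reference, converted))
    signal = sum(r * r for r in reference)
    noise = sum((r - c) ** 2 for r, c in zip(reference, converted))
    snr_db = math.inf if noise == 0 else 10 * math.log10(signal / noise)
    return max_abs, snr_db


# Source-framework output versus a slightly perturbed converted output.
max_abs, snr_db = compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.1])
print(f"max abs error {max_abs:.3f}, SNR {snr_db:.1f} dB")
```

A Float16 or quantised model will rarely match its Float32 source exactly, so a threshold on a metric like this is more useful than an equality check.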
Fitting a competitive model into a phone's RAM and within the Neural Engine's compute budget is the central engineering problem on Apple platforms. Core ML and coremltools support a stack of techniques.
| Technique | What it does | Bit widths |
|---|---|---|
| Float16 | Half-precision weights and activations | 16-bit; default for ML Programs since coremltools 5 |
| Linear quantisation | Affine mapping of weights to a smaller integer range | INT8 weights and activations; INT4 weights from coremltools 8 |
| Block-wise linear quantisation | Different scale per block of weights, configurable block size | INT4 (and INT8) per-block, introduced 2024 |
| Palettization | k-means clustering of weights into a lookup table indexed per weight | 1, 2, 3, 4, 6, or 8 bits per index |
| Magnitude pruning | Sets near-zero weights to exactly zero, then stores them sparsely | Variable sparsity ratio, Core ML 7+ |
| Joint compression | Combines pruning + quantisation or pruning + palettization | coremltools 8+ |
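Two of the techniques in the table can be sketched in plain Python to show what they do to a weight vector. This is a toy illustration under simplifying assumptions (symmetric scaling, flat lists, a basic 1-D k-means), not the coremltools implementation, and all function names are hypothetical.

```python
def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    return [scale * c for c in codes]


def quantize_blockwise(weights, bits, block_size):
    """Quantise each run of block_size weights with its own scale."""
    return [
        quantize_symmetric(weights[i:i + block_size], bits)
        for i in range(0, len(weights), block_size)
    ]


def dequantize_blockwise(blocks):
    return [v for codes, scale in blocks for v in dequantize(codes, scale)]


def palettize(weights, bits, iterations=20):
    """1-D k-means: return (lookup_table, indices) with at most 2**bits entries."""
    unique = sorted(set(weights))
    k = min(2 ** bits, len(unique))
    # Deterministic start: spread initial centroids over the sorted unique values.
    table = [unique[i * (len(unique) - 1) // max(k - 1, 1)] for i in range(k)]

    def nearest(w):
        return min(range(k), key=lambda c: abs(w - table[c]))

    for _ in range(iterations):
        indices = [nearest(w) for w in weights]
        for c in range(k):
            members = [w for w, i in zip(weights, indices) if i == c]
            if members:
                table[c] = sum(members) / len(members)
    return table, [nearest(w) for w in weights]


def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


# Four small weights followed by four large ones.
weights = [0.01, -0.02, 0.03, -0.015, 1.0, -2.0, 3.0, -1.5]

per_tensor = dequantize(*quantize_symmetric(weights, bits=4))
per_block = dequantize_blockwise(quantize_blockwise(weights, bits=4, block_size=4))
table, indices = palettize(weights, bits=2)
palettized = [table[i] for i in indices]

# Error on the four small weights: one shared scale versus one scale per block.
print(max_error(weights[:4], per_tensor[:4]))
print(max_error(weights[:4], per_block[:4]))
print(len(table), max_error(weights, palettized))
```

With a single scale the large weights dominate and the small ones all round to zero. Giving each block its own scale preserves them, which is the motivation for block-wise quantisation.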
A practical example from Apple's own work: the on-device foundation model for Apple Intelligence uses mixed 2-bit and 4-bit palettization that averages 3.7 bits per weight while preserving accuracy comparable to the uncompressed model. The Mistral 7B example Apple shipped at WWDC 2024 uses 4-bit block-wise linear quantisation, which shrinks the model from about 14 GB at Float16 to about 3.8 GB on disk and keeps total memory under 4 GB at runtime.
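The arithmetic behind these figures is straightforward to reproduce. The sketch below uses round parameter counts (7e9 and 3e9) and an assumed block size of 32 with one 16-bit scale per block. Real on-disk sizes also depend on which layers are compressed and on metadata, so the results are estimates rather than Apple's published numbers.

```python
def size_gb(num_params, bits_per_weight):
    """Storage for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def blockwise_bits(weight_bits, block_size, scale_bits=16):
    """Effective bits per weight once each block's scale is counted."""
    return weight_bits + scale_bits / block_size


def low_bit_fraction(average_bits, low_bits, high_bits):
    """Share of weights stored at low_bits for a two-width mix to hit average_bits."""
    return (high_bits - average_bits) / (high_bits - low_bits)


PARAMS_7B = 7e9  # round-number assumption for a "7B" model
PARAMS_3B = 3e9  # round-number assumption for a "~3B" model

print(size_gb(PARAMS_7B, 16))                     # Float16 baseline
print(size_gb(PARAMS_7B, 4))                      # 4-bit weights, scales ignored
print(size_gb(PARAMS_7B, blockwise_bits(4, 32)))  # 4-bit plus a scale per 32 weights
print(size_gb(PARAMS_3B, 3.7))                    # 3.7 bits per weight
print(low_bit_fraction(3.7, 2, 4))                # 2-bit share in a 2-bit / 4-bit mix
```

Under these assumptions the Float16 baseline is 14 GB, the 4-bit variant lands between 3.5 GB and about 3.9 GB depending on scale overhead, and an average of 3.7 bits corresponds to roughly 15% of weights at 2 bits with the rest at 4 bits.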
Core ML targets three backends and dispatches operations to whichever can run them fastest within the user-selected compute units.
| Backend | Strengths | Trade-offs |
|---|---|---|
| Apple Neural Engine (ANE) | Very low power per operation, very high throughput on supported ops, leaves the GPU and CPU free for the rest of the app | Only a subset of operations are ANE-friendly; weights have layout constraints; some shapes force a fall-back |
| GPU (Metal Performance Shaders, MPSGraph) | Flexible; supports almost any tensor op; large memory bandwidth; good for big batch sizes and unusual shapes | Higher power than ANE; competes with rendering work for GPU time |
| CPU (BNNS / Accelerate) | Universally available; deterministic; required for some ops not implemented elsewhere; benefits from the AMX matrix coprocessor on Apple Silicon | Lowest throughput per watt for large neural networks |
The Apple Neural Engine started in 2017 as a 2-core unit on the A11 Bionic delivering 0.6 TOPS, sufficient to power Face ID and Animoji. Peak throughput has since grown by a factor of about 60: 5 TOPS on the A12 (2018), 11 TOPS on the A14 / M1 (2020), 15.8 TOPS on the A15 / M2, 17 TOPS on the A16, 35 TOPS on the A17 Pro (2023), and 38 TOPS on the M4 (2024). Every A-series and M-series chip Apple has introduced since the A11 includes a Neural Engine, which means there is an installed base of well over a billion ANE-capable devices.
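The growth rate implied by the first and last figures can be worked out directly. The sketch treats the published peak TOPS numbers as directly comparable, which is an assumption: vendors do not always quote them at the same numeric precision.

```python
def growth(first_tops, last_tops, first_year, last_year):
    """Return (overall multiple, compound multiple per year)."""
    total = last_tops / first_tops
    per_year = total ** (1 / (last_year - first_year))
    return total, per_year


# A11 Bionic (2017, 0.6 TOPS) to M4 (2024, 38 TOPS).
total, per_year = growth(0.6, 38.0, 2017, 2024)
print(f"{total:.1f}x overall, {per_year:.2f}x per year")
# 63.3x overall, 1.81x per year
```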
Core ML has shipped on every iPhone, iPad, Apple Watch, and Mac running a current Apple operating system since 2017. It is built into the OS and free for developers to use, which has driven broad uptake. Apple's own features built on it include Face ID, Live Text, Personal Voice, and Apple Intelligence.
Third-party adoption is broad. Apps such as Pinterest, Etsy, eBay, and Snapchat use Vision-backed Core ML for image search and AR effects. Healthcare apps use Core ML for skin lesion screening, retinal image classification, and pose estimation. Photography apps such as Halide and Photomator embed custom Core ML models for noise reduction and subject masking. Game developers use Core ML through MPSGraph for upscaling and gesture recognition.
At WWDC 2024 Apple introduced Apple Intelligence, the user-facing brand for the company's generative-AI features. Apple Intelligence is built on top of Core ML in two ways. First, the on-device ~3 billion parameter foundation language model runs as a Core ML model on the Apple Neural Engine, using the new Stateful API for its KV-cache. Apple reported around 0.6 ms per prompt token and roughly 30 tokens per second of generation on an iPhone 15 Pro, before token-speculation optimisations. Second, the separate Foundation Models framework, released to third-party developers in 2025, lets them call the same on-device 3B model through a high-level Swift API, optionally with task-specific adapters.
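The reported figures translate into end-to-end response times with simple arithmetic. The sketch below assumes prompt processing and generation scale linearly at the quoted rates, which ignores warm-up, thermal throttling, and speculative decoding. The prompt and output lengths are arbitrary examples.

```python
def response_latency_s(prompt_tokens, output_tokens,
                       ms_per_prompt_token=0.6, tokens_per_second=30.0):
    """Return (prefill seconds, generation seconds, total seconds)."""
    prefill = prompt_tokens * ms_per_prompt_token / 1000
    decode = output_tokens / tokens_per_second
    return prefill, decode, prefill + decode


# A 1,000-token prompt answered with 150 generated tokens.
prefill, decode, total = response_latency_s(1000, 150)
print(f"prefill {prefill:.1f} s, generation {decode:.1f} s, total {total:.1f} s")
# prefill 0.6 s, generation 5.0 s, total 5.6 s
```

At these rates generation dominates the wait for all but very long prompts, so anything that speeds up decoding has the largest effect on perceived latency.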
Apple's adapter strategy keeps the base model fixed and swaps in small LoRA-style modules (typically tens of megabytes for a rank-16 adapter) for each task: summarisation, proofreading, mail reply suggestions, and so on. The base model itself uses mixed 2-bit and 4-bit palettization for an average of 3.7 bits per weight, with small adapters held in 16-bit precision.
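The "tens of megabytes" figure is consistent with a back-of-envelope estimate of LoRA parameter counts. The layer count, hidden size, and number of adapted matrices below are hypothetical values chosen only to show the calculation. They are not Apple's published architecture.

```python
def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA pair: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def adapter_size_mb(layers, matrices_per_layer, d_model, rank, bytes_per_param=2):
    """Total adapter size in decimal megabytes, assuming square d_model x d_model matrices."""
    per_matrix = lora_params(d_model, d_model, rank)
    total_params = layers * matrices_per_layer * per_matrix
    return total_params * bytes_per_param / 1e6


# Hypothetical model: 32 layers, 4 adapted projections each, hidden size 3072,
# rank-16 adapters stored in 16-bit precision (2 bytes per parameter).
print(adapter_size_mb(layers=32, matrices_per_layer=4, d_model=3072, rank=16))
```

Because adapter size grows linearly with rank and with the number of adapted matrices, many task-specific adapters can share one compressed base model at a small fraction of its storage cost.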
The Foundation Models framework is technically a separate framework from Core ML, but it is tightly coupled in practice: it loads the model through the Core ML runtime, runs it on the Neural Engine, and benefits from every optimisation Apple has shipped in coremltools.
Core ML is one of several on-device inference engines. The main competitors target Android, embedded Linux, or cross-platform deployments.
| Framework | Vendor | Platforms | Accelerator support | Model formats | Year | License |
|---|---|---|---|---|---|---|
| Core ML | Apple | iOS, iPadOS, macOS, watchOS, tvOS, visionOS | Apple Neural Engine, Metal GPU, CPU/AMX | .mlmodel, .mlpackage | 2017 | Closed (runtime); BSD-3-Clause (coremltools) |
| TensorFlow Lite (LiteRT) | Google | Android, iOS, Linux, microcontrollers | NNAPI, GPU delegate, Hexagon, Edge TPU, Core ML delegate on iOS | .tflite (FlatBuffers) | 2017 (Nov) | Apache 2.0 |
| ONNX Runtime Mobile | Microsoft and contributors | Android, iOS, Linux, Windows | NNAPI, Core ML, XNNPACK, QNN | .ort, .onnx | 2018 | MIT |
| PyTorch Mobile / ExecuTorch | Meta | Android, iOS, embedded | XNNPACK, Vulkan, Core ML, MPS, QNN | .pte (ExecuTorch), .ptl (legacy mobile) | 2019 (PyTorch Mobile), 2023 (ExecuTorch) | BSD-style |
| MediaPipe | Google | Android, iOS, web, Linux | TFLite delegates, GPU, NNAPI | TFLite under the hood | 2019 | Apache 2.0 |
| NCNN | Tencent | Android, iOS, Linux, Windows, embedded | Vulkan, OpenCL | .param + .bin | 2017 | BSD-3-Clause |
| MNN | Alibaba | Android, iOS, Linux, Windows, embedded | OpenCL, Vulkan, Metal, CUDA | .mnn | 2019 | Apache 2.0 |
The most common direct comparison is with TensorFlow Lite, which Google rebranded as LiteRT in late 2024. Core ML has the deeper Apple-platform integration, automatic Neural Engine routing, and a richer set of high-level vertical frameworks. TensorFlow Lite covers Android plus a much wider range of embedded targets (microcontrollers, single-board computers) and is the obvious choice for cross-platform mobile apps. Most large iOS/Android codebases end up shipping the same model in both formats, with conversion through ONNX or PyTorch as the lingua franca.
Core ML is locked to Apple platforms, which makes cross-platform deployment a duplication of effort. The runtime itself is closed source, which makes debugging op-routing decisions hard; the Neural Engine in particular is a black box, and developers often resort to reading Console logs and using Xcode's Performance Reports to figure out why a model fell back to GPU or CPU.
Conversion from PyTorch can require workarounds for ops that have no MIL equivalent, custom Torch op registrations, or graph rewrites. Some research-grade architectures (sparse mixture of experts, very large vocabulary models, models with dynamic control flow) take real effort to convert.
On-device training is limited. Core ML offers personalisation (fine-tuning a small number of layers or a k-NN head), not full training; Create ML covers training on a Mac but is opinionated about model architecture; ML Compute provides a lower-level training graph but is rarely used outside Apple.
The Neural Engine has shape and op constraints that change between chip generations, so a model that runs on the ANE on an A17 Pro might run on the GPU on an A14. Apple's documentation often lags the rapid pace of model architecture changes, and the community fills the gap through blogs (Pete Warden, Matthijs Hollemans / machinethink.net, Hugging Face), forums, and reverse-engineering projects.
Core ML itself is closed-source Apple software. The conversion side is open. The coremltools repository on GitHub is BSD-3-Clause licensed, accepts external pull requests, and ships frequent releases (8.x series in 2025, with 9.x in beta). Apple also maintains an apple/ml-stable-diffusion repository that demonstrates ANE-friendly Stable Diffusion in Core ML, and an apple/coreml-stable-diffusion-xl model on Hugging Face. The Hugging Face team maintains an exporters tool that converts Transformers models directly to Core ML, plus the swift-transformers package for running them in Swift apps.
Third-party reverse-engineering projects (notably Matthijs Hollemans' hollance/neural-engine on GitHub) document which ops the ANE supports on each chip, which has become a de facto reference for engineers trying to keep models on the Neural Engine.
Core ML defined on-device machine learning as a first-class platform capability rather than a research curiosity. It shipped before Google's TensorFlow Lite (announced November 2017), before PyTorch Mobile (2019), and well before ExecuTorch (2023). The combination of a dedicated neural accelerator (the Apple Neural Engine) and an OS-level inference framework, both released the same week, was a step change for the mobile industry.
Its long-term effect is twofold. First, it pushed almost every other smartphone vendor to ship a neural processing unit and a corresponding inference framework: Google's NNAPI and now AICore, Qualcomm's Hexagon NPU and SNPE, Samsung's Neural Processing Unit. Second, it created the runtime layer that Apple now uses to ship Apple Intelligence to every supported iPhone, iPad, and Mac, without sending user data to a server. The on-device 3B foundation model that powers Apple Intelligence runs through the same Core ML runtime that has been on every iPhone since 2017; the difference is mostly one of scale, with the Neural Engine growing from 0.6 TOPS to 38 TOPS and the models it runs growing with it.
For a developer, Core ML is the path of least resistance for shipping a machine-learning model on an Apple device. For Apple, it is the substrate for almost every machine-learning feature the company ships, from Face ID and Live Text to Apple Intelligence and Personal Voice.