Core ML

Updated Jun 25, 2026

Core ML is Apple's framework for running trained machine learning models on-device across iOS, iPadOS, macOS, watchOS, tvOS, and visionOS. Introduced at WWDC 2017, it lets developers add a model to an app and execute inference with hardware acceleration that is dispatched automatically across the CPU, the GPU (through Metal), and the Apple Neural Engine (ANE), so that no part of the work has to leave the device. Models ship in the .mlmodel or .mlpackage format, are converted from PyTorch, TensorFlow, and other frameworks with the open-source coremltools toolkit, and can be trained on a Mac with Create ML. Core ML is the runtime layer beneath most of Apple's machine learning features, from Face ID and Live Text to Apple Intelligence ^[1]^[2]^[3].

Core ML shipped with iOS 11 in September 2017 alongside the first Apple Neural Engine on the A11 Bionic chip in the iPhone 8 and iPhone X ^[1]^[20]^[21]. It was the first machine-learning runtime to ship as a first-class system framework on a major mobile operating system, reaching users before Google's TensorFlow Lite (whose developer preview was released in November 2017) and arguably defining the category of on-device inference engines. As of iOS 18 and macOS 15 it underpins Apple Intelligence, and the Foundation Models framework that followed at WWDC 2025 exposes Apple's roughly 3 billion parameter on-device language model to third-party developers ^[14]^[15]^[25]. At WWDC 2026 Apple introduced a companion framework, Core AI, for generative models, while keeping Core ML as the runtime for classical and smaller neural networks ^[26]^[27].

What is Core ML?

Core ML is a single inference framework that exposes a unified API across many model families. A developer drops a model file into Xcode, Xcode compiles it and auto-generates a strongly typed Swift or Objective-C class for the model's inputs and outputs, and the runtime decides at load time which combination of CPU, GPU, and Neural Engine should execute each operation ^[1]^[3]. The same .mlpackage file runs unmodified on an iPhone, an iPad, an Apple Watch, an Apple TV, an Apple silicon Mac, or an Apple Vision Pro, with the runtime selecting the appropriate compute backend.

Apple positions privacy as the central design goal. In its developer documentation Apple states that Core ML "optimizes on-device performance by leveraging the CPU, GPU, and Neural Engine while minimizing its memory footprint and power consumption," adding that "running a model strictly on the user's device removes any need for a network connection, which helps keep the user's data private and your app responsive" ^[2].

Core ML is closed-source Apple software, but it sits on top of and beside other Apple frameworks that are partly open: Metal Performance Shaders (MPS), MPSGraph, the Basic Neural Network Subroutines (BNNS) in Accelerate, and the coremltools Python package, which is BSD-3-Clause licensed ^[10]^[11].

Property	Value
Developer	Apple Inc.
Initial release	June 2017 (WWDC), iOS 11 in September 2017
Latest model version	Core ML model version 9 (introduced with iOS 18)
File formats	.mlmodel, .mlpackage, compiled .mlmodelc bundle
Conversion toolkit	coremltools (BSD-3-Clause, Python)
Languages	Swift, Objective-C, with Python tooling
Platforms	iOS, iPadOS, macOS, watchOS, tvOS, visionOS
Hardware backends	Apple Neural Engine, GPU (Metal), CPU (BNNS / Accelerate)
Source	Closed source runtime; coremltools is open source
Cost	Free, built into the operating system

ELI5: what does Core ML actually do?

Imagine you trained a robot brain on a big computer to tell cats from dogs. Core ML is the part of an iPhone that takes that brain, shrinks it so it fits in the phone, and runs it quickly using the phone's special AI chip (the Neural Engine) instead of asking a server on the internet. Because the thinking happens right on the phone, it is fast, it works without a connection, and your photos never have to be sent anywhere. App makers do not have to know how the chip works: they hand Core ML the brain, and Core ML figures out the fastest way to run it.

When was Core ML released?

Core ML has shipped a new major version with almost every iOS release. Apple does not always brand each version with a marketing number, but the developer community refers to them by their introduction year and feature set. The model file format itself is independently versioned, and a model file declares the minimum runtime version it requires.

Year	Event	What shipped
2017 (WWDC, June)	Core ML 1 introduced	The .mlmodel format, support for neural networks, tree ensembles, support vector machines, generalised linear models, and pipelines. iOS 11 launched in September with the iPhone 8 and iPhone X, which included the first Apple Neural Engine.
2018 (WWDC)	Core ML 2 + Create ML	Smaller models through 16-bit and 8-bit quantisation, batch prediction, and faster execution. Create ML, a new Swift-and-Xcode-based tool for training image, text, and tabular models on a Mac, was introduced as a complement.
2019 (WWDC)	Core ML 3	On-device personalisation (fine-tuning the last layers of a neural network or k-Nearest Neighbours classifier on the device using user data), more than 100 new layer types including transformer-friendly ops, and support for linked models inside pipelines.
2020 (WWDC)	Core ML 4 + ML Compute	Model encryption built into Xcode and CloudKit-hosted Model Deployment so apps can update models without an App Store release. coremltools 4 shipped a Unified Conversion API for TensorFlow 1.x, TensorFlow 2.x with tf.keras, and PyTorch, all routed through a new Model Intermediate Language (MIL). The separate ML Compute framework abstracted CPU and GPU training on macOS.
2021 (WWDC)	Core ML 5 + ML Program	The .mlpackage container format and the new ML Program model type, which decouples weights from program graph and supports Float16 as a first-class precision. Xcode Cloud allowed continuous deployment of model packages.
2022 (WWDC)	Core ML 6	Faster inference on M-series Macs, ANE-friendly transformer layouts, the public release of Apple's transformer optimisation guide, weight palettization at conversion time, and the new Activity Classifier in Create ML.
2023 (WWDC)	Core ML 7	First-class visionOS support for the Apple Vision Pro, advanced model compression APIs in coremltools 7 (8-bit activation quantisation, magnitude pruning, k-means palettization to as few as 2 bits per weight) and joint compression schemes.
2024 (WWDC)	Core ML 8 + Apple Intelligence	New Stateful API and stateful buffers for KV-cache support in transformer models, the Swift MLTensor type for ergonomic tensor work, multifunction models for packaging LoRA adapters, and 4-bit block-wise linear quantisation. Apple Intelligence and its on-device foundation model were introduced on top.
2025 (WWDC)	Core ML 9 + Foundation Models framework	Continued improvements in foundation model integration, more aggressive low-bit quantisation, and the Foundation Models framework released to third-party developers with guided generation (the @Generable macro) for working directly with Swift data types ^[25].
2026 (WWDC)	Core AI introduced alongside Core ML	Apple announced Core AI, an Apple-silicon-only framework for on-device large language models and generative AI, positioned as the successor for transformer and generative workloads while Core ML continues to serve classical and smaller neural models ^[26]^[27].

Apple has avoided marketing the framework by version number since 2020, instead announcing features under generic Machine Learning umbrellas at WWDC. The version number that matters in practice is the model specification version baked into a .mlpackage, which determines the minimum operating system that can load it.
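The version gate described above can be sketched as a lookup from specification version to the first iOS release able to load it. The mapping below follows the deployment-target constants used by coremltools rather than anything stated in this article, and the names `MIN_IOS_FOR_SPEC` and `can_load` are hypothetical; treat it as an illustration, not as Apple's loader logic.

```python
# Specification version -> first iOS release whose Core ML runtime can load it.
# Assumption: values mirror coremltools' deployment targets; not taken from this article.
MIN_IOS_FOR_SPEC = {
    1: (11, 0),
    2: (11, 2),
    3: (12, 0),
    4: (13, 0),
    5: (14, 0),
    6: (15, 0),
    7: (16, 0),
    8: (17, 0),
    9: (18, 0),
}


def can_load(spec_version: int, device_ios: tuple[int, int]) -> bool:
    """Return True if a device on `device_ios` can load a model of this spec version."""
    if spec_version not in MIN_IOS_FOR_SPEC:
        raise ValueError(f"unknown specification version: {spec_version}")
    # Tuples compare element by element, so (17, 4) < (18, 0).
    return device_ios >= MIN_IOS_FOR_SPEC[spec_version]
```

For example, `can_load(9, (17, 4))` is false under this table, which is why a model converted with the newest features cannot be shipped to users on an older operating system.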

What model formats does Core ML use?

Core ML uses two on-disk formats and a compiled runtime form.

.mlmodel (2017)

The original .mlmodel is a single Protocol Buffers file that bundles the model graph, the weights, and the metadata ^[1]. It supports the original neural network type (with a fixed list of layer kinds), tree ensembles, support vector machines, linear and logistic regressors, generalised linear models, scalers, imputers, one-hot encoders, word embeddings, and pipelines that chain several models together. A .mlmodel declares a specification version (1 through about 6 for legacy models) so the runtime can refuse to load a file that uses operations it does not understand.

.mlpackage (2021)

The newer .mlpackage is a directory rather than a single file. It separates the model architecture from the weights and the metadata, which makes it easier to edit, diff, and store in version control, and it lets Apple add new model types without invalidating older tooling ^[11]. The .mlpackage container is required to use the ML Program model type, the Stateful API introduced in 2024, and Float16 as a default precision. Because an ML Program decouples the weights from the program architecture, it cannot be saved as a single .mlmodel file; it is always saved with the .mlpackage extension ^[11].

Compiled .mlmodelc bundle

When Xcode builds an app, it compiles every model into a .mlmodelc bundle. This is an opaque, layout-optimised version of the model for the target operating system. On a user's device, only the compiled bundle exists; the original .mlmodel or .mlpackage is not shipped. Apple's model encryption feature (introduced in Core ML 4) operates on the compiled bundle, with decryption keys retrieved from Apple's servers on first use and held only in memory ^[9].
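The three forms differ on disk as well as in purpose: .mlmodel is a single file, while .mlpackage and .mlmodelc are directories. A minimal sketch that tells them apart by extension and file type, using empty placeholders rather than real models (the function name `describe_model_artifact` is hypothetical):

```python
from pathlib import Path
import tempfile


def describe_model_artifact(path: Path) -> str:
    """Classify a Core ML artifact by extension and by whether it is a file or a directory."""
    suffix = path.suffix.lower()
    if suffix == ".mlmodel" and path.is_file():
        return "source model (single Protocol Buffers file)"
    if suffix == ".mlpackage" and path.is_dir():
        return "source model package (directory)"
    if suffix == ".mlmodelc" and path.is_dir():
        return "compiled bundle (directory)"
    return "not a recognised Core ML artifact"


# Demonstrate with empty placeholders in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Legacy.mlmodel").write_bytes(b"")
    (root / "Modern.mlpackage").mkdir()
    (root / "Modern.mlmodelc").mkdir()
    results = {p.name: describe_model_artifact(p) for p in sorted(root.iterdir())}
```

The check only inspects names and file types; it does not validate the contents of any artifact.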

How does Core ML work?

Core ML is structured as a stack. The high-level Swift and Objective-C API exposes one class per model. Below that, Core ML manages a graph of operations and dispatches each to one of three compute backends ^[1]^[3].

Developers can hint at which backend they prefer through the MLComputeUnits setting on MLModelConfiguration. Common values are cpuOnly (forces CPU execution, useful for debugging and reproducibility), cpuAndGPU (avoids the Neural Engine), cpuAndNeuralEngine (avoids the GPU), and all (the default, which lets Core ML pick). Core ML still routes individual ops; setting cpuAndNeuralEngine does not guarantee that every layer runs on the ANE because some ops are not ANE-supported.
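The fallback behaviour can be illustrated with a toy router. This is not Core ML's scheduler, which is closed source and weighs cost as well as support. The op names, the support table, and the preference order (ANE, then GPU, then CPU) are assumptions made for the sketch; only the four compute-unit names come from the text above.

```python
# Backends each MLComputeUnits value permits, in an assumed preference order.
ALLOWED_BACKENDS = {
    "cpuOnly": ["CPU"],
    "cpuAndGPU": ["GPU", "CPU"],
    "cpuAndNeuralEngine": ["ANE", "CPU"],
    "all": ["ANE", "GPU", "CPU"],
}

# Hypothetical support table: which backends can run which op.
OP_SUPPORT = {
    "conv2d": {"ANE", "GPU", "CPU"},
    "matmul": {"ANE", "GPU", "CPU"},
    "custom_gather": {"GPU", "CPU"},
    "string_lookup": {"CPU"},
}


def route(ops, compute_units):
    """Assign each op to the first permitted backend that supports it."""
    plan = {}
    for op in ops:
        for backend in ALLOWED_BACKENDS[compute_units]:
            if backend in OP_SUPPORT[op]:
                plan[op] = backend
                break
    return plan


plan = route(["conv2d", "custom_gather", "string_lookup"], "cpuAndNeuralEngine")
```

With `cpuAndNeuralEngine`, only `conv2d` lands on the ANE in this toy table; the other two ops fall back to the CPU because the GPU is excluded and the ANE does not support them.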

Apple ships several vertical frameworks that wrap Core ML for specific data types:

Framework	Purpose	Year
Vision	Image and video analysis: face detection, text recognition, image classification, object tracking	2017
Natural Language	Tokenisation, language identification, named entity recognition, sentiment analysis, word embeddings	2017 (Foundation), 2018 (NL framework)
Speech	Speech-to-text recognition with on-device or server modes	2016 (pre-Core ML), with Core ML integration from 2019
SoundAnalysis	Sound classification (laughter, applause, custom sounds)	2019
Translation	On-device language translation	2020
ML Compute	Hardware-accelerated training graph for CPU and GPU	2020
Foundation Models	Access to Apple's on-device 3B language model with adapters	2025

Most apps use Core ML through one of these higher-level frameworks rather than calling the model directly.

What model types does Core ML support?

Core ML supports a range of classical and deep-learning model types, with the catalogue growing each year.

Family	Examples	Supported since
Convolutional neural networks	ResNet, MobileNet, EfficientNet, image classifiers, object detectors	Core ML 1 (2017)
Recurrent neural networks	LSTM, GRU for sequence tasks	Core ML 1 (2017)
Transformer networks	BERT, ViT, encoder-decoder models, Mistral 7B	Improved each release; first-class with stateful KV-cache in 2024
Tree ensembles	Random Forest, gradient-boosted trees, XGBoost via conversion	Core ML 1 (2017)
Linear and logistic regression	Generalised linear models	Core ML 1 (2017)
Support vector machines	Linear and kernel SVMs	Core ML 1 (2017)
Pipelines	Chains of feature extractors, scalers, and predictors	Core ML 1 (2017)
Word embeddings	Word2Vec-style and custom embeddings	Core ML 2 (2018)
Sound and activity classifiers	Audio frame classifiers, accelerometer activity classifiers	Core ML 3 (2019)
k-Nearest Neighbours classifiers	Updatable on-device classifiers for personalisation	Core ML 3 (2019)
Custom layers and ops	Developer-defined layers in Swift / Metal / Python	Core ML 1 (custom layers); custom Torch ops in coremltools

How do you convert a model to Core ML?

The coremltools package is the open-source Python toolchain that converts trained models from popular frameworks into Core ML format. It is BSD-3-Clause licensed and developed in the open on GitHub ^[10].

Source framework	Path	Notes
PyTorch	TorchScript or `torch.export` to MIL to ML Program	The most modern path; supports custom ops via `@register_torch_op`.
TensorFlow 2.x	SavedModel or tf.keras to MIL	Replaces the legacy tfcoreml tool.
TensorFlow 1.x	Frozen GraphDef to MIL	Still supported; mostly used for legacy models.
ONNX	Deprecated; was supported via onnx-coreml	The recommended path is now ONNX to PyTorch to Core ML, or to use ONNX Runtime instead.
scikit-learn	`coremltools.converters.sklearn`	Decision trees, linear models, SVMs, pipelines.
XGBoost	`coremltools.converters.xgboost`	Gradient-boosted tree models.
LightGBM	Convert to ONNX or scikit-learn first	No native converter.
LibSVM	`coremltools.converters.libsvm`	Classic SVM models.
Hugging Face	`exporters` package	Hugging Face maintains an exporters package that converts Transformers models.

The central conversion call is coremltools.convert(model, ...). From coremltools 4 onward this is the Unified Conversion API: it auto-detects the source framework, lowers the graph to the Model Intermediate Language (MIL) for optimisation passes (constant folding, dead-code elimination, layer fusion), and lowers MIL down to either the legacy NeuralNetwork model type or the modern ML Program type. From coremltools 7 the default target is ML Program, produced by passing convert_to="mlprogram" or simply relying on the default ^[11].

A minimal PyTorch conversion looks like this:

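import coremltools as ct
import torch

# Placeholder model and input for illustration; substitute your own trained network.
model = torch.nn.Sequential(torch.nn.Linear(4, 2)).eval()
example_input = torch.rand(1, 4)

# Trace to TorchScript so coremltools can read the graph.
traced = torch.jit.trace(model, example_input)
# Convert to an ML Program with a fixed input shape taken from the example.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=example_input.shape)],
    convert_to="mlprogram",
)
# ML Programs are always saved as an .mlpackage directory.
mlmodel.save("MyModel.mlpackage")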

Coremltools also includes utilities for inspecting models, comparing predictions against the source framework, validating input and output shapes, and applying post-training optimisations.
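Two of the optimisation passes named earlier, constant folding and dead-code elimination, can be shown on a toy graph. The graph encoding and function names below are invented for illustration and bear no relation to MIL's real data structures; the sketch only shows what each pass achieves.

```python
# A toy graph in topological order: name -> (kind, argument).
# "const" holds a number, "input" is a runtime value, "add"/"mul" name two earlier nodes.
graph = {
    "x": ("input", None),
    "two": ("const", 2.0),
    "three": ("const", 3.0),
    "six": ("mul", ("two", "three")),   # both operands constant: foldable
    "unused": ("add", ("x", "two")),    # never reaches the output: dead
    "y": ("mul", ("x", "six")),
}
OUTPUT = "y"


def fold_constants(g):
    """Replace add/mul nodes whose operands are all constants with a const node."""
    out = {}
    for name, (kind, arg) in g.items():
        if kind in ("add", "mul") and all(out[a][0] == "const" for a in arg):
            a, b = (out[n][1] for n in arg)
            out[name] = ("const", a + b if kind == "add" else a * b)
        else:
            out[name] = (kind, arg)
    return out


def eliminate_dead_code(g, output):
    """Keep only the nodes reachable from the output."""
    live, stack = set(), [output]
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        kind, arg = g[name]
        if kind in ("add", "mul"):
            stack.extend(arg)
    return {name: node for name, node in g.items() if name in live}


optimised = eliminate_dead_code(fold_constants(graph), OUTPUT)
```

After both passes the six-node graph shrinks to three nodes: the input, a single precomputed constant, and one multiply.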

How does Core ML shrink models (quantisation and optimisation)?

Fitting a competitive model into a phone's RAM and within the Neural Engine's compute budget is the central engineering problem on Apple platforms. Core ML and coremltools support a stack of quantisation and compression techniques ^[12].

Technique	What it does	Bit widths
Float16	Half-precision weights and activations	16-bit; default for ML Programs since coremltools 5
Linear quantisation	Affine mapping of weights to a smaller integer range	INT8 weights and activations; INT4 weights from coremltools 8
Block-wise linear quantisation	Different scale per block of weights, configurable block size	INT4 (and INT8) per-block, introduced 2024
Palettization	k-means clustering of weights into a lookup table indexed per weight	1, 2, 3, 4, 6, or 8 bits per index ^[13]
Magnitude pruning	Sets near-zero weights to exactly zero, then stores them sparsely	Variable sparsity ratio, Core ML 7+
Joint compression	Combines pruning + quantisation or pruning + palettization	coremltools 8+
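To make the block-wise row of the table concrete, here is a minimal pure-Python sketch of symmetric per-block linear quantisation. It is a simplified variant with a scale but no zero point, and it is not the coremltools implementation; the function names and the tiny block size are chosen for illustration only.

```python
def quantize_blockwise(weights, block_size=4, n_bits=4):
    """Symmetric per-block linear quantisation: one float scale per block of weights."""
    qmax = 2 ** (n_bits - 1) - 1  # 7 for signed 4-bit codes
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        codes = [max(-qmax, min(qmax, round(w / scale))) for w in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_blockwise(blocks):
    """Reconstruct approximate float weights from (scale, codes) pairs."""
    return [scale * code for scale, codes in blocks for code in codes]


def max_error(original, restored):
    """Largest absolute reconstruction error."""
    return max(abs(x - y) for x, y in zip(original, restored))


# Small weights followed by large ones: a single shared scale would crush the small block.
weights = [0.02, -0.01, 0.03, 0.015, 1.2, -0.8, 0.4, 0.9]
per_block = dequantize_blockwise(quantize_blockwise(weights, block_size=4))
per_tensor = dequantize_blockwise(quantize_blockwise(weights, block_size=len(weights)))
```

Comparing `max_error(weights[:4], per_block[:4])` with `max_error(weights[:4], per_tensor[:4])` shows why a per-block scale helps: with one scale for the whole tensor, the small weights all round to zero.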

A practical example from Apple's own work: the on-device foundation model for Apple Intelligence uses mixed 2-bit and 4-bit palettization that averages 3.7 bits per weight while preserving accuracy comparable to the uncompressed model ^[15]. The Mistral 7B example Apple shipped at WWDC 2024 uses 4-bit block-wise linear quantisation, which shrinks the model from about 14 GB at Float16 to about 3.8 GB on disk and keeps total memory under 4 GB at runtime ^[16].
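The Mistral 7B figures follow from simple arithmetic on bits per weight. The sketch below assumes a nominal 7 billion parameters, decimal gigabytes, and, for the block-wise case, one Float16 scale per block of 32 weights; none of these values is stated in the source, so the result is a ballpark check rather than a reproduction of Apple's numbers.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes, ignoring metadata and lookup tables."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9    # assumption: nominal "7B" parameter count
BLOCK_SIZE = 32   # assumption: weights per quantisation block
SCALE_BITS = 16   # assumption: one Float16 scale stored per block

fp16_gb = model_size_gb(N_PARAMS, 16)
int4_gb = model_size_gb(N_PARAMS, 4)
# Each block of weights carries one extra scale, amortised over the block.
int4_blockwise_gb = model_size_gb(N_PARAMS, 4 + SCALE_BITS / BLOCK_SIZE)
```

Under these assumptions Float16 comes to 14 GB and plain 4-bit to 3.5 GB. Per-block scales add roughly half a bit per weight, which puts the estimate in the same range as the reported figure of about 3.8 GB.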

In a separate 2024 case study, Apple ran Llama-3.1-8B-Instruct entirely on-device with Core ML at roughly 33 decoding tokens per second on a Mac with an M1 Max, using Int4 weight quantisation together with the new stateful key-value cache to avoid repeatedly copying the KV cache, which can reach about 1 GB at an 8,192-token context ^[28].
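The "about 1 GB" figure for the KV cache can be checked with a short calculation. The architecture values used here (32 layers, 8 key-value heads, head dimension 128) come from the public Llama 3.1 8B configuration rather than from this article, and Float16 cache storage is assumed.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Size of the key and value caches for one sequence (the factor 2 is keys plus values)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


# Assumed Llama 3.1 8B configuration; Float16 means 2 bytes per cached value.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
size_gib = size / 2**30
```

With these inputs the cache comes to exactly 1 GiB, so a runtime that copied it on every decoding step would move a gigabyte per token. Holding it in place is what the stateful buffers are for.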

What hardware does Core ML run on?

Core ML targets three backends and dispatches operations to whichever can run them fastest within the user-selected compute units.

Backend	Strengths	Trade-offs
Apple Neural Engine (ANE)	Very low power per operation, very high throughput on supported ops, leaves the GPU and CPU free for the rest of the app	Only a subset of operations are ANE-friendly; weights have layout constraints; some shapes force a fall-back
GPU (Metal Performance Shaders, MPSGraph)	Flexible; supports almost any tensor op; large memory bandwidth; good for big batch sizes and unusual shapes	Higher power than ANE; competes with rendering work for GPU time
CPU (BNNS / Accelerate)	Universally available; deterministic; required for some ops not implemented elsewhere; benefits from the AMX matrix coprocessor on Apple silicon	Lowest throughput per watt for large neural networks

The Apple Neural Engine started in 2017 as a 2-core unit on the A11 Bionic delivering 0.6 TOPS, sufficient to power Face ID and Animoji ^[20]^[21]. It has grown roughly an order of magnitude every few generations: 5 TOPS on the A12 (2018), 11 TOPS on the A14 / M1 (2020), 15.8 TOPS on the A15 / M2, 17 TOPS on the A16, 35 TOPS on the A17 Pro (2023), 38 TOPS on the M4 (2024) ^[20]. Every A-series and M-series chip Apple has introduced since the A11 includes a Neural Engine, which means there is an installed base of well over a billion ANE-capable devices.
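The growth claim can be sanity-checked from the two endpoints quoted above, 0.6 TOPS in 2017 and 38 TOPS in 2024. The sketch assumes smooth compound growth between them, which is a simplification: real generations improved unevenly, and the TOPS figures are not always quoted at the same numeric precision.

```python
import math

a11_tops, m4_tops = 0.6, 38.0   # endpoint figures quoted in the text
years = 2024 - 2017

growth = m4_tops / a11_tops                       # overall factor across the period
annual = growth ** (1 / years)                    # compound annual factor
years_per_10x = math.log(10) / math.log(annual)   # years needed for one order of magnitude
```

The overall factor is a little over 60, or somewhat under a doubling per year. That corresponds to about one order of magnitude every four years or so.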

What apps use Core ML?

Core ML has shipped on every iPhone, iPad, Apple Watch, and Mac running a current Apple operating system since 2017. It is built into the OS and free for developers to use, which has driven broad uptake. Major Apple-internal users include:

Photos: subject and scene classification, object recognition, OCR for Live Text, on-device face clustering, semantic search.
Camera: Smart HDR, Deep Fusion, computational photography, Cinematic mode subject tracking, Photographic Styles.
Siri: on-device speech recognition (since iOS 15), wake-word detection, Personal Voice synthesis.
Translate: on-device translation across the supported language pairs.
Keyboard: QuickType predictions, autocorrect, multilingual typing.
Health and Fitness: fall detection, ECG analysis, walking steadiness, sleep stage classification.
Apple Intelligence: writing tools, summarisation, image generation (Image Playground, Genmoji), priority notifications, on-device search.

Third-party adoption is broad. Apps such as Pinterest, Etsy, eBay, and Snapchat use Vision-backed Core ML for image search and AR effects. Healthcare apps use Core ML for skin lesion screening, retinal image classification, and pose estimation. Photography apps such as Halide and Photomator embed custom Core ML models for noise reduction and subject masking. Game developers use Core ML through MPSGraph for upscaling and gesture recognition.

How does Core ML power Apple Intelligence and Foundation Models?

At WWDC 2024 Apple introduced Apple Intelligence, the user-facing brand for the company's generative-AI features. Apple Intelligence is built on top of Core ML in two ways. First, the on-device roughly 3 billion parameter foundation language model runs as a Core ML model on the Apple Neural Engine, using the new Stateful API for its KV-cache. Apple reported about 0.6 milliseconds per prompt token of latency and roughly 30 tokens per second of generation on an iPhone 15 Pro, before token-speculation optimisations ^[15]. Second, at WWDC 2025 Apple released a separate Foundation Models framework that lets third-party developers call the same on-device 3B model through a high-level Swift API, optionally with task-specific adapters ^[14]^[25].
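The two reported figures combine into a rough response-time estimate. The sketch treats prompt processing as linear in prompt length and generation as a constant rate, which ignores context-length effects and the token-speculation optimisations; the function name and the example token counts are illustrative.

```python
PROMPT_MS_PER_TOKEN = 0.6      # reported latency per prompt token
GENERATION_TOKENS_PER_S = 30   # reported decoding rate


def estimate_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Rough end-to-end time: prompt processing plus token-by-token generation."""
    prefill = prompt_tokens * PROMPT_MS_PER_TOKEN / 1000
    decode = output_tokens / GENERATION_TOKENS_PER_S
    return prefill + decode


# A 1,000-token prompt with a 150-token reply: about 0.6 s of prefill plus 5 s of decoding.
t = estimate_seconds(prompt_tokens=1000, output_tokens=150)
```

The estimate shows that generation, not prompt processing, dominates the wait for typical summarisation-sized requests.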

Apple's adapter strategy keeps the base model fixed and swaps in small LoRA-style modules (typically tens of megabytes for a rank-16 adapter) for each task: summarisation, proofreading, mail reply suggestions, and so on. The base model itself uses mixed 2-bit and 4-bit palettization for an average of 3.7 bits per weight, with small adapters held in 16-bit precision ^[15].
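The 3.7-bit average constrains how the two bit widths are mixed. As a back-of-the-envelope sketch, assuming every weight is either 2-bit or 4-bit and ignoring lookup-table overhead (a simplification the source does not confirm), the split can be solved directly:

```python
def average_bits(fraction_2bit: float) -> float:
    """Average bits per weight when a fraction is 2-bit and the rest is 4-bit."""
    return 2 * fraction_2bit + 4 * (1 - fraction_2bit)


def fraction_2bit_for(target_bits: float) -> float:
    """Invert average_bits: solve 2f + 4(1 - f) = target for f."""
    return (4 - target_bits) / 2


f = fraction_2bit_for(3.7)   # share of weights stored at 2 bits
```

Under that assumption roughly 15% of the weights would be stored at 2 bits and the remainder at 4 bits.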

In June 2026 Apple introduced its third generation of foundation models. The on-device tier now spans an AFM 3 Core model at about 3 billion parameters and a larger AFM 3 Core Advanced model at about 20 billion parameters that uses sparse activation (only 1 to 4 billion parameters active at a time), with multilingual support across 28 languages ^[29]. Apple emphasises the privacy stance of its training: "We do not use our users' private personal data or user interactions when training our foundation models" ^[29].

The Foundation Models framework is technically a separate framework from Core ML, but it is tightly coupled in practice: it loads the model through the Core ML runtime, runs it on the Neural Engine, and benefits from every optimisation Apple has shipped in coremltools.

What is Core AI, and is Core ML being replaced?

In March 2026, Bloomberg's Mark Gurman reported that Apple would introduce a modernised framework called Core AI at WWDC, reflecting a shift in terminology from "machine learning" to "AI" ^[26]. At WWDC 2026 Apple confirmed Core AI as a dedicated framework for on-device large language models and generative AI. According to coverage of the announcement, Core AI runs only on Apple silicon, supports models from compact 3 billion parameter vision models up to roughly 70 billion parameter reasoning models, and promises "zero server dependencies" and "zero per-token cloud costs" ^[27].

Core AI does not retire Core ML. The two coexist: Apple's guidance is to use Core ML for classical and smaller neural models (classification, object detection, decision trees) and Core AI for transformers and generative workloads, with MLX as a third option for researchers working with custom weights ^[27]. Existing .mlmodel and .mlpackage files continue to work, and Apple's long deprecation cycles mean Core ML is not going away in a single release ^[26].

How does Core ML compare with other on-device ML frameworks?

Core ML is one of several on-device inference engines. The main competitors target Android, embedded Linux, or cross-platform deployments.

Framework	Vendor	Platforms	Accelerator support	Model formats	Year	License
Core ML	Apple	iOS, iPadOS, macOS, watchOS, tvOS, visionOS	Apple Neural Engine, Metal GPU, CPU/AMX	.mlmodel, .mlpackage	2017	Closed (runtime); BSD-3-Clause (coremltools)
TensorFlow Lite (LiteRT)	Google	Android, iOS, Linux, microcontrollers	NNAPI, GPU delegate, Hexagon, Edge TPU, Core ML delegate on iOS	.tflite (FlatBuffers)	2017 (Nov)	Apache 2.0
ONNX Runtime Mobile	Microsoft and contributors	Android, iOS, Linux, Windows	NNAPI, Core ML, XNNPACK, QNN	.ort, .onnx	2018	MIT
PyTorch Mobile / ExecuTorch	Meta	Android, iOS, embedded	XNNPACK, Vulkan, Core ML, MPS, QNN	.pte (ExecuTorch), .ptl (legacy mobile)	2019 (PyTorch Mobile), 2023 (ExecuTorch)	BSD-style
MediaPipe	Google	Android, iOS, web, Linux	TFLite delegates, GPU, NNAPI	TFLite under the hood	2019	Apache 2.0
NCNN	Tencent	Android, iOS, Linux, Windows, embedded	Vulkan, OpenCL	.param + .bin	2017	BSD-3-Clause
MNN	Alibaba	Android, iOS, Linux, Windows, embedded	OpenCL, Vulkan, Metal, CUDA	.mnn	2019	Apache 2.0

The most common direct comparison is with TensorFlow Lite, which Google rebranded as LiteRT in late 2024. Core ML has the deeper Apple-platform integration, automatic Neural Engine routing, and a richer set of high-level vertical frameworks. TensorFlow Lite covers Android plus a much wider range of embedded targets (microcontrollers, single-board computers) and is the obvious choice for cross-platform mobile apps. Most large iOS/Android codebases end up shipping the same model in both formats, with conversion through ONNX or PyTorch as the lingua franca.

Strengths

Tight integration with the Apple Neural Engine, the GPU through Metal, and the AMX matrix coprocessor on Apple silicon, with automatic op-by-op dispatch.
Privacy-preserving by construction; inference runs on the device, so user data never has to leave the phone ^[2].
A free, system-installed runtime with a guaranteed installed base on every Apple device since 2017.
A mature open-source conversion toolchain (coremltools) that accepts PyTorch, TensorFlow, scikit-learn, XGBoost, and Hugging Face Transformers.
Continuous improvement with each iOS release, including aggressive support for the latest model architectures (transformers, KV-cache, low-bit quantisation, LoRA-style adapters).
High-level vertical frameworks (Vision, Natural Language, Speech, SoundAnalysis) that turn complex pipelines into a few lines of Swift.

Limitations

Core ML is locked to Apple platforms, which makes cross-platform deployment a duplication of effort. The runtime itself is closed source, which makes debugging op-routing decisions hard; the Neural Engine in particular is a black box, and developers often resort to reading Console logs and using Xcode's Performance Reports to figure out why a model fell back to GPU or CPU ^[19].

Conversion from PyTorch can require workarounds for ops that have no MIL equivalent, custom Torch op registrations, or graph rewrites. Some research-grade architectures (sparse mixture of experts, very large vocabulary models, models with dynamic control flow) take real effort to convert.

On-device training is limited. Core ML offers personalisation (fine-tuning a small number of layers or a k-NN head), not full training; Create ML covers training on a Mac but is opinionated about model architecture; ML Compute provides a lower-level training graph but is rarely used outside Apple.

The Neural Engine has shape and op constraints that change between chip generations, so a model that runs on the ANE on an A17 Pro might run on the GPU on an A14. Apple's documentation often lags the rapid pace of model architecture changes, and the community fills the gap through blogs (Pete Warden, Matthijs Hollemans / machinethink.net, Hugging Face), forums, and reverse-engineering projects.

Open source and community

Core ML itself is closed-source Apple software. The conversion side is open. The coremltools repository on GitHub is BSD-3-Clause licensed, accepts external pull requests, and ships frequent releases (8.x series in 2025, with 9.x in beta) ^[10]. Apple also maintains an apple/ml-stable-diffusion repository that demonstrates ANE-friendly Stable Diffusion in Core ML, and an apple/coreml-stable-diffusion-xl model on Hugging Face. The Hugging Face team maintains an exporters tool that converts Transformers models directly to Core ML, plus the swift-transformers package for running them in Swift apps.

Third-party reverse-engineering projects (notably Matthijs Hollemans' hollance/neural-engine on GitHub) document which ops the ANE supports on each chip, which has become a de facto reference for engineers trying to keep models on the Neural Engine ^[19].

Why does Core ML matter?

Core ML defined on-device machine learning as a first-class platform capability rather than a research curiosity. It shipped before the developer preview of Google's TensorFlow Lite (November 2017), before PyTorch Mobile (2019), and well before ExecuTorch (2023). The combination of a dedicated neural accelerator (the Apple Neural Engine) and an OS-level inference framework, both released the same week, was a step change for the mobile industry.

Its long-term effect is twofold. First, it pushed almost every other smartphone vendor to ship a neural processing unit and a corresponding inference framework: Google's NNAPI and now AICore, Qualcomm's Hexagon NPU and SNPE, Samsung's Neural Processing Unit. Second, it created the runtime layer that Apple now uses to ship Apple Intelligence to every supported iPhone, iPad, and Mac, without sending user data to a server. The on-device 3B foundation model that powers Apple Intelligence runs through the same Core ML runtime that has been on every iPhone since 2017; the difference is mostly that the model and the silicon have grown by several orders of magnitude.

For a developer, Core ML is the path of least resistance for shipping a machine-learning model on an Apple device. For Apple, it is the substrate for almost every machine-learning feature the company ships, from Face ID and Live Text to Apple Intelligence and Personal Voice.

References

[1] Apple. "Core ML | Apple Developer Documentation." https://developer.apple.com/documentation/coreml
[2] Apple. "Core ML Overview - Machine Learning - Apple Developer." https://developer.apple.com/machine-learning/core-ml/
[3] Apple. "Introducing Core ML - WWDC17 - Videos." https://developer.apple.com/videos/play/wwdc2017/703/
[4] Apple. "Core ML in depth - WWDC17 - Videos." https://developer.apple.com/videos/play/wwdc2017/710/
[5] Apple. "Introducing Create ML - WWDC18 - Videos." https://developer.apple.com/videos/play/wwdc2018/703/
[6] Apple. "Core ML 3 Framework - WWDC19 - Videos." https://developer.apple.com/videos/play/wwdc2019/704/
[7] Apple. "Use model deployment and security with Core ML - WWDC20 - Videos." https://developer.apple.com/videos/play/wwdc2020/10152/
[8] Apple. "Get models on device using Core ML Converters - WWDC20 - Videos." https://developer.apple.com/videos/play/wwdc2020/10153/
[9] Apple. "Encrypting a Model in Your App | Apple Developer Documentation." https://developer.apple.com/documentation/coreml/encrypting-a-model-in-your-app
[10] Apple. apple/coremltools GitHub repository. https://github.com/apple/coremltools
[11] Apple. "Convert Models to ML Programs - Guide to Core ML Tools." https://apple.github.io/coremltools/docs-guides/source/convert-to-ml-program.html
[12] Apple. "Optimization Overview - Guide to Core ML Tools." https://apple.github.io/coremltools/docs-guides/source/opt-overview.html
[13] Apple. "Palettization Overview - Guide to Core ML Tools." https://apple.github.io/coremltools/docs-guides/source/opt-palettization-overview.html
[14] Apple Machine Learning Research. "Introducing Apple's On-Device and Server Foundation Models." https://machinelearning.apple.com/research/introducing-apple-foundation-models
[15] Apple Machine Learning Research. "Apple Intelligence Foundation Language Models." https://machinelearning.apple.com/research/apple-intelligence-foundation-language-models
[16] Pedro Cuenca. "WWDC 24: Running Mistral 7B with Core ML," Hugging Face Blog. https://huggingface.co/blog/mistral-coreml
[17] Matthijs Hollemans. "New stuff in Apple's machine learning ecosystem 2020," machinethink.net. https://machinethink.net/blog/new-in-apple-machine-learning-2020/
[18] Matthijs Hollemans. "On-device training with Core ML," machinethink.net. https://machinethink.net/blog/coreml-training-part1/
[19] Matthijs Hollemans. hollance/neural-engine GitHub repository, supported devices. https://github.com/hollance/neural-engine/blob/master/docs/supported-devices.md
[20] Wikipedia. "Neural Engine." https://en.wikipedia.org/wiki/Neural_Engine
[21] Wikipedia. "Apple A11." https://en.wikipedia.org/wiki/Apple_A11
[22] Wikipedia. "Apple Vision Pro." https://en.wikipedia.org/wiki/Apple_Vision_Pro
[23] VentureBeat. "Apple's Core ML now lets app developers update AI models on the fly," June 24, 2020. https://venturebeat.com/2020/06/24/apples-core-ml-now-lets-app-developers-update-ai-models-on-the-fly/
[24] InfoQ. "Core ML 3 Extends Available Model Types, Adds On-Device Model Retrain," June 2019. https://www.infoq.com/news/2019/06/core-ml-3-on-device-retrain/
[25] Apple. "Meet the Foundation Models framework - WWDC25 - Videos." https://developer.apple.com/videos/play/wwdc2025/286/
[26] 9to5Mac. "Apple replacing Core ML with modernized Core AI framework for iOS 27 at WWDC," March 1, 2026. https://9to5mac.com/2026/03/01/apple-replacing-core-ml-with-modernized-core-ai-framework-for-ios-27-at-wwdc/
[27] InfoQ. "Apple Launches Core AI for Apple-Silicon Optimized On-Device Generative AI," June 2026. https://www.infoq.com/news/2026/06/apple-core-ai-wwdc/
[28] Apple Machine Learning Research. "On Device Llama 3.1 with Core ML." https://machinelearning.apple.com/research/core-ml-on-device-llama
[29] Apple Machine Learning Research. "Introducing the Third Generation of Apple's Foundation Models." https://machinelearning.apple.com/research/introducing-third-generation-of-apple-foundation-models
