Apple Foundation Models (AFM) are the large language models developed by Apple to power Apple Intelligence, the company's generative AI system introduced at WWDC 2024. The family consists of two primary models: AFM-on-device, a roughly 3-billion-parameter model optimized to run locally on Apple Silicon, and AFM-server, a larger mixture-of-experts model deployed in Apple's Private Cloud Compute infrastructure. Both models underpin features including Writing Tools, Smart Reply, notification summaries, and Siri's extended language-understanding capabilities.
Apple described the technical details of these models in a paper published on arXiv in July 2024, and released a more comprehensive 2025 tech report covering architectural improvements, multilingual expansion, and new developer APIs. Unlike most AI models from large technology companies, AFM runs substantially on the user's own device, a design choice that reflects Apple's long-standing emphasis on local processing and data minimization.
Apple Intelligence was announced at WWDC 2024 in June of that year as the company's first major foray into generative AI integrated across iOS, iPadOS, and macOS. Rather than rely entirely on third-party APIs or cloud inference, Apple built its own foundation models and deployed them using a combination of on-device inference and a hardened cloud backend.
Apple's AI research work predates the public announcement substantially. The company had been working on neural language models and on-device machine learning for years, publishing work on topics such as federated learning, differential privacy, and on-device speech recognition. The foundation model effort represents a consolidation of those threads into a single, general-purpose language model suitable for consumer use cases at scale.
The initial October 2024 release with iOS 18.1 supported U.S. English only. iOS 18.2 in December 2024 extended availability to localized English in Canada, Australia, New Zealand, Ireland, the United Kingdom, and South Africa. iOS 18.4 in April 2025 added French, German, Italian, Portuguese (Brazil), Spanish, Japanese, Korean, and Simplified Chinese, bringing Apple Intelligence to the European Union for the first time. By the 2025 tech report, the models supported 16 languages with ongoing expansion.
AFM-on-device runs on devices with an A17 Pro chip or newer (iPhone 15 Pro and later) or any M-series Apple Silicon chip (iPad and Mac). The minimum configuration is an M1 chip, which Apple introduced in late 2020. The Neural Engine in these chips handles the bulk of inference; the M1's Neural Engine can execute 11 trillion operations per second, which is sufficient for the model's low-bit (mixed 2- to 4-bit) quantized weights. The AFM-server model runs on Apple-designed server hardware in Apple's data centers, also using Apple Silicon.
AFM-on-device is a dense, decoder-only Transformer with approximately 2.73 billion parameters (2.58 billion non-embedding plus 0.15 billion embedding weights), typically described as a 3-billion-parameter model. The model uses a standard modern Transformer configuration:
| Parameter | Value |
|---|---|
| Model dimension | 3,072 |
| Feed-forward dimension | ~8,192 (SwiGLU) |
| Attention heads (query) | 24 |
| Attention heads (key/value) | 8 |
| Layers | 26 |
| Head dimension | 128 |
| Vocabulary size | 49,000 tokens |
| Context length (production) | 4,096 tokens |
| Extended context | 32,768 tokens |
The architecture uses grouped-query attention with 8 key/value heads against 24 query heads, which reduces the memory footprint of the KV cache substantially compared to multi-head attention. Positional embeddings use RoPE (Rotary Position Embeddings). Normalization is RMSNorm applied before each sublayer. The activation function is SwiGLU. Input and output embedding matrices are shared.
One notable architectural decision is a two-block structure with a 5:3 depth ratio. The full 26-layer model is divided into a first block of roughly 16 layers and a second block of roughly 10 layers. All key-value caches in the second block are shared with the KV caches produced by the final layer of the first block. This reduces KV cache memory by 37.5% compared to a conventional setup where each layer maintains its own cache. The practical effect is a significant reduction in time-to-first-token, which is the latency metric most noticeable to users when starting a generation.
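A back-of-envelope sketch (in Swift, using the figures from the table above; not Apple's implementation) of how grouped-query attention and the two-block cache sharing combine to shrink KV cache memory:

```swift
// Illustrative KV-cache accounting for the on-device configuration.
// Byte math assumes the 8-bit KV-cache quantization described below.
let layers = 26
let firstBlockLayers = 16          // ~5/8 of depth; block 2 has ~10 layers
let kvHeads = 8                    // grouped-query attention (vs. 24 query heads)
let headDim = 128
let contextTokens = 4_096
let bytesPerValue = 1              // 8-bit quantized cache entries

// Per-layer cache: keys + values for every token in the context.
let perLayerBytes = 2 * kvHeads * headDim * contextTokens * bytesPerValue

// Only the first block's layers allocate caches; every second-block layer
// reuses the cache written by the first block's final layer.
let sharedTotal = firstBlockLayers * perLayerBytes
let unsharedTotal = layers * perLayerBytes
let savings = 1.0 - Double(sharedTotal) / Double(unsharedTotal)
// savings ≈ 0.385 with 16 of 26 layers owning caches; Apple quotes 37.5%,
// i.e. the exact 3/8 implied by the 5:3 block ratio.
```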
On iPhone 15 Pro, Apple reported a time-to-first-token of approximately 0.6 milliseconds per prompt token and a generation rate of 30 tokens per second before token speculation. These numbers position the model within the range needed for interactive use without perceptible lag.
The production on-device model uses a mixed-precision scheme averaging 3.7 bits per weight (the model can be compressed to 3.5 bits without significant quality loss). Most projection weights use 4-bit palettization with 16-column grouping. Some layers use 2-bit quantization. Embedding layers use 8-bit per-channel quantization. KV caches use 8-bit quantization.
Critically, Apple applies quantization-aware training (QAT) rather than post-training quantization. This means the model is trained with simulated quantization noise so that the weights settle into positions that are robust to low-bit representation. Apple also trains lightweight accuracy-recovery LoRA adapters on approximately 10 billion tokens to compensate for any residual quantization loss. The company uses an internal tool called Talaria to optimize per-layer bit-rate assignments, balancing model quality against memory and latency budgets.
Activation quantization is applied separately from weight quantization. KV cache updates use efficient Neural Engine kernels optimized for the specific memory layout of Apple Silicon.
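A minimal sketch of the fake-quantization step at the heart of QAT, assuming a uniform per-group grid (Apple's palettization uses learned lookup tables, so this is a simplification):

```swift
// During QAT, the forward pass snaps each weight to a low-bit grid so the
// network trains against the same rounding error it will see at inference.
// The backward pass (not shown) treats the rounding as identity — the
// straight-through estimator.
func fakeQuantize(_ weights: [Float], bits: Int) -> [Float] {
    let levels = Float((1 << bits) - 1)
    guard let lo = weights.min(), let hi = weights.max(), hi > lo else {
        return weights
    }
    let scale = (hi - lo) / levels
    return weights.map { w in
        ((w - lo) / scale).rounded() * scale + lo
    }
}

// Example: 4-bit quantization of a hypothetical 16-column weight group.
let group: [Float] = [-0.31, 0.12, 0.07, -0.02, 0.25, -0.18, 0.01, 0.09,
                      -0.11, 0.30, -0.27, 0.04, 0.16, -0.08, 0.22, -0.05]
let quantized = fakeQuantize(group, bits: 4)   // each value on a 16-level grid
```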
AFM-server is a larger model running on Apple Silicon servers inside the Private Cloud Compute (PCC) infrastructure. Apple has not publicly disclosed the total parameter count. The model uses a different vocabulary with 100,000 tokens (expanded to 150,000 in the 2025 update to support more languages).
The architectural innovation in AFM-server is a design called Parallel-Track Mixture-of-Experts (PT-MoE). Conventional mixture-of-experts (MoE) architectures route each token to a subset of expert feed-forward networks, reducing the active parameters per forward pass. The PT-MoE design extends this with an additional dimension of parallelism called track parallelism.
In track parallelism, the server model contains multiple smaller Transformer stacks called tracks. Each track processes tokens independently in parallel. Synchronization between tracks occurs only at the input and output boundaries of each track block, not at every layer. Conventional tensor parallelism requires synchronization at every layer, which generates 2L synchronization points for a model with L layers. With PT-MoE and track block depth D, synchronization points are reduced to L/D. At D=4, this represents an 87.5% reduction in inter-device communication overhead.
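The arithmetic behind the quoted reduction, using a hypothetical layer count (Apple has not disclosed the server model's depth):

```swift
// Synchronization-point accounting per the text: tensor parallelism syncs
// twice per layer (2L); track parallelism syncs once per track block (L/D).
let layers = 32            // hypothetical L
let trackBlockDepth = 4    // D, as in the 87.5% example

let tensorParallelSyncs = 2 * layers                 // 64
let trackParallelSyncs = layers / trackBlockDepth    // 8
let reduction = 1.0 - Double(trackParallelSyncs) / Double(tensorParallelSyncs)
// reduction == 0.875 — the 87.5% figure quoted above, independent of L
```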
Each track block also has its own MoE layers. The combined structure gives the server model high total capacity (via the MoE experts) while keeping inference latency manageable (via reduced synchronization). Apple additionally uses interleaved global-local attention in the server model: most attention layers use sliding-window local attention over nearby tokens, with global attention layers interspersed at intervals. This combination supports efficient processing of longer sequences up to 65,536 tokens.
The server model includes a ViT-g vision encoder (approximately 1 billion parameters) for image understanding tasks.
Both models are trained using Apple's AXLearn framework, an open-source library built on JAX and XLA that Apple released in 2023 under the Apache 2.0 license. AXLearn supports data parallelism, tensor parallelism, sequence parallelism, and Fully Sharded Data Parallel (FSDP) training simultaneously across thousands of accelerators. AFM-server was pre-trained on 8,192 TPUv4 chips for 6.3 trillion tokens.
The pre-training data mixture combines web pages crawled by Applebot, licensed content from publishers, publicly available code and mathematics datasets, and filtered synthetic data.
Apple does not use private user data or user interactions in training. The company applies filtering to remove personally identifiable information and low-quality content.
Pre-training uses a sequence length of 4,096 tokens with batch size 4,096. A continued pre-training phase of 1 trillion tokens at sequence length 8,192 emphasizes code and mathematics. A context-lengthening phase trains on 100 billion tokens at 32,768 sequence length.
AFM-on-device is not trained from scratch at its final size. Instead, it is initialized from a pruned version of a 6.4-billion-parameter model. The pruning procedure learns sparse masks using a Soft-Top-K masking method (similar to methods from Wang et al. 2020 and Xia et al. 2023). Pruning is applied only to the hidden dimension of feed-forward layers. The mask is learned over 188 billion tokens using the same data mixture as core pre-training.
After pruning to the 3B target size, the model is trained with knowledge distillation for the full 6.3 trillion token core pre-training run. The distillation loss replaces the standard cross-entropy target labels with a convex combination of the true one-hot labels and the teacher model's top-1 predictions, with weight 0.9 assigned to the teacher's labels. The teacher model is AFM-server or a larger model in the training pipeline.
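A sketch of the mixed target described above (assumptions: per-token targets and a fixed 0.9 teacher weight; the exact loss plumbing is not public):

```swift
// Builds the distillation target for one token position: 0.9 weight on the
// teacher's top-1 prediction, 0.1 on the ground-truth label. Cross-entropy
// is then computed against this distribution instead of the one-hot label.
func distillationTarget(trueToken: Int, teacherTop1: Int, vocabSize: Int) -> [Float] {
    var target = [Float](repeating: 0, count: vocabSize)
    target[teacherTop1] += 0.9
    target[trueToken] += 0.1   // collapses to 1.0 when teacher and label agree
    return target
}
```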
This combination yields measurable gains: initializing from the pruned model improves benchmark results by 0 to 2% over random initialization at the same parameter count. Adding distillation boosts MMLU by approximately 5 percentage points and GSM8K (math reasoning) by approximately 3 percentage points. Distillation was not found to be helpful during the continued pre-training phase, so that phase uses the same recipe as AFM-server.
Both models go through supervised fine-tuning (SFT) followed by reinforcement learning from human feedback (RLHF). Learning rates are 5e-6 for the server model and 2e-5 for the on-device model, with dropout of 0.1.
Apple developed two algorithmic innovations for post-training:
Iterative Teaching Committee (iTeC): A multi-round data collection strategy in which a committee of models (not just the single model being trained) generates candidate responses. This committee approach produces higher-quality synthetic training data than self-improvement methods that rely on a single model. Human annotators and automated judges then filter and rank the committee outputs.
Mirror Descent with Leave-One-Out advantage estimation (MDLOO): An RLHF algorithm that uses mirror descent policy optimization with a leave-one-out baseline for variance reduction in the advantage estimate. This is more stable than standard PPO-based methods in Apple's training setup. The reward model uses soft labels that encode the intensity of human preferences rather than hard binary win/loss labels, and uses single-sided grading as a regularization technique.
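A sketch of the leave-one-out baseline (this is the standard LOO estimator; Apple's exact formulation inside MDLOO may differ):

```swift
// For K responses sampled from the same prompt, each response's advantage
// is its reward minus the mean reward of the other K−1 responses. This
// centers the advantages without training a separate value network.
func leaveOneOutAdvantages(rewards: [Double]) -> [Double] {
    precondition(rewards.count >= 2, "need at least two samples")
    let total = rewards.reduce(0, +)
    let k = Double(rewards.count)
    return rewards.map { r in r - (total - r) / (k - 1) }
}

// Example: rewards [1.0, 0.0, 0.5] → advantages [0.75, -0.75, 0.0]
```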
Synthetic data generation is used across several domains, including mathematics, tool use, and coding.
The 2025 models added RLHF training data in all 16 supported languages. In human evaluations, the multilingual RLHF phase yielded a 16:9 win-to-loss ratio over SFT-only models.
AFM-on-device uses a LoRA-style adapter system to specialize the base model for individual features without retraining the full model. Apple refers to these as task-specific adapters.
Adapters are small neural network modules inserted into specific layers of the frozen pre-trained model. They modify the linear projection matrices in the self-attention layers and the fully connected layers in the feed-forward networks. Adapter weights use 16-bit (float16) representation. A rank-16 adapter on the roughly 3-billion-parameter model occupies tens of megabytes of storage.
Supported ranks are 8, 16, and 32. Adapters are initialized from accuracy-recovery adapters (the same components used to compensate for quantization loss) to provide a warm start.
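The underlying LoRA math, sketched for one projection matrix (the α/rank scaling is the standard LoRA convention, assumed here rather than confirmed by Apple):

```swift
// A frozen projection W (dOut × dIn) is adapted as W + (alpha/rank)·B·A,
// where A is rank × dIn and B is dOut × rank. Only A and B are trained,
// so a rank-16 adapter adds just rank·(dIn + dOut) parameters per matrix.
struct LoRADelta {
    let rank: Int
    let alpha: Float
    let a: [[Float]]   // rank × dIn
    let b: [[Float]]   // dOut × rank

    // Applies the low-rank update to an input vector x: (alpha/rank)·B·(A·x)
    func apply(to x: [Float]) -> [Float] {
        let scale = alpha / Float(rank)
        let ax = a.map { row in zip(row, x).reduce(0) { $0 + $1.0 * $1.1 } }
        return b.map { row in scale * zip(row, ax).reduce(0) { $0 + $1.0 * $1.1 } }
    }
}
```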
At runtime, adapters are dynamically loaded, temporarily cached in memory, and swapped as the user switches between features. A writing task loads the Writing Tools adapter; a summarization task loads the summarization adapter; a reply suggestion task loads the Smart Reply adapter. This design keeps the base model in a fixed location in memory while feature-specific behavior is provided by swappable modules.
The 2025 Foundation Models Framework for developers allows third-party developers to train their own rank-32 adapters using Apple's Python adapter training toolkit, enabling custom on-device capabilities for specialized apps.
When a request cannot be handled on-device due to complexity or context length, Apple Intelligence routes it to AFM-server via Private Cloud Compute (PCC). PCC is a cloud inference infrastructure designed from the ground up with privacy as a first-order constraint rather than an afterthought.
PCC runs on custom Apple Silicon server nodes. These servers use the same hardware security technologies as iPhone: a Secure Enclave, Secure Boot, and hardware-rooted trust chains. The operating system is a hardened subset of iOS and macOS, stripped of components not needed for LLM inference, minimizing the attack surface.
PCC enforces several cryptographic guarantees:
Stateless processing: User data exists on PCC nodes only for the duration of a request. The Secure Enclave randomizes encryption keys on every reboot without persisting them. After a response is returned, no user data remains in any form.
No privileged runtime access: PCC nodes have no remote shell, no interactive debugging capability, and no general-purpose logging. Only pre-specified, structured, audited logs and metrics can leave the node.
End-to-end encryption to specific nodes: User devices encrypt requests directly to the public keys of specific validated PCC nodes, not to a general load balancer. The load balancer can route requests but cannot decrypt them. This prevents a compromised load balancer from reading user data.
Target diffusion: Multiple mechanisms prevent an attacker from reliably routing a specific user's requests to a compromised node. Request metadata excludes personally identifiable information. Authorization uses RSA Blind Signatures that grant access without identifying the user. An OHTTP relay operated by a third party hides the source device's IP address.
Code signing: All software running on PCC nodes must be part of a trust cache signed by Apple and approved for that specific node, loaded by the Secure Enclave in a way that cannot be modified at runtime.
Apple commits to publishing cryptographic measurements of all code running on PCC in an append-only, tamper-proof transparency log. Software images become publicly available within 90 days of deployment, enabling independent security researchers to verify that the deployed software matches the published measurements. Apple provides a Virtual Research Environment (VRE) for Mac that allows researchers to simulate PCC node behavior locally. Security-critical source code, sepOS firmware, and the iBoot bootloader are published in plaintext.
This transparency architecture is notable among cloud AI providers. Most cloud inference systems offer privacy policies but not cryptographic verifiability. Apple's design allows anyone to confirm that Apple itself cannot access user requests, not merely by trusting Apple's claims but by inspecting the software.
Apple released the Foundation Models Framework for developers at WWDC 2025 in June of that year. The framework ships with iOS 26, iPadOS 26, macOS Tahoe, and visionOS 26. It gives third-party developers access to AFM-on-device through a Swift API.
Guided generation: Developers annotate Swift types with the @Generable macro. The framework uses constrained decoding to guarantee that model output conforms to the specified type schema. A struct with an integer field and a string field will always produce valid integers and strings, not malformed JSON or hallucinated field names. This removes the fragile parsing step that typically accompanies LLM-generated structured data.
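A minimal example of guided generation, following the pattern in Apple's documentation (the type, field names, and guide descriptions here are illustrative):

```swift
import FoundationModels

// The @Generable macro derives a schema the framework uses for constrained
// decoding, so the model's output always deserializes into this type.
@Generable
struct TripSuggestion {
    @Guide(description: "Name of the destination city")
    var city: String
    @Guide(description: "Estimated trip length in days")
    var days: Int
}

let session = LanguageModelSession()
let response = try await session.respond(
    to: "Suggest a short autumn getaway from San Francisco.",
    generating: TripSuggestion.self
)
// response.content is a TripSuggestion — no JSON parsing, no malformed fields.
print(response.content.city, response.content.days)
```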
Tool calling: The model can invoke developer-defined tools as callbacks during generation. Both parallel tool invocation (where multiple tools can be called simultaneously) and serial chains (where one tool's output informs the next call) are handled automatically by the framework.
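A sketch of a developer-defined tool, based on the protocol shape Apple demonstrated at WWDC 2025 (the weather lookup itself is hypothetical):

```swift
import FoundationModels

struct WeatherTool: Tool {
    let name = "getWeather"
    let description = "Retrieve current weather conditions for a city"

    @Generable
    struct Arguments {
        @Guide(description: "The city to look up")
        var city: String
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        // A real app would query a weather service here.
        ToolOutput("Weather in \(arguments.city): 18°C, light rain")
    }
}

// The session decides when to invoke the tool during generation.
let session = LanguageModelSession(tools: [WeatherTool()])
let answer = try await session.respond(to: "Do I need an umbrella in Tokyo today?")
```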
Stateful sessions: Multi-turn conversations maintain context across turns through session objects. Developers do not manage conversation history manually.
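A brief illustration of session statefulness (prompts are illustrative):

```swift
import FoundationModels

// The session object carries the transcript, so a follow-up prompt can
// refer back to earlier turns without the developer re-sending history.
let session = LanguageModelSession(instructions: "You are a concise travel assistant.")
let first = try await session.respond(to: "Suggest a weekend trip from Boston.")
let followUp = try await session.respond(to: "What should I pack for it?")
// "it" resolves against the destination suggested in the first turn.
```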
Custom adapters: Developers can train rank-32 LoRA adapters using Apple's Python toolkit and package them for distribution with their apps. These adapters load and unload at runtime the same way Apple's own task adapters do.
Offline operation: Because inference runs entirely on-device, apps using the framework work without a network connection, have no per-query cost, and do not require API keys.
As few as three lines of Swift code are needed for basic text generation. The framework integrates with Swift concurrency (async/await) and Combine.
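The minimal case looks roughly like this (per Apple's documented pattern; exact signatures may vary across OS releases):

```swift
import FoundationModels

// Basic on-device text generation: no API key, no network, no per-query cost.
let session = LanguageModelSession()
let response = try await session.respond(to: "Write a one-sentence summary of mixture-of-experts models.")
print(response.content)
```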
In addition to the base model access, Apple provides a set of system-level adapters for common tasks:
| Adapter | Purpose |
|---|---|
| Summarization | Condense text to key points |
| Writing Tools | Rewrite, proofread, adjust tone |
| Smart Reply | Suggest message responses |
| Content tagging | Classify and tag content |
| Extraction | Pull structured data from unstructured text |
Third-party apps using these adapters benefit from Apple's own fine-tuning data without needing to train their own models.
By October 2025, dozens of apps in the App Store used the Foundation Models framework.
The following benchmarks come from Apple's 2024 technical report and 2025 tech report. Comparisons are against models that were publicly available at the time of each publication.
| Model | Params | MMLU (5-shot) | Notes |
|---|---|---|---|
| AFM-on-device | 3B | 61.4% | Apple |
| Phi-3-mini | 3.8B | 68.8% | Microsoft |
| Mistral-7B | 7B | ~64% | Mistral AI |
| Gemma-7B | 7B | ~64% | Google |
| Llama-3-8B | 8B | ~66% | Meta |
Despite scoring lower than Phi-3-mini on MMLU in the raw pre-training evaluation, Apple's post-training human evaluations showed AFM-on-device preferred over Phi-3-mini 47.7% of the time vs 25% for Phi-3-mini (with the remainder tied), suggesting that task-specific alignment and adapter fine-tuning close much of the gap on practical use cases.
| Model | Params | MMLU vs AFM-on-device | MGSM vs AFM-on-device |
|---|---|---|---|
| Qwen-2.5-3B | 3B | Lower | — |
| Gemma-3-4B | 4B | Lower | — |
| Gemma-3n-E4B | 4B | Lower | Slightly higher |
The 2025 on-device model outperforms Qwen-2.5-3B, Gemma-3-4B, and Gemma-3n-E4B on MMLU and multilingual benchmarks (MMMLU), though Gemma-3n-E4B edges it slightly on MGSM (multilingual math reasoning).
| Model | MMLU | Human eval win rate vs AFM-server |
|---|---|---|
| AFM-server | 75.4% | baseline |
| DBRX-Instruct | ~74% | AFM-server preferred |
| Mixtral-8x22B | ~77% | comparable |
| GPT-3.5 | ~70% | AFM-server preferred |
| Llama-3-70B | ~79% | comparable |
The server model achieved a win rate of over 50%, with 27.4% ties, against GPT-3.5 in human evaluation of writing and summarization tasks. On the Berkeley Function Calling Leaderboard for tool use, AFM-server achieved the best overall accuracy in the 2024 report, ahead of Gemini 1.5 Pro and GPT-4 at that evaluation date.
| Model | Comparison |
|---|---|
| Llama 4 Scout | AFM-server slightly behind |
| Qwen-3-235B | AFM-server behind |
| GPT-4o | AFM-server behind |
The 2025 server model is positioned as competitive with models of comparable total and active parameter counts but trails significantly larger models.
As noted above, Apple does not use private user data or user interactions in training. Web crawl data is collected by Applebot, with publishers able to opt out. Licensed data agreements with publishers cover specific content types. Synthetic data is generated by other models and filtered for quality. Personally identifiable information is removed from training corpora.
When processing happens on-device, no data leaves the user's device at all. The model weights, adapters, and KV cache all reside in device memory. This covers the large majority of Apple Intelligence features in everyday use.
When cloud inference is required, PCC provides the cryptographic and architectural guarantees described above. Apple's position is that the combination of on-device processing for the majority of requests and cryptographically verifiable stateless cloud processing for the remainder gives users stronger privacy guarantees than services that route everything through conventional cloud inference.
Apple's responsible AI taxonomy for the foundation models covers 12 primary safety categories with 51 subcategories. More than 10% of post-training data addresses adversarial prompts or safety-relevant scenarios. The models scan inputs and outputs to detect content that should not be processed or surfaced.
Violation rates on adversarial benchmarks are lower than those of open-source and commercial models of comparable size. Apple reported that the summarization adapter did not amplify sensitive content in over 99% of targeted adversarial test cases.
Code execution capabilities are sandboxed in a locked-down Firecracker micro-VM environment. Red-teaming is conducted by both internal teams and external researchers under voluntary participation agreements.
Locale-specific safety evaluation covers culture-specific bias and sensitive content across all supported languages, with human red-teaming by annotators fluent in each language.
| Aspect | AFM-on-device | Phi-3 mini | Gemma 3 4B | Llama 3.2 3B |
|---|---|---|---|---|
| Parameters | ~3B | 3.8B | 4B | 3B |
| Target platform | Apple Silicon | Mobile/cloud | Mobile/cloud | Mobile/cloud |
| Quantization | 2-4 bit QAT | INT4 | INT4 | INT4 |
| Context window | 32K (extended) | 4K | 128K | 128K |
| Developer API | Foundation Models (Swift) | Open weights | Open weights | Open weights |
| Privacy model | On-device + PCC | User-managed | User-managed | User-managed |
| Task adapters | Dynamic LoRA swap | Fine-tune manually | Fine-tune manually | Fine-tune manually |
| Offline operation | Yes | Yes (self-hosted) | Yes (self-hosted) | Yes (self-hosted) |
| Cost to developers | Free (on-device) | Varies by hosting | Varies by hosting | Varies by hosting |
The primary distinction from open-weight models like Phi-3, Gemma 3, or Llama 3.2 is that AFM-on-device is not available as open weights. Developers access it only through Apple's API. This means developers cannot inspect the full model weights, fine-tune the base model (only adapters), or deploy it outside Apple hardware. In return, they get a pre-integrated, privacy-preserving, zero-cost inference runtime with guaranteed hardware acceleration on all supported Apple devices.
AFM-on-device and AFM-server collectively power Apple Intelligence features from iOS 18.1 through iOS 26, including Writing Tools, Smart Reply, notification summaries, and Siri's extended language-understanding capabilities.
AFM-on-device is constrained by the memory and compute budget of mobile hardware. At roughly 3 billion parameters and 3.7 bits per weight, the model weights occupy approximately 1.4 GB of device storage (3 × 10⁹ weights × 3.7 bits ÷ 8 bits per byte ≈ 1.4 × 10⁹ bytes). This leaves limited headroom for longer contexts or more capable architectures without hardware advances.
Capability gaps compared to frontier models are real. The server model lags behind Qwen-3-235B and GPT-4o on general benchmarks. The on-device model, while competitive for its size class, cannot perform complex multi-step reasoning that larger models handle more reliably.
The model is not accessible as open weights. Developers cannot run it outside Apple hardware, audit the weights, or fine-tune beyond the adapter interface. This limits research use and independent safety auditing of the base model.
Apple Intelligence is not available on devices with A16 Bionic or older chips, excluding a large portion of the installed iPhone base. In mainland China, Apple Intelligence had not launched as of early 2026 due to regulatory requirements.
The on-device context window in production is limited to 4,096 tokens for most features, with extended 32,768-token contexts available for specific use cases. Inputs requiring longer context are routed to AFM-server, which introduces latency and requires network connectivity.