Trusted Execution Environments for Machine Learning
Trusted Execution Environments for machine learning (TEEs for ML, sometimes marketed as "Confidential AI" or "confidential inference") are deployments of hardware-isolated execution environments to run machine-learning training or inference workloads such that the data, model weights, and code are confidential from the cloud or hardware operator and integrity-attestable to the end user. A TEE provides cryptographically attested isolation enforced by silicon: memory encryption keeps secrets out of the hypervisor and other tenants, while remote attestation lets an external party verify that a genuine, unmodified TEE is executing a specific binary before releasing keys, data, or model weights to it.[^1][^2] The category combines CPU TEEs (Intel SGX, Intel TDX, AMD SEV-SNP, ARM TrustZone, ARM CCA, AWS Nitro Enclaves) with newer GPU TEEs introduced on the NVIDIA Hopper H100 and Blackwell B200 platforms, plus orchestration stacks (Confidential Containers, Edgeless Marblerun, Fortanix Confidential AI, Anjuna Seaglass) and bespoke implementations such as Apple Private Cloud Compute and Anthropic's Confidential Inference service.[^3][^4][^5][^6][^7]
The motivation is sharp: frontier LLM inference and fine-tuning increasingly involve sensitive enterprise data running against proprietary weights on third-party clouds, creating threat models in which the cloud provider, host kernel, co-tenant VMs, and even the model owner's own operators are all potential adversaries. TEEs collapse that trust boundary to the silicon and a small attestable software base, providing a middle ground between leaving data plaintext on a hyperscaler and the still-impractical performance overheads of fully homomorphic encryption.[^8] As of May 2026, confidential GPU offerings backed by NVIDIA H100, H200 and Blackwell B200 are generally available across Microsoft Azure, Google Cloud, and several specialty providers, and Anthropic, Apple, and Fortanix have all announced production TEE-based inference systems.[^4][^5][^6][^9]
Background
Origins of confidential computing
The general idea of a hardware-protected enclave predates ML by decades. ARM TrustZone, shipped in Cortex-A processors from the mid-2000s, partitions a system into "Secure World" and "Normal World," with every bus transaction tagged by an NS bit and transitions mediated by the Secure Monitor Call (SMC) instruction.[^10] TrustZone enabled mobile DRM, secure boot, and trusted application stacks such as OP-TEE, but its single-secure-world model is not naturally suited to multi-tenant cloud workloads.
Intel introduced Software Guard Extensions (SGX) in 2015 with Skylake processors. SGX defines user-mode "enclaves," ring-3 regions of encrypted memory whose contents are decrypted only inside the CPU package and are opaque to the operating system, hypervisor, and System Management Mode firmware.[^11] SGX shipped with a measurement-and-attestation protocol (MRENCLAVE, MRSIGNER, and quoting enclaves issuing reports signed under Intel's EPID or DCAP infrastructure), which made it the first commodity hardware to support practical remote attestation of arbitrary code.[^11] AMD followed in 2016 with Secure Encrypted Virtualization (SEV) on EPYC, encrypting guest VM memory under a per-VM key managed by the AMD Secure Processor. SEV-ES (Encrypted State, 2017) added register-state encryption, and SEV-SNP (Secure Nested Paging, 2020) added integrity protection against replay, remapping, and aliasing by the malicious hypervisor.[^12]
The Confidential Computing Consortium was established under the Linux Foundation on 17 October 2019 with founding premier members Alibaba, ARM, Google Cloud, Huawei, Intel, Microsoft, and Red Hat, and general members including Baidu, ByteDance, Fortanix, Oasis Labs, Swisscom, Tencent, and VMware.[^13] It defines confidential computing as "the protection of data in use by performing computation in a hardware-based, attested trusted execution environment."[^1]
From per-process enclaves to confidential VMs
SGX's small enclave page cache (originally 128 MB) and the cost of context-switching across the enclave boundary made it awkward to run large applications, and a steady stream of microarchitectural attacks (Foreshadow / L1TF, ZombieLoad, LVI, SGAxe, ÆPIC Leak, SGX.fail) eroded user confidence.[^14][^15] Intel responded with Trust Domain Extensions (TDX), a hypervisor-grade TEE that wraps an entire guest VM as a "trust domain" with encrypted memory and protected register state, and which became generally available with 5th-generation Xeon Scalable processors (Emerald Rapids) on 14 December 2023.[^16][^17] In parallel, Intel announced that SGX was deprecated on 11th- and 12th-generation Core client CPUs starting in 2021, while keeping it active on the Xeon server line.[^18]
ARM introduced the Realm Management Extension (RME) in the Armv9-A architecture, the principal hardware feature underlying ARM Confidential Compute Architecture (CCA). CCA adds a fourth "Realm World" alongside Normal, Secure, and Root worlds, allowing lower-privileged software to protect itself even from a malicious hypervisor.[^19][^20]
AWS shipped Nitro Enclaves in October 2020 as an EC2 feature that carves an isolated, network-free, storage-free child VM out of a parent instance, communicating only over a vsock channel and attestable through a Nitro hypervisor signing service. Nitro Enclaves does not rely on Intel SGX or AMD SEV but instead on the Nitro System's own hypervisor isolation.[^21]
From CPU enclaves to GPU TEEs
Until 2022, confidential computing was effectively CPU-only, and any workload that touched a GPU left the trust boundary the moment data crossed PCIe. NVIDIA changed that with H100 (Hopper, GA March 2023), the first GPU with a hardware Trusted Execution Environment. H100 boots into a "Confidential Computing" mode under control of an on-die security processor, runs a measured firmware, encrypts and signs all command buffers and CUDA kernels that cross the PCIe bus, and produces a GPU attestation report signed by a per-chip key rooted in NVIDIA's PKI.[^3][^22] Data transit between CPU and GPU uses an encrypted "bounce buffer" in shared memory, since standard PCIe is not natively encrypted.[^22]
NVIDIA Blackwell (announced GTC March 2024) extended this with the first TEE-I/O capable GPU, providing inline link-level encryption over NVLink and NVSwitch, eliminating the bounce-buffer overhead and bringing multi-GPU confidential workloads under the same attested envelope.[^4] AMD's MI300X takes a different approach: rather than a GPU-internal TEE, the GPU is admitted into the host's SEV-SNP envelope through Trusted I/O (TDISP/SEV-TIO), with attestation produced at the VM level.[^23]
Technical details
TEE primitives
TEEs are built from three core primitives. The first two are common to every TEE; the third is a more recent addition:
- Sealed memory. Pages belonging to the TEE are encrypted by hardware (commonly AES-XTS or AES-GCM with a per-VM or per-enclave key) under a key that never leaves the silicon. The CPU enforces that loads and stores from outside the TEE either fault or read ciphertext.
- Remote attestation. The hardware measures the initial state of the TEE (firmware version, loaded image hash, configuration) and signs a report with a hardware-attestable key. A verifier checks the signature against a vendor PKI, validates that the measurement matches the expected workload, and only then provisions secrets or data.[^24]
- Secure I/O. Because most useful workloads touch storage, network, or accelerators, modern TEEs add mechanisms for encrypted DMA, vsock-style local channels, or PCIe link encryption so that data exiting the encrypted memory region remains protected end to end.
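The attestation primitive can be made concrete with a toy round trip. The sketch below is illustrative only and rests on stated assumptions: an HMAC key (`HARDWARE_KEY`, hypothetical) stands in for the chip's signing key, whereas real TEEs sign reports with an asymmetric key chained to a vendor PKI, so a real verifier never holds the signing secret. All function and field names are invented for illustration.

```python
"""Toy remote-attestation round trip (illustrative only)."""
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for a key fused into the chip. Real TEEs sign with an
# asymmetric key chained to a vendor PKI, so verifiers never hold this secret.
HARDWARE_KEY = secrets.token_bytes(32)


@dataclass(frozen=True)
class Report:
    measurement: bytes  # hash of the TEE's initial image/config
    nonce: bytes        # verifier-chosen freshness value
    signature: bytes    # "hardware" signature over measurement || nonce


def hardware_sign_report(image: bytes, nonce: bytes) -> Report:
    """Hardware side: measure the launched image and sign it with the nonce."""
    measurement = hashlib.sha256(image).digest()
    sig = hmac.new(HARDWARE_KEY, measurement + nonce, hashlib.sha256).digest()
    return Report(measurement, nonce, sig)


def verify_and_release(report: Report, expected_nonce: bytes,
                       expected_measurement: bytes, secret: bytes) -> Optional[bytes]:
    """Verifier side: provision the secret only if all three checks pass."""
    expected_sig = hmac.new(HARDWARE_KEY, report.measurement + report.nonce,
                            hashlib.sha256).digest()
    if not hmac.compare_digest(expected_sig, report.signature):
        return None  # not produced by genuine hardware
    if not hmac.compare_digest(report.nonce, expected_nonce):
        return None  # stale or replayed report
    if not hmac.compare_digest(report.measurement, expected_measurement):
        return None  # hardware is genuine but runs an unexpected workload
    return secret


nonce = secrets.token_bytes(16)
approved = hashlib.sha256(b"inference-server-v1").digest()
report = hardware_sign_report(b"inference-server-v1", nonce)
print(verify_and_release(report, nonce, approved, b"model-key"))  # b'model-key'
```

The three checks map onto the three things a report must prove: genuine hardware, freshness, and the expected workload.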
CPU TEEs
| TEE | Vendor | Granularity | Year first GA | Threat model | Notable attacks |
|---|---|---|---|---|---|
| TrustZone | ARM | Whole device (Secure World) | ~2005 | Untrusted Normal World OS | Various TA-specific |
| SGX | Intel | Per-process enclave | 2015 | Untrusted OS / hypervisor | Foreshadow, LVI, ÆPIC, SGX.fail |
| SEV / SEV-ES / SEV-SNP | AMD | Whole guest VM | 2016 / 2017 / 2020 | Untrusted hypervisor | SEV-step, WeSee, BadRAM |
| TDX | Intel | Whole guest VM (trust domain) | 2023 (5th-gen Xeon) | Untrusted hypervisor, SMM | TDX-Down (mitigated) |
| CCA / RME | ARM | Realm VM | 2024 (Neoverse V3 / N3) | Untrusted hypervisor | Research evolving |
| Nitro Enclaves | AWS | Child VM | 2020 | Untrusted parent instance | Parent-side oracle attacks |
SEV-SNP attestation is grounded in the Versioned Chip Endorsement Key (VCEK), a per-chip key derived from a unique secret and the reported TCB versions (SP bootloader SVN, SP OS SVN, SNP firmware SVN, microcode patch level). VCEK certificates chain to the AMD SEV CA and an AMD Root Key, allowing a remote verifier to confirm that the report was produced by a specific genuine EPYC at a specific patch level.[^25] Intel TDX uses an analogous DCAP-style quoting infrastructure with provisioning certification keys signed by Intel.
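Because the VCEK is bound to specific TCB versions, verifiers usually pair the signature check with a minimum-patch-level policy. The sketch below assumes the four SVN components named above, with hypothetical field names and threshold values rather than AMD's actual report layout. It compares each component separately, since collapsing them into one number would let an up-to-date component mask an outdated one.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TcbVersion:
    """Hypothetical container for the SVNs a SEV-SNP report carries."""
    boot_loader: int  # SP bootloader SVN
    tee: int          # SP OS SVN
    snp: int          # SNP firmware SVN
    microcode: int    # microcode patch level


def meets_minimum(reported: TcbVersion, minimum: TcbVersion) -> bool:
    """Component-wise check: every SVN must be at least the policy minimum.

    Newer microcode cannot compensate for older SNP firmware, so each
    component is compared on its own rather than as one combined value.
    """
    return all(getattr(reported, f.name) >= getattr(minimum, f.name)
               for f in fields(TcbVersion))


# Hypothetical policy values, not real AMD security versions.
policy = TcbVersion(boot_loader=3, tee=0, snp=14, microcode=209)
print(meets_minimum(TcbVersion(3, 0, 14, 211), policy))  # True
print(meets_minimum(TcbVersion(4, 0, 8, 215), policy))   # False: SNP firmware too old
```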
GPU TEEs
NVIDIA's H100 supports two relevant modes. In Single GPU Passthrough Confidential Computing (SPT-CC) a single H100 is passed through to one Confidential VM, the GPU boots a measured firmware, and CPU-GPU data crosses an encrypted bounce buffer in shared system memory. All command buffers and CUDA kernels are encrypted and signed before crossing PCIe.[^22][^26] In Multi-Instance GPU CC (MIG-CC) the GPU's MIG partitions each get their own confidential context. The GPU attestation report is signed by the on-die security processor and includes a measurement of the booted GPU firmware and the VBIOS.[^3]
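The bounce-buffer idea can be sketched as encrypt-then-authenticate over untrusted shared memory, with a counter IV that is never reused. This is a toy under loud assumptions: the SHA-256 counter-mode keystream stands in for an authenticated cipher such as AES-GCM and is not secure; `BounceChannel` and its methods are hypothetical; and one object plays both the CPU and GPU ends for brevity.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def _keystream(key: bytes, iv: int, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream. Stand-in for AES-GCM; NOT secure."""
    out = bytearray()
    block = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv.to_bytes(8, "big") + block.to_bytes(8, "big")).digest()
        block += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class BounceChannel:
    """One direction of a hypothetical CPU->GPU channel through untrusted shared memory."""

    def __init__(self, enc_key: bytes, mac_key: bytes) -> None:
        self.enc_key, self.mac_key = enc_key, mac_key
        self.send_iv = 0        # sender counter: an IV is never reused
        self.last_recv_iv = -1  # receiver rejects stale or replayed IVs

    def stage(self, plaintext: bytes) -> Tuple[int, bytes, bytes]:
        """CPU side: encrypt-then-MAC into the bounce buffer."""
        iv = self.send_iv
        self.send_iv += 1
        ct = _xor(plaintext, _keystream(self.enc_key, iv, len(plaintext)))
        tag = hmac.new(self.mac_key, iv.to_bytes(8, "big") + ct, hashlib.sha256).digest()
        return iv, ct, tag

    def unstage(self, iv: int, ct: bytes, tag: bytes) -> bytes:
        """GPU side: authenticate, check freshness, then decrypt."""
        expected = hmac.new(self.mac_key, iv.to_bytes(8, "big") + ct, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            raise ValueError("bounce buffer contents were modified")
        if iv <= self.last_recv_iv:
            raise ValueError("replayed or reordered transfer")
        self.last_recv_iv = iv
        return _xor(ct, _keystream(self.enc_key, iv, len(ct)))


chan = BounceChannel(secrets.token_bytes(32), secrets.token_bytes(32))
iv, ct, tag = chan.stage(b"token ids: 1 2 3")
print(chan.unstage(iv, ct, tag))  # b'token ids: 1 2 3'
```

The host only ever sees `ct` and `tag` in shared memory. Any modification or replay is rejected before decryption.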
Performance studies show that H100 CC mode imposes modest overhead. Zhu et al. (2024, arXiv:2409.03992) benchmark Llama-2 / Llama-3 inference on H100 and find total overhead below ~7% for typical LLM queries, with overhead approaching zero on Llama-3.1-70B and other large models because compute dominates I/O.[^26] Time to first token (TTFT) carries larger relative overhead than inter-token latency, confirming that the bottleneck is encrypted PCIe traffic, not in-GPU computation.[^26] Subsequent work by Corvex on HGX B200 reports near-identical throughput to non-CC mode because Blackwell encrypts NVLink and NVSwitch inline rather than going through a bounce buffer.[^4]
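The shrinking overhead on larger models follows from simple arithmetic. If CC mode slows only the CPU-GPU transfer path, the extra cost is diluted by compute time. The model below is a back-of-the-envelope sketch. It assumes transfers do not overlap with compute, and it uses hypothetical bandwidth and transfer-size figures rather than measured H100 values.

```python
def cc_overhead(compute_s: float, transfer_gb: float,
                plain_gbps: float, cc_gbps: float) -> float:
    """Fractional slowdown if CC mode only slows the CPU<->GPU transfer path.

    Simplifying assumptions: GPU compute runs at the same speed in both
    modes, and transfers are not overlapped with compute.
    """
    t_plain = compute_s + transfer_gb / plain_gbps
    t_cc = compute_s + transfer_gb / cc_gbps
    return t_cc / t_plain - 1.0


# Hypothetical numbers: 0.5 GB moved per request, 50 GB/s without CC
# versus 5 GB/s through the encrypted bounce buffer.
for compute_s in (1.0, 20.0):
    print(f"compute={compute_s:>5.1f}s  overhead={cc_overhead(compute_s, 0.5, 50.0, 5.0):.2%}")
```

With these made-up numbers, the slowdown falls from about 8.91% at 1 s of compute to about 0.45% at 20 s. This mirrors the qualitative trend reported by Zhu et al., not their measured figures.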
Attestation flow
A canonical confidential-inference attestation flow works as follows:
- The cloud control plane launches a confidential VM (SEV-SNP or TDX) with a measured guest image and a pass-through GPU in CC mode.
- At guest boot, the VM's vTPM (a virtual TPM rooted in the platform) extends Platform Configuration Registers (PCRs) with each loaded component.
- The guest requests an attestation report from the SEV-SNP / TDX hardware (containing a measurement of the launch state and a nonce supplied by the verifier) and a GPU attestation report from the GPU's security processor.
- The verifier (for example Microsoft Azure Attestation, the Google Confidential Space verifier, NVIDIA's Remote Attestation Service for the GPU report, or a customer-controlled service) checks both signatures against vendor PKIs, compares measurements to a known-good policy, and on success releases a workload key from a key-management system (KMS) into the VM.
- The VM uses the key to decrypt model weights, fetch encrypted user data, run inference, and return ciphertext.
This pattern is implemented in production by Azure Confidential VM Guest Attestation,[^27] Google Confidential Space,[^28] AWS KMS Attestation for Nitro Enclaves,[^21] Apple Private Cloud Compute,[^5] and the Anthropic Confidential Inference architecture.[^6]
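The vTPM step in the flow above relies on the TPM extend operation. Extend folds each component's digest into a PCR, so the final value commits to the whole ordered boot sequence. A verifier can then replay an event log and check it against the quoted PCR. The sketch below assumes a SHA-256 PCR bank and an unsigned quote (a real vTPM signs the PCR values with an attestation key). The event-log format and helper names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

EventLog = List[Tuple[str, bytes]]  # (component name, SHA-256 digest)


def extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM-style extend: PCR_new = SHA-256(PCR_old || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def replay(event_log: EventLog) -> bytes:
    """Recompute the final PCR value from an ordered event log."""
    pcr = bytes(32)  # PCR starts as all zeros at reset
    for _name, digest in event_log:
        pcr = extend(pcr, digest)
    return pcr


def log_matches_quote(event_log: EventLog, quoted_pcr: bytes,
                      known_good: Dict[str, bytes]) -> bool:
    """Accept only if the log replays to the quoted PCR and every entry is known-good."""
    if replay(event_log) != quoted_pcr:
        return False  # log was altered, reordered, or truncated
    return all(known_good.get(name) == digest for name, digest in event_log)


def digest_of(blob: bytes) -> bytes:
    return hashlib.sha256(blob).digest()


log = [("kernel", digest_of(b"kernel-image")),
       ("initrd", digest_of(b"initrd-image")),
       ("model-server", digest_of(b"model-server-v1"))]
quote = replay(log)        # a real vTPM would sign this value
known_good = dict(log)
print(log_matches_quote(log, quote, known_good))  # True
```

Because extend is order-sensitive and one-way, a host cannot reorder or drop entries without changing the final PCR.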
Implementations and adoption
Cloud provider offerings
Microsoft Azure offers AMD SEV-SNP confidential VMs on the DCasv5 / ECasv5 family (3rd-gen AMD EPYC) and Intel TDX confidential VMs on the DCesv5 / ECesv5 family (5th-gen Xeon), both with guest-attestation hooks against Microsoft Azure Attestation and Azure Key Vault.[^29] Azure announced general availability of Intel TDX confidential VMs in early 2026 and provides NVIDIA H100 Confidential Computing through the NCC H100 v5 series under AMD SEV-SNP, plus an Intel TDX + H100 / H200 stack as a regional option.[^29]
Google Cloud runs Confidential VMs on AMD SEV, SEV-SNP, and Intel TDX, layered with Confidential Space, a hardened Container-Optimized OS image that runs a workload container inside a confidential VM and integrates with Workload Identity Federation so secrets are only released to attested workloads. Confidential Space reached general availability in 2023 and added Intel TDX with NVIDIA Confidential Computing in preview in 2024.[^30][^28]
Amazon Web Services offers SEV-SNP on M7a and other 4th-gen EPYC instance types,[^31] alongside Nitro Enclaves on most EC2 instances. Nitro Enclaves are widely used for cryptographic key handling and have been adopted by Anjuna for its Seaglass platform.[^32] AWS does not currently expose Intel TDX in EC2.
ML-specific stacks
Apple Private Cloud Compute (PCC), announced 10 June 2024 alongside Apple Intelligence, runs the server-side LLM inference behind Apple Intelligence on custom Apple-silicon servers. The published architecture has five requirements: stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency. The last requirement is implemented by publishing every production PCC build image and inviting researchers to verify that the running software matches the published source.[^5] PCC uses Apple's Secure Enclave technology and a hardened subset of iOS/macOS. This makes it one of the largest TEE-based ML deployments not built on a cloud provider's confidential-computing primitives.
Anthropic Confidential Inference via Trusted Virtual Machines, described in a paper by Anthropic and Pattern Labs (now Irregular) published 18 June 2025, runs inference for selected Claude customers inside SEV-SNP / TDX confidential VMs with NVIDIA H100 / H200 GPUs in CC mode. The design isolates a minimal "model loader" inside the TEE that decrypts weights only after attestation. It relies on a CI server to sign software releases and pairs the inference VM with an API server that handles user-data decryption. The paper explicitly maps its design against RAND's Security Level 4 (SL4) and Security Level 5 (SL5) weight-protection tiers for frontier-model security.[^6]
Fortanix Confidential AI wraps NVIDIA Confidential Computing GPUs (H100, H200, Blackwell) with a key-management and attestation control plane targeted at enterprise model serving; in October 2025 Fortanix announced a joint solution with NVIDIA for "agentic AI" on confidential GPUs, and in March 2026 announced a Confidential AI release covering both inference and model-IP protection in enterprise AI factories.[^7]
Edgeless Marblerun is an open-source service mesh for confidential workloads, originally built for Intel SGX enclaves and now extended to Intel TDX and AMD SEV-SNP via the related Contrast project. Marblerun supplies one cluster-wide attestation per manifest, plus mTLS provisioning and secret distribution to "Marbles" running inside enclaves built with EGo, Gramine, or Occlum.[^33]
Anjuna Seaglass is a commercial platform that lifts unmodified Linux applications into Intel SGX/TDX, AMD SEV-SNP, or AWS Nitro Enclaves without code changes, and is one of the more prominent vendors integrating AWS Nitro Enclaves with Kubernetes-style deployments.[^32]
Confidential Containers (CoCo) is a CNCF sandbox project that runs each Kubernetes pod inside its own TEE using a Kata Containers microVM as the carrier, supporting Intel TDX, AMD SEV-ES/SNP, and IBM Secure Execution. CoCo's in-guest components (Kata Agent, Attestation Agent, Confidential Data Hub) handle attestation, image decryption, and key release, dramatically shrinking the Trusted Compute Base relative to a normal Kubernetes node.[^34]
Comparative summary
| System | TEE technology | Workload | Status (2026-05) |
|---|---|---|---|
| Apple Private Cloud Compute | Apple-silicon Secure Enclave | Apple Intelligence inference | Production since iOS 18 (2024) |
| Anthropic Confidential Inference | SEV-SNP / TDX + H100/H200 CC | Selected Claude inference | Architecture published Jun 2025 |
| Azure Confidential GPU | SEV-SNP + H100 CC | Customer AI workloads | General availability |
| Google Confidential Space + CC GPU | TDX + H100 CC | Customer AI workloads | TDX+H100 in preview |
| AWS Nitro Enclaves | Nitro hypervisor | Generic confidential workloads | GA since 2020 |
| Fortanix Confidential AI | NVIDIA CC GPUs + KMS | Enterprise model IP | GA |
| Edgeless Marblerun / Contrast | SGX, TDX, SEV-SNP | Service mesh for confidential apps | Open source |
| Confidential Containers | Kata + TDX / SEV-ES / SE | Confidential Kubernetes | CNCF sandbox |
Applications
Confidential inference
The most mature application is "confidential inference": a customer sends prompts to a model deployed inside a TEE, with hardware attestation guaranteeing that neither the model provider nor the cloud operator can see plaintext requests or outputs. This is the architecture used in Apple PCC, Anthropic Confidential Inference, and Azure / Google / Fortanix offerings.[^5][^6][^7] Recent benchmark work shows that H100 CC adds only a few percent of latency to typical LLM queries, making it economically tractable for production deployment.[^26]
Model-IP protection
A symmetric concern is protecting proprietary model weights from the customers running them. If a model owner pushes weights to an enterprise cluster, even legitimate enterprise admins can in principle exfiltrate the weights. TEE-based deployment lets the model owner encrypt weights with a key that is only released after the customer's hardware attests to running the agreed-on inference binary, blocking trivial extraction. Anthropic explicitly cites RAND's SL4 / SL5 weight-protection tiers as the bar its confidential inference design aims to meet.[^6]
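On the model owner's side, the key release described above amounts to a per-model policy lookup. The sketch assumes that a verifier has already checked the report signature and reduced it to a few claims. `ToyKms`, `VerifiedClaims`, and `ReleasePolicy` are hypothetical names, not any cloud KMS API. A production KMS would also typically wrap the key to a public key generated inside the TEE rather than return it in the clear.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class VerifiedClaims:
    """Claims extracted from an already signature-checked attestation report."""
    measurement: str     # hex digest of the launched inference image
    debug_enabled: bool  # debug-mode TEEs let the host inspect guest memory
    tcb_ok: bool         # result of a minimum-patch-level check


@dataclass(frozen=True)
class ReleasePolicy:
    allowed_measurements: FrozenSet[str]  # the agreed-on inference binaries only


class ToyKms:
    """Hypothetical key service operated by the model owner, not the customer."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}
        self._policies: Dict[str, ReleasePolicy] = {}

    def register(self, model_id: str, weight_key: bytes, policy: ReleasePolicy) -> None:
        self._keys[model_id] = weight_key
        self._policies[model_id] = policy

    def release(self, model_id: str, claims: VerifiedClaims) -> Optional[bytes]:
        policy = self._policies.get(model_id)
        if policy is None or claims.debug_enabled or not claims.tcb_ok:
            return None
        if claims.measurement not in policy.allowed_measurements:
            return None  # e.g. a modified server that dumps weights to disk
        return self._keys[model_id]


approved = hashlib.sha256(b"inference-binary-v7").hexdigest()
kms = ToyKms()
kms.register("model-a", b"\x01" * 32, ReleasePolicy(frozenset({approved})))
print(kms.release("model-a", VerifiedClaims(approved, False, True)) is not None)  # True
```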
Confidential training and federated learning
Several research and industry deployments use TEEs to support privacy-preserving collaborative training: data from multiple parties is uploaded encrypted to a TEE, the TEE attests to an agreed training recipe, decrypts the data only inside the encrypted boundary, and outputs only a trained model (or its gradients). This is positioned as a counterpart to federated learning approaches and is implemented in production by Google Confidential Space's "joint data analysis and ML training" pattern.[^28]
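In the multi-party pattern, each data owner checks the attested training recipe independently and withholds its data key if the check fails. The sketch below assumes all-or-nothing participation and represents the recipe by a hex digest. `DataOwner` and `gather_keys` are hypothetical. In practice, each owner's check would run the full attestation verification rather than a string comparison.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataOwner:
    name: str
    data_key: bytes
    agreed_recipe: str  # hex digest of the training recipe this party approved

    def maybe_release(self, attested_recipe: str) -> Optional[bytes]:
        # Each party checks the attested recipe itself; none relies on another's check.
        return self.data_key if attested_recipe == self.agreed_recipe else None


def gather_keys(owners: List[DataOwner], attested_recipe: str) -> Optional[Dict[str, bytes]]:
    """Training may start only if every party independently releases its key."""
    keys: Dict[str, bytes] = {}
    for owner in owners:
        key = owner.maybe_release(attested_recipe)
        if key is None:
            return None
        keys[owner.name] = key
    return keys


recipe = hashlib.sha256(b"training-recipe-v1").hexdigest()
owners = [DataOwner("party_a", b"key-a", recipe), DataOwner("party_b", b"key-b", recipe)]
print(gather_keys(owners, recipe) is not None)  # True
```

All-or-nothing release is one design choice among several. A deployment could instead train on whichever subset of parties consents.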
Mitigating data-side privacy attacks
TEEs do not directly defend against statistical privacy leaks from a model's outputs (e.g. membership inference or training-data extraction). They do complement those defenses, because an attacker cannot bypass the model's API by reading server memory directly. Combined with differential-privacy noise during training, TEEs commonly serve as the hardware leg of a defense-in-depth privacy stack.
Compute provenance and AI governance
A less-explored but rapidly emerging application is using attested TEEs as a substrate for AI compute governance: regulators or evaluators can require that frontier training or inference happen inside attestable hardware, so that they can later verify a workload's identity and policy compliance without trusting the operator's claims. Anthropic's Confidential Inference paper notes this as a motivation: confidential computing provides a cryptographic chain of trust that "attests to software security and enforces rules about which software is allowed to use encryption keys."[^6]
Limitations
Performance overhead has fallen rapidly but is not free. For LLM inference on H100 CC mode, published benchmarks show 4 to 8 percent throughput loss at small batches, falling to under 1 percent for large models or long sequences, because the encrypted bounce buffer is amortized across longer compute.[^26] Training workloads, which are more I/O-heavy and rely on collective operations across many GPUs, historically suffered worse penalties under H100 because each NVLink crossing exited the TEE. NVIDIA Blackwell's TEE-I/O extends encryption across NVLink and NVSwitch, eliminating the bounce-buffer cost and reportedly delivering near-identical throughput to non-CC mode.[^4]
For SGX-style per-process enclaves, classical applications saw 30 to 200 percent overheads depending on enclave page misses, and the original 128 MB enclave page cache forced costly EPC paging on larger working sets. These limitations are part of why the industry has shifted toward VM-grade TEEs (TDX, SEV-SNP) for ML workloads.
Side-channel and microarchitectural attacks
Hardware TEEs are subject to an active research front of side-channel and microarchitectural attacks. Notable examples include:
- Foreshadow / L1TF (2018): a speculative-execution attack that read SGX enclave memory through the L1 data cache.
- LVI (Load Value Injection, 2020): a "reverse Meltdown" that injects attacker-controlled values into transiently executed loads, bypassing earlier Spectre / Meltdown mitigations and requiring expensive compiler-side fences in SGX code.[^14]
- ÆPIC Leak (CVE-2022-21233, 2022): the first architectural (non-side-channel) SGX leak, in which the APIC MMIO undefined range on 10th-12th-generation Intel CPUs returned stale cache-line data including enclave secrets.[^15]
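- SGX.fail (2023): a study showing that previously disclosed attacks remained practical against deployed SGX-based systems well after Intel had released mitigations.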
- SpecHammer (IEEE S&P 2022): combined Rowhammer bit flips with Spectre to relax the requirements for a Spectre v1 gadget, demonstrating a class of attacks that crosses microarchitectural and DRAM-level boundaries.[^35]
- BadRAM (CVE-2024-21944, 2024): a $10 hardware attack against AMD SEV-SNP that modifies SPD metadata in DDR4/DDR5 DIMMs to alias memory regions, breaking the cryptographic attestation guarantees of SEV-SNP. AMD issued firmware updates to validate memory topology at boot.[^36]
- WeSee (IEEE S&P 2024): a malicious hypervisor injects #VC interrupts into an SEV-SNP guest, abusing the guest's #VC handler to leak and corrupt guest data.
These attacks reinforce that a TEE's security is not a single binary property but a function of the specific microcode and firmware versions running on a specific physical chip, and that vendors and customers must aggressively patch and re-attest.
Trust in vendor PKIs
A TEE's attestation guarantees are only as strong as the vendor PKI rooted at Intel, AMD, ARM, or NVIDIA. A compromise of a vendor signing key or certificate authority would invalidate trust in every chip whose attestation chain depends on it. Customers cannot independently audit vendor key-management practices, so protection against a supply-chain or vendor-level attacker is only partial. Apple's PCC mitigates one corner of this by publishing every production build for independent verification.[^5]
Threat model gaps
TEEs do not protect against:
- Workload bugs. A vulnerable model server inside the TEE can still leak data through its API, e.g. via prompt-injection or membership inference attacks against outputs.
- Side channels in shared resources. Even with encrypted memory, contention on caches, branch predictors, DRAM banks, or PCIe links can leak information across the boundary.
- Physical attackers with arbitrary capability. SEV-SNP and TDX assume that the attacker cannot perform full chip decapsulation or sophisticated bus probing. Even so, BadRAM showed that a cheap attack rewriting DIMM SPD metadata, well short of those capabilities, can still break SEV-SNP's guarantees.[^36]
- Statistical model leakage. Data-poisoning attacks during training, model-extraction attacks from query traffic, and training-data memorization are orthogonal to TEE protections.
Comparison to FHE, MPC, and federated learning
TEEs occupy a specific point in the privacy-preserving ML design space. The standard comparison axes are confidentiality, performance, and threat model:
| Approach | Confidentiality | Performance | Threat model |
|---|---|---|---|
| TEE / confidential computing | Data and weights in plaintext only inside hardware-isolated memory | Single-digit-percent overhead for LLM inference on Blackwell | Trusts CPU / GPU silicon and vendor PKI; vulnerable to side channels |
| Fully homomorphic encryption (FHE) | Computation on ciphertext; server never sees plaintext | 10^3x to 10^6x overhead; impractical for full LLM inference today | Trusts only the math; no silicon trust required |
| Secure multi-party computation (MPC) | Inputs secret-shared across non-colluding parties | 10x-1000x overhead; collaborative round-trips | Trusts that a threshold of participants is honest |
| Federated learning | Raw data stays on device; only gradients/updates are shared | Comparable to centralized training | Does not by itself prevent leakage from shared gradients; often combined with DP |
TEE-based solutions avoid the data exchange and cryptographic overhead of MPC and FHE, but require trust in the silicon vendor. Federated learning sidesteps centralization but leaks information through shared gradients unless paired with differential privacy. In practice, production deployments combine multiple techniques: e.g. Apple PCC pairs TEEs with on-device computation and differential privacy, and Anthropic's design pairs TEE attestation with strict code-review pipelines.[^5][^6]
The wider landscape of privacy-preserving ML overlaps with several adjacent topics that have their own dedicated articles. Homomorphic encryption for machine learning explores ciphertext-native computation as a software-only alternative to hardware TEEs. Federated learning and differential privacy occupy adjacent positions in the privacy-preserving ML stack. The threat models that TEEs aim to mitigate include model extraction attacks, model stealing generally, and membership inference attacks on outputs. Data poisoning sits orthogonal to TEEs since it targets the training pipeline rather than the runtime confidentiality boundary.
GPU-hardware context is provided by the dedicated articles on NVIDIA H100, NVIDIA H200, NVIDIA Blackwell, NVIDIA Blackwell B200, NVIDIA Hopper, and AMD Instinct MI300X. Cloud and product context is covered by Apple Intelligence (the workload behind Private Cloud Compute) and Anthropic (whose Confidential Inference design extends the Claude API).
See also
References