Homomorphic encryption for machine learning

Homomorphic encryption for machine learning is the application of fully, somewhat, or leveled homomorphic encryption (FHE, SHE, LHE) so that a server can run machine learning computations, almost always inference and occasionally training, on ciphertexts without ever decrypting the underlying data. The technique was made possible by Craig Gentry's 2009 Stanford dissertation, which presented the first plausible fully homomorphic encryption scheme.[^1] It became practically interesting for neural networks with the CryptoNets paper at ICML 2016, which evaluated a small convolutional neural network on encrypted MNIST images.[^2] Since then, schemes such as BFV, BGV, CKKS, and TFHE have been used to build encrypted versions of linear regression, logistic regression, tree ensembles, CNNs, and most recently transformer inference at the scale of BERT and GPT-2.[^3][^4][^5] Implementations are dominated by a handful of libraries, including Microsoft SEAL, OpenFHE, IBM HELayers, Zama's Concrete and Concrete ML, and Intel HEXL, with parameters anchored by the HomomorphicEncryption.org community standard first released in November 2018.[^6][^7][^8][^9][^10][^11]

Background

Homomorphic encryption before deep learning

The notion of "privacy homomorphisms" goes back to Rivest, Adleman and Dertouzos in 1978, but for three decades no scheme supported both unbounded additions and unbounded multiplications on ciphertexts. That gap was closed in 2009, when Craig Gentry's Stanford PhD dissertation, "A Fully Homomorphic Encryption Scheme," gave the first construction of an FHE scheme, based on ideal lattices, together with a "bootstrapping" procedure that homomorphically evaluates the decryption circuit to refresh noisy ciphertexts.[^1] A preliminary version appeared at STOC 2009 as "Fully Homomorphic Encryption Using Ideal Lattices."[^1][^12]

Gentry's blueprint had three parts: build a somewhat homomorphic scheme that supports a bounded multiplicative depth, "squash" its decryption circuit to a low-degree polynomial in ciphertext and secret-key bits, and then bootstrap by homomorphically evaluating that squashed decryption circuit on an encryption of the secret key.[^12] The blueprint is conceptually elegant but was extraordinarily slow in its original form, with bootstrapping measured in many minutes per gate.

A wave of follow-up schemes replaced ideal lattices with the more standard Learning With Errors (LWE) and Ring-LWE assumptions and dropped the squashing step, yielding faster and more secure constructions. Brakerski, Gentry and Vaikuntanathan introduced the BGV scheme in 2011 (published at ITCS 2012), showing that leveled FHE could be obtained without bootstrapping by carefully managing noise growth and using "modulus switching."[^13] Fan and Vercauteren proposed BFV in 2012 as a related Ring-LWE scheme using a different noise representation; together BGV and BFV are the canonical "exact integer arithmetic" FHE schemes used today.[^7][^14]

Two further developments matter directly for machine learning. First, in 2016, Chillotti, Gama, Georgieva and Izabachène introduced TFHE, which cut gate-level bootstrapping to under 0.1 seconds; follow-up work by the same authors brought it to roughly 13 milliseconds on a single core, about a fifty-fold improvement over previous work and the basis for fast Boolean and small-integer FHE.[^4] Second, Cheon, Kim, Kim and Song introduced CKKS at ASIACRYPT 2017 in "Homomorphic Encryption for Arithmetic of Approximate Numbers."[^3] CKKS treats ciphertexts as encrypting fixed-point approximations of real or complex numbers and supports a "rescale" operation that keeps plaintext magnitude bounded, which is exactly what is needed for arithmetic over neural network weights and activations.[^3]

From cryptography to ML

Machine learning became a practical target for FHE only after these scheme improvements. Microsoft Research's CryptoNets paper, presented at ICML 2016 by Gilad-Bachrach, Dowlin, Laine, Lauter, Naehrig and Wernsing, demonstrated that a small CNN could be evaluated on encrypted MNIST digits with 99 percent accuracy and a throughput of about 59,000 predictions per hour on a single PC, with a latency of 250 seconds per batch.[^2] CryptoNets replaced the usual nonlinear activation function with a square function so that the entire network reduced to a low-depth arithmetic circuit, which the underlying leveled FHE scheme could evaluate without bootstrapping.[^2]

That paper triggered a sustained research program on private inference, and later on private fine-tuning and training, with two persistent themes: how to handle non-polynomial operations in modern networks, and how to keep ciphertext size, multiplicative depth, and runtime within practical budgets.

How it works

Schemes used in ML

Most ML-oriented work uses one or more of the following Ring-LWE-based schemes.

Scheme	First proposed	Plaintext type	Typical ML role	Reference
BGV	Brakerski, Gentry, Vaikuntanathan, 2011 to 2012	Exact integers mod p	Integer arithmetic, tree models	[^13]
BFV	Fan, Vercauteren, 2012 (building on Brakerski, 2012)	Exact integers mod t	Linear/logistic models, BERT-style MatMul	[^7][^14]
CKKS	Cheon, Kim, Kim, Song, ASIACRYPT 2017	Approximate real or complex vectors	CNNs, transformers, polynomial activations	[^3]
TFHE / CGGI	Chillotti, Gama, Georgieva, Izabachène, ASIACRYPT 2016	Bits and small integers on the torus	Programmable bootstrapping, lookup tables, gates	[^4]

BGV and BFV operate on vectors of integers modulo a plaintext modulus and are appropriate for models that can be quantized to fixed precision. CKKS is the workhorse for deep learning inference because it natively packs many real-valued slots into a single ciphertext, supports a rescaling step that mirrors fixed-point arithmetic, and admits an approximate bootstrapping procedure that scales to large multiplicative depths.[^3][^7] TFHE encrypts data on the real torus modulo one, supports a "programmable bootstrapping" that evaluates a univariate lookup table during noise refresh, and is therefore well suited to non-polynomial functions and to small-domain ML primitives.[^4]
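The fixed-point analogy behind CKKS rescaling can be sketched without any encryption at all. The snippet below is a plaintext-only illustration, not CKKS itself: the scaling factor `DELTA = 2**20` and the helper names `encode`, `multiply`, `rescale` and `decode` are assumptions made for this example. It shows that multiplying two scaled values squares the scale, and that dividing by the scale afterwards brings the result back to the working precision.

```python
# Plaintext-only illustration of CKKS-style scaling; nothing here is encrypted.
SCALE_BITS = 20              # assumed scaling factor Delta = 2**20
DELTA = 1 << SCALE_BITS

def encode(x: float) -> int:
    """Represent a real number as an integer at scale DELTA."""
    return round(x * DELTA)

def decode(v: int) -> float:
    """Interpret an integer at scale DELTA as a real number."""
    return v / DELTA

def multiply(a: int, b: int) -> int:
    """Multiply two scale-DELTA values; the product sits at scale DELTA**2."""
    return a * b

def rescale(v: int) -> int:
    """Divide by DELTA with rounding, returning from scale DELTA**2 to DELTA."""
    return (v + DELTA // 2) >> SCALE_BITS

w, x = 0.3, -1.7
product = multiply(encode(w), encode(x))   # scale DELTA**2
approx = decode(rescale(product))          # back at scale DELTA
print(approx, w * x)                       # close, but not bit-identical
```

Without the `rescale` step the scale would double in bit length after every multiplication; in the real scheme the same division also consumes part of the ciphertext modulus, which is why depth is a limited resource.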

What makes ML hard for FHE

Three properties of modern neural networks interact badly with FHE.

The first is non-linearities. FHE schemes natively support addition and multiplication, but not ReLU, GELU, softmax, sigmoid, max-pooling, or layer normalization in general. CryptoNets sidestepped this by replacing ReLU with a square function so that the network is exactly a low-degree polynomial.[^2] Since 2020, the dominant alternative has been to approximate ReLU and other activations by polynomials of carefully chosen degree, often with iterative composite-polynomial constructions for the sign function and for softmax used in attention.[^15]
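As a rough illustration of the polynomial-approximation idea, the sketch below interpolates ReLU on [-1, 1] at Chebyshev nodes and measures the worst-case error on a grid. It is a generic numerical-analysis example, not the composite-polynomial construction used in the cited work; the interval, the degrees tried and all function names are assumptions chosen for illustration.

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

def chebyshev_coeffs(f, degree: int) -> list:
    """Coefficients c_j of sum_j c_j T_j(t) interpolating f at Chebyshev nodes on [-1, 1]."""
    n = degree + 1
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(t) for t in nodes]
    coeffs = [
        (2.0 / n) * sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
        for j in range(n)
    ]
    coeffs[0] *= 0.5
    return coeffs

def chebyshev_eval(coeffs, t: float) -> float:
    """Evaluate sum_j c_j T_j(t) with the Clenshaw recurrence (additions and multiplications only)."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + coeffs[0]

def max_error(degree: int, samples: int = 2001) -> float:
    """Largest absolute deviation from ReLU on an evenly spaced grid over [-1, 1]."""
    coeffs = chebyshev_coeffs(relu, degree)
    grid = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return max(abs(chebyshev_eval(coeffs, t) - relu(t)) for t in grid)

for d in (2, 4, 8, 16):
    print(d, round(max_error(d), 4))
```

The error shrinks only slowly with the degree because ReLU has a kink at zero, which is one reason higher-precision approximations cost so much multiplicative depth.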

The second is multiplicative depth. Each homomorphic multiplication consumes noise budget and, in BGV-style schemes, one level of a "modulus ladder." The longest chain of sequential multiplications in a circuit is its multiplicative depth; a ciphertext can absorb only as many as the parameters chosen at setup allow before it becomes undecryptable, and the computational cost of every operation increases quickly with that depth budget.[^16] Deep networks with many layers and high-degree polynomial activations therefore force a tradeoff: either choose large parameters that allow more depth at the price of larger and slower ciphertexts, or insert bootstrapping operations that refresh the noise at significant cost.
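Depth depends on how a computation is arranged, not only on what it computes. The following sketch tracks depth on ordinary numbers (no encryption) with a small hypothetical `Tracked` wrapper and compares two ways of computing x^8: a sequential chain of multiplications and repeated squaring.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tracked:
    """A plain value annotated with the multiplicative depth used to produce it."""
    value: float
    depth: int = 0

    def __mul__(self, other: "Tracked") -> "Tracked":
        # A product sits one level below the deeper of its two operands.
        return Tracked(self.value * other.value, max(self.depth, other.depth) + 1)

def power_chain(x: Tracked, n: int) -> Tracked:
    """x * x * ... * x, one multiplication after another."""
    acc = x
    for _ in range(n - 1):
        acc = acc * x
    return acc

def power_by_squaring(x: Tracked, n: int) -> Tracked:
    """Binary exponentiation: square the base and multiply in the set bits of n."""
    result = None
    base = x
    while n:
        if n & 1:
            result = base if result is None else result * base
        n >>= 1
        if n:
            base = base * base
    return result

x = Tracked(1.5)
chain = power_chain(x, 8)
tree = power_by_squaring(x, 8)
print(chain.depth, tree.depth)   # same value, very different depth
```

Both routes give the same value, but the chain uses depth 7 while repeated squaring uses depth 3, which is why HE-friendly polynomial evaluation is usually organized as a balanced tree.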

The third is ciphertext expansion and ciphertext size. A single CKKS or BFV ciphertext stores two ring elements of degree typically between 2^13 and 2^17 with large coefficient moduli, so individual ciphertexts can be in the tens of megabytes and the encrypted version of a model state can run several orders of magnitude larger than its plaintext. Operations such as relinearization, "modUp" and rotation produce intermediate expansion that further dominates runtime.[^16] Hardware-friendly packing schemes such as SIMD slots in BGV/BFV/CKKS and tile tensors in HELayers are designed to amortize these costs over many parallel input samples.[^17]
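A back-of-the-envelope size estimate makes the expansion concrete. The parameters below (ring degree 2^16 and a 1,600-bit total coefficient modulus) are illustrative assumptions rather than values from any cited system, and the formula ignores serialization overhead and the 64-bit word padding that real libraries apply to RNS limbs.

```python
def ciphertext_bytes(ring_degree: int, modulus_bits: int, components: int = 2) -> int:
    """Raw size of a ciphertext made of `components` ring elements."""
    return components * ring_degree * modulus_bits // 8

N = 2 ** 16          # assumed ring degree
LOG_Q = 1600         # assumed total coefficient-modulus size in bits

size = ciphertext_bytes(N, LOG_Q)
slots = N // 2                     # CKKS packs up to N/2 complex slots
plain_bytes = slots * 8            # the same slots stored as float64 real values
expansion = size / plain_bytes

print(size / 2 ** 20, "MiB per ciphertext")
print(expansion, "x larger than the packed float64 data")
```

With these assumed parameters a fully packed ciphertext is about 25 MiB and roughly 100 times larger than the plaintext vector it holds; a ciphertext that uses only a few of its slots is proportionally worse.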

Threat model and operational pipeline

The standard threat model for FHE-protected inference is a server that is honest but curious: the server runs the model correctly but should learn nothing about user inputs beyond what the cleartext output reveals. A typical pipeline has the client encrypt the input under its public key, send the ciphertext to the server, have the server execute the network homomorphically using public encryption parameters and evaluation keys (relinearization and rotation keys for CKKS, bootstrapping keys for TFHE), and return ciphertexts that the client decrypts.[^7][^8] Because FHE schemes are based on Ring-LWE, they are believed to be secure against quantum adversaries under the same lattice hardness assumptions used in post-quantum cryptography.[^18]
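The client-server flow can be sketched with a toy LWE-style scheme that supports only additions and multiplication by plaintext integers, which is enough for a linear model. This is a teaching sketch under loud assumptions: the dimension and noise are far too small to be secure, it uses a symmetric key for brevity where the pipeline above describes public-key encryption, and none of the function names correspond to a real library's API.

```python
import random

N_DIM = 64             # toy LWE dimension (insecure, for illustration only)
Q = 1 << 32            # ciphertext modulus
T = 1 << 12            # plaintext modulus
DELTA = Q // T         # places the message in the high bits of the phase

def keygen(rng):
    return [rng.randrange(2) for _ in range(N_DIM)]

def encrypt(m, sk, rng):
    a = [rng.randrange(Q) for _ in range(N_DIM)]
    e = rng.randint(-4, 4)                      # small noise term
    b = (sum(ai * si for ai, si in zip(a, sk)) + DELTA * (m % T) + e) % Q
    return a, b

def decrypt(ct, sk):
    a, b = ct
    phase = (b - sum(ai * si for ai, si in zip(a, sk))) % Q
    m = ((phase + DELTA // 2) // DELTA) % T     # round away the noise
    return m - T if m >= T // 2 else m          # map back to a signed value

def add(c1, c2):
    return [(x + y) % Q for x, y in zip(c1[0], c2[0])], (c1[1] + c2[1]) % Q

def scale(ct, w):
    """Multiply a ciphertext by a plaintext integer weight."""
    return [(w * x) % Q for x in ct[0]], (w * ct[1]) % Q

def add_plain(ct, m):
    return ct[0], (ct[1] + DELTA * (m % T)) % Q

def server_linear(cts, weights, bias):
    """Server side: computes bias + sum_i w_i * x_i without the secret key."""
    acc = add_plain(scale(cts[0], weights[0]), bias)
    for ct, w in zip(cts[1:], weights[1:]):
        acc = add(acc, scale(ct, w))
    return acc

rng = random.Random(0)
sk = keygen(rng)                                        # client keeps the key
features = [3, -1, 4, 2]
encrypted = [encrypt(v, sk, rng) for v in features]     # client -> server
weights, bias = [2, 5, -3, 1], 7                        # server's cleartext model
encrypted_score = server_linear(encrypted, weights, bias)
score = decrypt(encrypted_score, sk)                    # server -> client
print(score)
```

The server only ever touches `encrypted` and its own weights. Each scalar multiplication scales the noise by the weight, so the result decrypts correctly only while the accumulated noise stays below `DELTA / 2`, a small-scale version of the noise-budget constraint discussed above.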

Schemes and milestones

CryptoNets and the square-activation era

CryptoNets (2016) used Microsoft's YASHE-derived scheme and a network with two convolutional blocks, square activations, a fully connected layer, another square activation, and a final scaled-mean-pool output, all evaluated as a single arithmetic circuit on encrypted MNIST digits.[^2] The crucial cost insight was that the entire model had multiplicative depth small enough to evaluate without bootstrapping, which at the time was prohibitively expensive. Subsequent variants such as Faster CryptoNets and LoLa reduced latency but kept the same square-activation polynomial-network template.[^2]

CKKS for approximate-arithmetic neural networks

CKKS shifted the design space. Its native support for vectors of real numbers and its rescaling operator made it the default scheme for floating-point-style neural network inference.[^3] Cheon's group at Seoul National University released HEAAN as an open-source reference implementation of CKKS shortly after the paper, and CryptoLab, founded by Cheon, continues to develop a commercial successor.[^19] Improvements to CKKS bootstrapping, especially "approximate bootstrapping" with composite polynomial approximations of the modular reduction, have made it feasible to evaluate deep networks, including ResNet-style classifiers, entirely under CKKS without re-encrypting between layers.[^15][^20]

CKKS-based inference of standard image classifiers is now a routine benchmark. For example, IBM's HELayers framework demonstrates an HE-friendly AlexNet that runs in roughly three minutes per inference, several orders of magnitude faster than earlier all-FHE approaches.[^17]

Transformer inference under FHE

Transformer inference is fundamentally harder than CNN inference under FHE because of softmax, GELU and layer normalization, all of which involve division, square roots and exponentials. Three lines of work have made progress.

THE-X, presented at the Findings of ACL 2022 by Chen and colleagues from Microsoft, replaces GELU with a polynomial, decomposes softmax into HE-friendly subcomponents and substitutes layer normalization with a precomputed normalization, enabling fully encrypted inference for BERT-style models with negligible accuracy drop on downstream tasks.[^5] Iron, published at NeurIPS 2022 by Hao and collaborators, uses the BFV scheme together with secret-sharing-based subprotocols, packing plaintext matrices into polynomials so that polynomial multiplication corresponds to matrix multiplication; the system gives new protocols for softmax, GELU and layer normalization suitable for transformer client-server private inference.[^21]
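The idea of turning matrix multiplication into polynomial multiplication can be illustrated over plain integers. The sketch below uses one possible coefficient layout in the spirit of this packing approach; the exponent formulas, function names and the absence of any ring reduction or encryption are assumptions of this example and should not be read as the exact encoding of any cited system.

```python
def poly_mul(p, q):
    """Schoolbook product of two integer polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        if a:
            for j, b in enumerate(q):
                out[i + j] += a * b
    return out

def pack_left(A, k):
    """Place A[i][j] (A is m x n) at exponent i*n*k + (n-1) - j."""
    m, n = len(A), len(A[0])
    poly = [0] * (m * n * k)
    for i in range(m):
        for j in range(n):
            poly[i * n * k + (n - 1) - j] = A[i][j]
    return poly

def pack_right(B):
    """Place B[j][l] (B is n x k) at exponent l*n + j."""
    n, k = len(B), len(B[0])
    poly = [0] * (n * k)
    for j in range(n):
        for l in range(k):
            poly[l * n + j] = B[j][l]
    return poly

def unpack_product(prod, m, n, k):
    """Entry (i, l) of A @ B is the coefficient at exponent i*n*k + l*n + (n-1)."""
    return [[prod[i * n * k + l * n + (n - 1)] for l in range(k)] for i in range(m)]

A = [[1, 2, 3], [4, 5, 6]]            # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]       # 3 x 2
product_poly = poly_mul(pack_left(A, k=2), pack_right(B))
C = unpack_product(product_poly, m=2, n=3, k=2)
print(C)
```

A single polynomial product yields every entry of the matrix product at a known coefficient position; the remaining coefficients are cross terms that are simply ignored. In an actual scheme the product is taken in a ring with wraparound, which limits the matrix dimensions that fit into one polynomial and is not modeled here.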

NEXUS, released in 2024 by Zhang and collaborators at Zhejiang University as Cryptology ePrint 2024/136, is a non-interactive secure transformer inference protocol built on RNS-CKKS, with novel primitives for attention, softmax, layer normalization and argmax. NEXUS performs CKKS inference on a BERT-base model with 128 input tokens in approximately 1,103 seconds and ships both CPU and CUDA-accelerated implementations, with the GPU codebase modifying Microsoft SEAL 4.1 to add bootstrapping support.[^22]

Outside academia, Zama published an "encrypted LLM" blog post and reference implementation in 2023 showing how to run GPT-2 inference on encrypted prompts using its Concrete and Concrete ML libraries, by reworking the multi-head attention layers of the Hugging Face GPT-2 implementation into FHE-friendly building blocks built on the TFHE-based Concrete compiler.[^23] Later Concrete ML releases extended this to transformer fine-tuning and to other architectures, with v1.7 documented in 2024 covering transformer fine-tuning workflows.[^24]

Other relevant schemes and variants

The TFHE scheme remains the basis for "programmable bootstrapping" approaches in which a univariate lookup table is evaluated as part of every noise refresh. This is particularly useful for piecewise activations and quantized integer models, and is the cryptographic core of Zama's Concrete library, an open-source compiler from Python or MLIR-style IR to TFHE programs.[^25] Hybrid scheme-switching constructions such as CHIMERA, introduced by Boura, Gama and Georgieva in 2018, combine TFHE, BFV and HEAAN by mapping their plaintext spaces to a common algebraic structure and switching ciphertexts at runtime so that linear and non-linear pieces of a computation each use the most efficient scheme.[^26]
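From the model's point of view, a programmable bootstrap behaves like a table lookup on a small quantized input. The plaintext-only sketch below mimics that behaviour for a sigmoid; the 4-bit precision, the input range [-4, 4] and the names `quantize` and `lut_activation` are assumptions for illustration, and no TFHE operation is performed. Table outputs are kept as floats for readability, whereas a real table would hold quantized values as well.

```python
import math

BITS = 4                         # assumed message precision: 16 table entries
LEVELS = 1 << BITS
LO, HI = -4.0, 4.0               # assumed input range covered by the table
STEP = (HI - LO) / (LEVELS - 1)

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def quantize(x: float) -> int:
    """Map a real input to the nearest of the LEVELS table indices, clamping at the ends."""
    idx = round((x - LO) / STEP)
    return min(LEVELS - 1, max(0, idx))

# The univariate function is tabulated once, in the clear, by the model owner.
TABLE = [sigmoid(LO + i * STEP) for i in range(LEVELS)]

def lut_activation(x: float) -> float:
    """What the lookup computes: the tabulated value for the quantized input."""
    return TABLE[quantize(x)]

grid = [LO + i * 0.01 for i in range(801)]
worst = max(abs(lut_activation(x) - sigmoid(x)) for x in grid)
print(round(worst, 4))
```

Any univariate function can be tabulated this way, including ones with no good low-degree polynomial approximation; the price is that accuracy is bounded by the message precision, which is why this approach is usually paired with quantized models.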

Implementations

The FHE library ecosystem has consolidated into a small set of widely deployed implementations, most of which interoperate with each other through shared scheme definitions and through Intel HEXL acceleration.

Microsoft SEAL

Microsoft SEAL is an open-source, MIT-licensed C++ library developed by the Cryptography Research Group at Microsoft Research and is one of the most widely used FHE libraries.[^8] It implements BFV and BGV for exact integer arithmetic and CKKS for approximate arithmetic on real or complex vectors.[^8] Version 4.3.2 of SEAL was released on May 1, 2026, and the library targets Windows, Linux, macOS, Android, iOS and WebAssembly.[^8] SEAL underpins many neural network research prototypes, including the original CryptoNets implementation, NEXUS, and many CKKS-based transformer projects.[^2][^22]

OpenFHE

OpenFHE is the open-source community successor to PALISADE and integrates implementations and design ideas from the HElib, HEAAN and FHEW projects.[^9] It supports BGV, BFV, CKKS, DM/FHEW and CGGI/TFHE, includes approximate CKKS bootstrapping and complies with the HomomorphicEncryption.org security standard.[^9] OpenFHE is community-driven with support from ARPA-H, DARPA and NumFocus, and the current development version is 1.5.1, released on April 10, 2026.[^9]

Zama Concrete and Concrete ML

Concrete is Zama's TFHE-based FHE compiler, originally described in "CONCRETE: Concrete Operates oN Ciphertexts Rapidly by Extending TFHE" by Chillotti, Joye, Ligier, Orfila and Tap at WAHC 2020.[^25] Concrete ML, a privacy-preserving machine learning framework built on top of Concrete, lets data scientists convert scikit-learn-style models and PyTorch or Keras networks via ONNX to TFHE equivalents without writing cryptographic code.[^27] Concrete ML version 1.9.0 was released in April 2025 and supports built-in linear models, tree models including XGBoost, small neural networks and custom PyTorch and Keras models imported through ONNX, with quantization-aware training to reduce accuracy loss under FHE constraints.[^27]

IBM HELayers

HELayers is IBM Research's FHE SDK for C++ and Python, packaged as Docker containers and shipped with around fifteen tutorials and example applications for tasks such as credit card fraud detection, encrypted database search, heart disease detection and image classification.[^10][^17] The associated 2020 paper, "HELayers: A Tile Tensors Framework for Large Neural Networks on Encrypted Data," introduces tile tensors, an abstraction that hides ciphertext packing decisions behind an array API and includes an optimizer that selects packing schemes automatically. HELayers includes a simulator that runs metadata-only "mockup" tile tensors several orders of magnitude faster than full encrypted execution to assist parameter tuning, and the framework's HE-friendly AlexNet runs in around three minutes per inference.[^17] HELayers also powers IBM's FHE Cloud Service.[^10]

Intel HEXL and HEAAN

Intel HEXL is an open-source acceleration library for polynomial arithmetic that exploits the AVX-512 and AVX-512-IFMA52 instruction sets on third-generation and later Xeon Scalable processors.[^11] The 2021 HEXL paper by Boemer and colleagues reports up to 7.2x single-threaded speedups on the forward NTT, 6.7x on the inverse NTT, and 6.0x on element-wise vector-vector modular multiplication relative to a native C++ implementation.[^11] HEXL has been integrated into Microsoft SEAL, OpenFHE, PALISADE and HElib, and remains actively maintained, with v1.2.6 released on May 29, 2025.[^11]

HEAAN is the original CKKS reference library released in 2016 by Cheon's group at Seoul National University. It is now maintained by CryptoLab, which is also developing a commercial CKKS implementation called HEaaN with GPU acceleration.[^19][^28]

Comparison

Library	Primary developer	Schemes	Language	License	Reference
Microsoft SEAL	Microsoft Research	BFV, BGV, CKKS	C++	MIT	[^8]
OpenFHE	OpenFHE community	BGV, BFV, CKKS, DM/FHEW, CGGI/TFHE	C++	BSD-2-Clause	[^9]
Concrete ML	Zama	TFHE (via Concrete)	Python	BSD-3-Clause-Clear	[^27]
HELayers	IBM Research	Multiple via plugins	C++ and Python	Commercial / community	[^10][^17]
HEXL	IntelLabs	NTT and modular arithmetic acceleration	C++	Apache 2.0	[^11]
HEAAN / HEaaN	SNU Cryptography Lab and CryptoLab	CKKS	C++	Open and commercial	[^19][^28]

Hybrid approaches with MPC and TEE

In practice, secure inference systems often combine FHE with secure multi-party computation (MPC) or trusted execution environments (TEEs). A typical hybrid pattern is "linear layers under FHE, non-linear layers under MPC": homomorphic encryption efficiently handles dense matrix multiplications and convolutions on packed vectors, while non-linearities such as ReLU, softmax and comparisons are handled by garbled circuits or secret-sharing protocols that exchange short messages.[^29] This decomposition can deliver much better latency than pure FHE inference for moderate-sized models, at the cost of additional rounds of client-server communication.[^29] Iron and many other transformer inference systems follow this template, combining BFV-based packed matrix multiplication with secret-shared non-linearities.[^21]
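The handoff between the homomorphic linear layer and the secret-shared non-linear layer can be sketched as additive masking. In the snippet below the homomorphic step is mocked by a cleartext matrix-vector product, and the modulus 2^32 and all function names are assumptions of this example; in a real protocol the server would subtract its random mask under encryption, so the client would decrypt only its own share.

```python
import random

MOD = 1 << 32   # shares live in the integers modulo 2^32

def matvec(W, x):
    """Stand-in for the homomorphic linear layer, computed here in the clear."""
    return [sum(w * v for w, v in zip(row, x)) % MOD for row in W]

def split_output(y, rng):
    """Server keeps a random mask r; the client receives y - r (mod MOD)."""
    r = [rng.randrange(MOD) for _ in y]
    client_share = [(yi - ri) % MOD for yi, ri in zip(y, r)]
    return client_share, r

def reconstruct(client_share, server_share):
    """Add the two shares and map the result back to signed integers."""
    out = []
    for c, s in zip(client_share, server_share):
        v = (c + s) % MOD
        out.append(v - MOD if v >= MOD // 2 else v)
    return out

rng = random.Random(1)
W = [[2, -1, 3], [0, 4, -5]]
x = [7, -2, 1]
y = matvec(W, x)
client_share, server_share = split_output(y, rng)
recovered = reconstruct(client_share, server_share)
print(recovered)
```

Each share on its own is uniformly random and reveals nothing about the layer output; the non-linear sub-protocol then operates on the two shares jointly, which is where the extra communication rounds come from.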

TEEs offer a complementary tool. Confidential computing platforms protect data and code inside hardware-isolated enclaves on the cloud provider's machines. Recent privacy-preserving inference systems combine all three techniques, for example performing FHE-protected linear algebra inside a TEE that also holds bootstrapping keys, or using TEEs to attest the correctness of the homomorphic computation.[^29] No single technique dominates: FHE gives the strongest cryptographic guarantees and the smallest trust assumptions but is the slowest, MPC offers more flexibility at the cost of communication and a non-collusion assumption between servers, and TEEs are fast but require trust in the hardware vendor and in side-channel mitigations.

Applications

The most thoroughly studied applications of HE for machine learning are in domains where data is intrinsically sensitive and where regulation, contract law, or competitive concerns prevent moving cleartext data to a central server.

In healthcare, fully homomorphic encryption pipelines have been used to compute polygenic risk scores on encrypted genotypes, including a 110,000-SNP schizophrenia model whose encrypted predictions closely matched plaintext outputs.[^30] Researchers have also published privacy-preserving cancer-type predictors built on datasets containing over two million genetic mutations from 2,713 patients, with machine learning models implemented in homomorphic encryption that retained high accuracy versus their plaintext baselines.[^31]

In finance, HELayers and similar SDKs ship credit-card-fraud-detection and customer-risk reference applications that run neural networks and tree models on encrypted client transaction data, allowing financial institutions to use cloud-hosted machine learning services without sending plaintext data off premises.[^10][^17]

For large language models, Zama's encrypted-LLM demonstration shows that GPT-2 inference can be run on encrypted prompts so that the model provider does not see the user's input, while preserving the original model's predictions.[^23] NEXUS, THE-X and similar transformer-FHE systems give comparable guarantees for BERT-class models in classification and question answering tasks.[^5][^21][^22] In practice, these systems are typically paired with supporting infrastructure such as private inference proxies, attested servers, or hybrid MPC components to keep latency tolerable.

Limitations

Despite a decade of progress, FHE-based machine learning still faces significant constraints.

Latency and throughput. Even with state-of-the-art CKKS implementations and modern hardware, transformer inference under FHE is many orders of magnitude slower than cleartext inference. NEXUS reports about 1,103 seconds per BERT-base forward pass with 128 input tokens, and FHE-encrypted ResNet-style classifiers with ReLU polynomial approximations have been reported to take over a thousand seconds per image even on optimized stacks.[^15][^22] Throughput is acceptable for batched, latency-tolerant settings but is far from real-time interactive use for current models.

Ciphertext size and bandwidth. Encrypted prompts, intermediate activations and evaluation keys can be many megabytes each, which raises both networking and storage costs and makes naïve client-server FHE expensive in mobile and edge settings.[^16]

Accuracy loss from polynomial approximations. Replacing ReLU, GELU, softmax and layer normalization with polynomials of bounded degree introduces controlled but non-zero error. For shallow networks such as a LeNet-style classifier the loss can be negligible, but for deeper networks like a ResNet-20 the polynomial approximation introduces a meaningful accuracy gap relative to the same network trained and evaluated in cleartext.[^15][^32]

Training under FHE is mostly impractical. Most production work performs only inference under FHE, with training conducted on cleartext data and encryption applied only at deployment. Encrypted training requires not only forward computation but also backpropagation and parameter updates under FHE, which dramatically increases multiplicative depth and ciphertext bandwidth. Specialized cases, such as encrypted logistic regression and small neural networks, are feasible, but training general deep models on encrypted data remains a research target rather than a deployed reality.[^32]

Threat model gaps. FHE protects input confidentiality but does not by itself address malicious servers, model-inversion attacks from outputs, or membership-inference attacks. Robust deployments therefore combine FHE with verifiable computation, MPC, differential privacy, or attestation. Standardized parameter tables address concrete-security questions but leave open issues such as composition with these other defenses.[^6][^33]

Standardization

The HomomorphicEncryption.org community brought together industry, academic and government participants to draft a standard for parameter selection and security analysis of FHE. After two workshops and three white papers covering Security, API and Applications, the first version of the Homomorphic Encryption Security Standard was released on November 21, 2018.[^6] The standard describes the BFV, BGV and CKKS schemes, captures the community's current understanding of their security against lattice attacks, and recommends parameter tables for 128-bit, 192-bit and 256-bit security levels.[^6] Production libraries including OpenFHE explicitly target compliance with the standard's parameter recommendations.[^9]

Subsequent community efforts have focused on a standardized API and on programming model questions, with active discussion of how to capture rotations, key management, and bootstrapping in a way that lets users move models across libraries without re-tuning parameters.[^6]

Related work

Homomorphic encryption is one of several privacy technologies relevant to machine learning. Differential privacy adds calibrated noise to outputs or gradients so that individual records cannot be inferred but does not by itself protect intermediate data from a curious server. Secure multi-party computation and trusted execution environments give complementary guarantees, often combined with FHE in hybrid stacks.[^29] Federated learning keeps data on user devices and aggregates only model updates, but the aggregated updates can still leak information unless protected by FHE or differential privacy. Together these techniques form the toolbox commonly described as "privacy-preserving machine learning."

References
