See also: Machine learning terms, Differential privacy, Deep learning
Federated learning is a machine learning paradigm in which multiple participants collaboratively train a shared model while keeping their raw data on local devices or servers. Rather than centralizing training data in one location, each participant (called a "client") trains a model locally and shares only the learned model updates, such as gradient updates or weight differences, with a coordinating server. The server aggregates these updates to produce an improved global model, which is then distributed back to clients for the next round of training. This cycle repeats over many rounds until the model converges. This approach was introduced by researchers at Google in 2016 and has since become a foundational technique for privacy-preserving distributed model training.
The approach has attracted significant attention across industry and academia because it addresses fundamental tensions between the need for large, diverse training datasets and the legal, ethical, and practical constraints on sharing sensitive data. Federated learning is now used in consumer products (smartphone keyboards, voice assistants), healthcare (multi-hospital clinical studies), financial services (fraud detection across banks), and an expanding set of other domains.
Imagine a group of friends who all want to learn how to sort colored blocks, but nobody wants to show their block collection to anyone else. Instead of putting all the blocks in one pile, each friend practices sorting at home and writes down the "sorting tips" they discovered. Then they share only the tips (not the actual blocks) with a teacher, who combines everyone's tips into one super-helpful set of instructions. The teacher sends these combined instructions back to all the friends, so everyone gets better at sorting without ever seeing each other's blocks.
The concept of federated learning originated at Google in 2016. Jakub Konecny, H. Brendan McMahan, Daniel Ramage, and collaborators published two foundational papers in October 2016. The first, "Federated Optimization: Distributed Machine Learning for On-Device Intelligence," outlined the general optimization framework. The second, "Federated Learning: Strategies for Improving Communication Efficiency," introduced key techniques for reducing the communication overhead inherent in distributed training.
The landmark paper that formalized the approach and introduced the widely used Federated Averaging (FedAvg) algorithm was "Communication-Efficient Learning of Deep Networks from Decentralized Data" by McMahan, Moore, Ramage, Hampson, and y Arcas. Although the preprint appeared on arXiv in early 2016 (arXiv:1602.05629), the paper was formally published at the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) in April 2017 [1]. This paper demonstrated that FedAvg could reduce communication costs by 10 to 100 times compared to synchronized stochastic gradient descent (SGD), making on-device training practical.
Google deployed federated learning in production for the first time in 2017, using it to improve the next-word prediction model in Gboard, the company's mobile keyboard application for Android. In 2019, Google published "Towards Federated Learning at Scale: A System Design" (Bonawitz et al., 2019), describing the production system architecture and the engineering challenges of running federated learning across millions of mobile devices. By 2019, the system had scaled to operate across millions of Android devices. In 2023, Google announced that all Gboard neural network language models trained with federated learning carry formal differential privacy guarantees [2].
The field expanded rapidly after 2017. WeBank's AI research group in China proposed the FATE (Federated AI Technology Enabler) framework in 2019, which emphasized vertical federated learning for financial applications. The IEEE published Standard 3652.1-2020, the first industry standard for federated machine learning, in 2020. By the mid-2020s, dozens of open-source frameworks and hundreds of research papers per year had established federated learning as a mainstream area of machine learning research, with adoption across industries including healthcare, finance, telecommunications, and autonomous driving.
Federated Averaging (FedAvg) is the foundational algorithm for federated learning, introduced by McMahan et al. in 2017. It extends stochastic gradient descent to a distributed setting by allowing clients to perform multiple local training steps before communicating with the server, which substantially reduces communication overhead [1].
Each communication round proceeds as follows:

1. The server selects a random fraction C of the available clients and sends them the current global model weights.
2. Each selected client trains the model on its local data for E epochs of mini-batch SGD with batch size B and learning rate eta.
3. Each client sends its updated weights (or the weight difference) back to the server; raw data never leaves the client.
4. The server computes a weighted average of the received models, weighting each client by the size of its local dataset, and uses the result as the global model for the next round.

The behavior of the algorithm is governed by a small set of hyperparameters:
| Hyperparameter | Symbol | Description |
|---|---|---|
| Client fraction | C | Fraction of clients selected to participate each round. A value of C = 0.1 means 10% of available clients are chosen per round. |
| Local epochs | E | Number of complete passes over local data before sending updates. Increasing E reduces communication rounds but may hurt convergence on heterogeneous data. |
| Batch size | B | Mini-batch size for local SGD. The special case B = infinity means the client uses its entire local dataset as a single batch. |
| Learning rate | eta | Step size for local gradient updates |
| Communication rounds | T | Total number of server-client communication rounds |
The key insight of FedAvg is that increasing E (local epochs) allows clients to make more progress locally before communicating, thereby reducing the total number of communication rounds needed. However, too many local epochs can cause client drift, where individual models diverge from each other due to differences in local data distributions. On IID (independent and identically distributed) data, FedAvg converges reliably. On non-IID data, convergence can be less stable, motivating later algorithmic improvements.
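The weighted-averaging step at the heart of FedAvg can be illustrated with a short, self-contained sketch. The example below uses a toy linear-regression objective and synthetic data purely for illustration; the local training routine, the number of rounds, and all hyperparameter values are assumptions chosen for brevity, not the setup from the original paper.

```python
import numpy as np

def local_update(global_weights, X, y, epochs=5, lr=0.01):
    """Local training on one client: a few epochs of full-batch gradient
    descent on a toy linear-regression objective (illustrative only)."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of 0.5 * ||Xw - y||^2 / n
        w -= lr * grad
    return w

def fedavg_round(global_weights, clients, client_fraction=0.5, epochs=5, lr=0.01):
    """One FedAvg round: sample a fraction C of clients, train locally for E
    epochs, then average the returned weights in proportion to dataset size."""
    m = max(1, int(client_fraction * len(clients)))
    selected = np.random.choice(len(clients), size=m, replace=False)

    updates, sizes = [], []
    for i in selected:
        X, y = clients[i]
        updates.append(local_update(global_weights, X, y, epochs, lr))
        sizes.append(len(y))

    total = sum(sizes)
    # Weighted average: clients with more data contribute proportionally more.
    return sum((n / total) * w for n, w in zip(sizes, updates))

# Toy federation: 10 clients, each holding 50 samples of a shared linear task.
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(10):
    X = np.random.randn(50, 3)
    clients.append((X, X @ true_w + 0.1 * np.random.randn(50)))

w = np.zeros(3)
for round_idx in range(20):
    w = fedavg_round(w, clients)
```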
While FedAvg remains widely used, researchers have developed numerous alternative aggregation strategies to address its limitations, particularly when client data distributions are non-IID.
| Algorithm | Key innovation | Primary benefit | Limitation |
|---|---|---|---|
| FedAvg [1] | Multiple local SGD steps before averaging | Reduces communication rounds by 10-100x | Struggles with highly non-IID data |
| FedProx [3] | Adds a proximal regularization term to the local objective | Stable convergence on heterogeneous data; up to 22% accuracy improvement in highly heterogeneous settings | Requires tuning the proximal coefficient |
| SCAFFOLD [4] | Uses control variates to correct for client drift | Faster convergence on heterogeneous data; consistently outperforms FedAvg and FedProx | Doubles upload cost; requires stateful clients |
| FedMA [5] | Layer-wise neuron matching and averaging of hidden neurons with similar feature extraction signatures | Reduces communication overhead; handles architecture differences; performance improves with more local epochs | Computationally expensive matching step |
| FedAdam / FedYogi [6] | Server-side adaptive optimizers (Adam, Yogi) applied to aggregated pseudo-gradients | Better convergence on difficult optimization landscapes where SGD-based aggregation is slow | Additional server-side state |
| FedDyn [7] | Dynamic regularizer that aligns local and global objectives | Strong theoretical convergence guarantees | More complex implementation; higher computation overhead |
| FedNova [8] | Normalizes updates by local training steps | Handles heterogeneous local computation | Modest improvement over FedAvg in some settings |
FedProx addresses non-IID challenges by penalizing large deviations of local model weights from the global model, keeping local updates closer to the global optimum. SCAFFOLD takes a different approach: it maintains control variates on each client that estimate and correct for the direction of client drift, consistently outperforming both FedAvg and FedProx in highly heterogeneous settings [4]. FedMA matches neurons across client models layer by layer before averaging, producing a more coherent global model when client architectures or learned representations differ.
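To make the proximal term concrete, the following PyTorch sketch shows one local training step with the FedProx penalty added to the client's loss. The model, optimizer, loss function, and the value of the proximal coefficient mu are placeholders for illustration; this is a minimal sketch of the idea, not a reference implementation.

```python
import torch

def fedprox_local_step(model, global_params, batch, loss_fn, optimizer, mu=0.01):
    """One local SGD step with the FedProx proximal term.
    The penalty (mu/2) * ||w - w_global||^2 discourages the local model
    from drifting far from the current global model."""
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)

    # Proximal term: squared L2 distance between local and global parameters.
    prox = 0.0
    for p, g in zip(model.parameters(), global_params):
        prox = prox + torch.sum((p - g.detach()) ** 2)
    loss = loss + (mu / 2.0) * prox

    loss.backward()
    optimizer.step()
    return loss.item()
```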
Federated learning systems are categorized along two independent dimensions: the scale of the federation (who participates) and the way data is partitioned across participants (what each party holds).
Federated learning deployments fall into two broad categories depending on the nature and number of participating clients.
| Characteristic | Cross-device | Cross-silo |
|---|---|---|
| Number of clients | Very large (millions to billions of smartphones, IoT devices, wearables) | Small to moderate (2 to 100 organizations) |
| Typical clients | Smartphones, tablets, IoT sensors, wearables | Hospitals, banks, research labs, enterprises |
| Compute per client | Low (mobile CPUs, limited memory) | High (GPU clusters, data center servers) |
| Data per client | Small (kilobytes to megabytes of local usage data) | Large (gigabytes to terabytes of institutional records) |
| Client availability | Intermittent, unreliable (devices go offline, network drops) | Stable, always online |
| Client identity | Anonymous | Known, contractual |
| Client selection | Random sampling each round | All or most clients participate every round |
| Example | Google Gboard on Android phones, Apple Siri, Hey Google detection | Multi-hospital clinical studies, cross-bank fraud detection |
| Typical privacy mechanism | Differential privacy, secure aggregation | Secure multi-party computation, homomorphic encryption |
| Dominant challenges | Communication efficiency, stragglers | Institutional incentives, data governance |
Cross-device federated learning was the original setting envisioned by McMahan et al. It involves millions of edge devices such as smartphones, tablets, and IoT sensors. Each device contributes a small amount of data, and participation is intermittent because devices may go offline at any time. The server samples a few hundred or thousand devices per round, collects their updates, and aggregates them. Devices only participate when idle, charging, and connected to Wi-Fi. Google's Gboard deployment is the canonical example, along with Apple's on-device learning features.
Cross-silo federated learning involves a smaller number of organizations (typically 2 to 100) that each hold substantial datasets. Participants are reliably available and often bound by contractual agreements. In healthcare, for example, ten hospitals might collaboratively train a diagnostic model. Each hospital runs the training on its own server infrastructure and reliably participates in every round. Cross-silo settings often involve stricter contractual and regulatory requirements, but the engineering is simpler because clients are stable and well-resourced. Multi-hospital medical research collaborations and cross-bank fraud detection systems are common examples.
| Partitioning type | Feature space | Sample space | Example |
|---|---|---|---|
| Horizontal (sample-based) | Same features across clients | Different samples per client | Multiple hospitals with identical EHR fields but different patients |
| Vertical (feature-based) | Different features across clients | Same or overlapping samples | A bank and an e-commerce company sharing overlapping customers but holding different data types |
| Hybrid (federated transfer) | Partially overlapping features | Partially overlapping samples | Organizations with limited overlap in both users and data types |
Horizontal federated learning (also called sample-partitioned or homogeneous FL) is the most common setting. All clients share the same feature space but hold different data samples. For instance, multiple hospitals may record the same types of medical data (blood tests, imaging, diagnoses) but for different patient populations. FedAvg and most standard aggregation algorithms assume horizontal partitioning.
Vertical federated learning (also called feature-partitioned or heterogeneous FL) applies when clients hold different features for the same (or overlapping) set of samples. A bank and a retail company might serve many of the same customers but hold very different data about them (financial transactions vs. purchase history). Entity alignment techniques are used to identify overlapping samples, and the training process is designed so that neither party can see the other's features. Vertical FL typically requires more complex protocols, including secure entity alignment to match shared samples across institutions. It is particularly relevant in financial services and advertising, and was a major focus of WeBank's FATE framework.
Hybrid (federated transfer learning) addresses the case where both samples and features differ partially across clients. Transfer learning techniques bridge the gaps in data coverage. This is the least common configuration.
Federated learning provides a degree of privacy by design because raw data never leaves client devices. However, research has shown that model updates alone can leak information about training data through gradient inversion attacks. Several additional cryptographic and statistical privacy mechanisms are used to strengthen guarantees.
Differential privacy (DP) provides a formal mathematical guarantee that the output of a computation does not reveal whether any single individual's data was included in the input. In federated learning, DP is applied in two main ways [2]:

- Central DP: the server clips each client's update and adds calibrated noise to the aggregate before applying it to the global model, so that the released model does not reveal any individual's contribution.
- Local DP: each client adds noise to its own update before sharing it, so the privacy guarantee does not depend on trusting the server.
The strength of the privacy guarantee is controlled by the parameter epsilon. Smaller values of epsilon mean stronger privacy but require more noise, which can reduce model quality. Google announced in 2023 that all Gboard neural network language models trained with federated learning carry formal differential privacy guarantees, using the DP-FTRL (Follow-the-Regularized-Leader) algorithm. Google's production deployment for Gboard achieved a formal differential privacy guarantee of rho = 0.81 zero-concentrated differential privacy (zCDP) [2]. The privacy-utility tradeoff is a core challenge: stronger privacy (more noise) reduces model accuracy.
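A minimal sketch of the central-DP recipe (clip each client update, average, add Gaussian noise) is shown below. The clipping norm and noise multiplier are arbitrary example values, updates are represented as flat NumPy vectors, and proper privacy accounting is omitted.

```python
import numpy as np

def dp_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.1):
    """Clip each update to a maximum L2 norm, average, then add Gaussian
    noise calibrated to the clipping norm (a simplified DP-SGD-style step)."""
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))

    avg = np.mean(clipped, axis=0)
    # Noise standard deviation scales with the sensitivity (clip_norm / n).
    sigma = noise_multiplier * clip_norm / len(client_updates)
    return avg + np.random.normal(0.0, sigma, size=avg.shape)
```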
Secure aggregation is a cryptographic protocol that allows the server to compute the aggregate (sum or average) of client updates without being able to inspect any individual client's update. The protocol uses pairwise masking and secret sharing so that individual contributions cancel out, revealing only the combined result. This prevents the server from performing gradient inversion attacks on individual updates. Google's production federated learning system uses secure aggregation to protect client updates during Gboard training. Secure aggregation adds computational and communication overhead but does not degrade model quality because the aggregate itself is exact (no noise is added).
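The pairwise-masking idea can be shown in toy form: every pair of clients agrees on a random mask that one adds and the other subtracts, so the masks cancel in the server's sum while each individual update stays hidden. The sketch below omits the key agreement, secret sharing, and dropout-recovery machinery of the real protocol.

```python
import numpy as np

def mask_updates(updates, seed=0):
    """Toy pairwise masking: client i adds a shared random mask for every j > i
    and subtracts the mask for every j < i. The server only sees masked vectors."""
    rng = np.random.default_rng(seed)
    n = len(updates)
    dim = updates[0].shape[0]
    masks = {(i, j): rng.normal(size=dim) for i in range(n) for j in range(i + 1, n)}

    masked = []
    for i, u in enumerate(updates):
        v = u.copy()
        for j in range(n):
            if i < j:
                v += masks[(i, j)]
            elif i > j:
                v -= masks[(j, i)]
        masked.append(v)
    return masked

# The server sums the masked updates; the pairwise masks cancel,
# revealing only the true aggregate.
updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
assert np.allclose(sum(mask_updates(updates)), sum(updates))
```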
Homomorphic encryption (HE) allows computation on encrypted data. Clients encrypt their model updates, and the server aggregates the encrypted values without ever decrypting them. The result, once decrypted, is mathematically identical to aggregating the plaintext updates. HE provides strong privacy guarantees but carries significant computational overhead, making it more practical for cross-silo settings with fewer participants. Fully homomorphic encryption supports arbitrary computations on ciphertexts but is computationally expensive, increasing computation time by roughly 3 to 5 times. Partially homomorphic schemes that support only addition (which is sufficient for weighted averaging) are more practical and are supported by frameworks like NVIDIA FLARE.
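As an illustration of additively homomorphic aggregation, the sketch below uses the open-source python-paillier (`phe`) package to sum encrypted updates; treating each update as a single scalar and generating the keypair in the same script are simplifications made for the example.

```python
from phe import paillier  # python-paillier: additively homomorphic encryption

# In a real deployment the keypair would be generated by a party other than
# the aggregation server, so the server cannot decrypt individual updates.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client encrypts its (scalar, for simplicity) model update.
client_updates = [0.12, -0.05, 0.33]
encrypted = [public_key.encrypt(u) for u in client_updates]

# The server adds ciphertexts without seeing any plaintext update.
encrypted_sum = encrypted[0]
for c in encrypted[1:]:
    encrypted_sum = encrypted_sum + c

# Decryption (by the key holder) yields the exact average of the updates.
average = private_key.decrypt(encrypted_sum) / len(client_updates)
print(average)
```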
Secure multi-party computation (SMPC) enables multiple parties to jointly compute a function over their inputs while keeping those inputs private. In federated learning, SMPC protocols based on secret sharing, garbled circuits, or oblivious transfer can be used to ensure that no single party (including the server) learns anything beyond the final aggregate. SMPC is often combined with differential privacy for layered defense. PySyft from OpenMined implements SMPC primitives alongside federated learning.
In practice, production systems often combine multiple techniques. For example, Google's Gboard system uses both secure aggregation and differential privacy together.
Although federated learning avoids transferring raw data, the repeated exchange of model parameters between clients and the server can be costly, especially for large deep learning models with millions or billions of parameters. Several techniques address this bottleneck.
Quantization reduces the number of bits used to represent each gradient value. Approaches range from 1-bit quantization (retaining only the sign of each gradient) to adaptive schemes like QSGD, which trades off between convergence speed and compression ratio. TernGrad quantizes gradients to three values: {-1, 0, +1}.
Sparsification transmits only the most significant gradient components (those exceeding a threshold), discarding the rest. Top-k sparsification selects the k largest gradient values, sometimes achieving 99% compression with minimal accuracy loss.
Sketching uses randomized data structures (such as count sketches) to compress gradient vectors into fixed-size summaries that can be aggregated on the server.
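The following sketch shows top-k sparsification and 1-bit (sign) quantization applied to a flat gradient vector; the choice of k and the use of the mean absolute value as the quantization scale are illustrative choices, not prescriptions from any particular paper.

```python
import numpy as np

def top_k_sparsify(grad, k):
    """Keep only the k largest-magnitude components; zero out the rest.
    Only the surviving (index, value) pairs need to be transmitted."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse

def sign_quantize(grad):
    """1-bit quantization: transmit the sign of each component plus a single
    scale factor (the mean absolute value) so magnitudes can be approximated."""
    scale = np.mean(np.abs(grad))
    return np.sign(grad), scale

grad = np.random.randn(1000)
sparse_grad = top_k_sparsify(grad, k=10)   # ~99% of entries dropped
signs, scale = sign_quantize(grad)         # reconstructed as signs * scale
```

Further techniques that reduce communication in other ways are summarized in the table below.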
| Technique | How it works | Trade-off |
|---|---|---|
| Increasing local epochs | Clients train more locally, reducing communication rounds | Risk of client drift |
| Model distillation | Clients send predictions (logits) instead of full model updates | Requires public proxy dataset |
| Federated dropout | Clients train random subsets of the model | Reduced per-round communication at cost of convergence speed |
| Model pruning | Removing low-magnitude weights before transmission | Sparse updates need compatible aggregation |
| Low-rank decomposition | Decomposing weight matrices into smaller factors | Approximation error |
One of the central challenges in federated learning is statistical heterogeneity: the data on different clients is rarely independently and identically distributed (IID). Different hospitals see different patient populations, different users type in different languages, and different factories produce different products. Real-world federated settings exhibit several types of heterogeneity:

- Label skew: clients hold different proportions of each class (one hospital sees mostly mild cases, another mostly severe ones).
- Feature skew: the same labels are expressed differently across clients (handwriting styles, sensor calibrations, dialects).
- Quantity skew: some clients hold far more data than others.
- Temporal skew: data is collected at different times and may drift.

Non-IID data causes several problems: local models drift toward their own data distributions (client drift), the averaged global model converges more slowly or to a worse solution, accuracy varies widely across clients, and hyperparameters tuned on IID benchmarks transfer poorly.
Algorithms like FedProx, SCAFFOLD, and clustered federated learning specifically target this problem.
In cross-device settings, clients have widely varying hardware capabilities and network connections. Some devices may be slow to complete their local training ("stragglers"), delaying the entire round if the server waits for all participants. Strategies include over-selecting clients each round and proceeding once enough responses arrive, setting per-round deadlines and aggregating whatever updates are available, asynchronous aggregation that applies updates as they arrive, and adapting the amount of local work to each device's capabilities.
Because clients have different data distributions, the global model may perform well on average but poorly for underrepresented groups or clients with atypical data. Ensuring fairness across all participants is an active area of research in federated learning.
A single global model may perform well on average but poorly for individual clients whose data distributions deviate from the aggregate. Personalized federated learning addresses this by producing models tailored to each client's local data.
| Strategy | Description | Example methods |
|---|---|---|
| Local fine-tuning | Train a global model, then fine-tune locally on each client | FedAvg + local SGD, Per-FedAvg |
| Meta-learning | Train the global model as a good initialization for rapid local adaptation | Per-FedAvg (MAML-based), FedMeta |
| Clustering | Group clients with similar data distributions and train separate models per cluster | ClusteredFL, IFCA |
| Multi-task learning | Treat each client as a separate but related task | MOCHA, FedEM |
| Mixture models | Combine global and local model components with learned mixing weights | FedMix, LG-FedAvg |
| Knowledge distillation | Clients distill the global model into personalized local architectures | FedDF, FedMD |
Meta-learning approaches, particularly those based on Model-Agnostic Meta-Learning (MAML), treat the global model as an initialization that can be quickly adapted to any client's distribution with just a few gradient steps. Clustering approaches group clients with similar data characteristics and maintain separate models for each cluster, achieving a middle ground between a single global model and fully individual models.
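The simplest strategy in the table, local fine-tuning, can be sketched in a few lines of PyTorch; the model, data loader, loss function, and number of fine-tuning steps are placeholders chosen for illustration.

```python
import copy
import torch

def personalize_by_finetuning(global_model, local_loader, loss_fn, steps=50, lr=1e-3):
    """Start from the converged global model and take a few local gradient
    steps on the client's own data to produce a personalized model."""
    model = copy.deepcopy(global_model)   # keep the global model untouched
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    it = iter(local_loader)
    for _ in range(steps):
        try:
            inputs, targets = next(it)
        except StopIteration:              # restart the loader if it runs out
            it = iter(local_loader)
            inputs, targets = next(it)
        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()
        optimizer.step()
    return model
```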
The rise of large language models (LLMs) has created new demand for federated learning. Organizations increasingly want to fine-tune foundation models like GPT-4, LLaMA, or Mistral on their proprietary data without centralizing it or sending that data to a third party. Federated fine-tuning enables this by distributing the fine-tuning process across participants.
However, full fine-tuning of models with billions of parameters is impractical in federated settings, because every round would require transmitting and updating the full parameter set on each client. Parameter-efficient fine-tuning (PEFT) methods, especially LoRA (Low-Rank Adaptation), have become the standard approach: the pretrained backbone is frozen, each client trains only small low-rank adapter matrices inserted into selected layers, and only these adapter weights (a small fraction of the total parameter count) are exchanged and aggregated, as sketched below.
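A rough sketch of the client-side mechanics is shown below, assuming the Hugging Face `transformers` and `peft` libraries; the base model (gpt2, purely for illustration), the LoRA hyperparameters, and the target modules are example choices, and the local training loop and server-side aggregation are omitted.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

def build_lora_client_model(model_name="gpt2"):
    """Wrap a frozen base model with small trainable LoRA adapters."""
    base = AutoModelForCausalLM.from_pretrained(model_name)
    config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                        target_modules=["c_attn"],   # illustrative choice for GPT-2
                        task_type="CAUSAL_LM")
    return get_peft_model(base, config)

def extract_adapter_state(model):
    """Only the trainable LoRA parameters are sent to the server,
    a tiny fraction of the full model."""
    return {name: p.detach().cpu() for name, p in model.named_parameters()
            if p.requires_grad}

def load_adapter_state(model, adapter_state):
    """Clients load the aggregated adapter weights back into their local copy."""
    model.load_state_dict(adapter_state, strict=False)
```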
FedML released FedLLM, a platform for building fine-tuned LLMs on proprietary data using federated learning. Flower also supports federated LLM fine-tuning through its framework-agnostic design, compatible with Hugging Face Transformers, PyTorch, and other ML libraries.
Key challenges for federated LLM fine-tuning include high communication bandwidth requirements even with PEFT, convergence difficulties across heterogeneous data distributions, memory constraints on client devices, and the need to protect both the model (which may be proprietary) and the data.
Split learning is a related but distinct technique, first proposed by researchers at MIT Media Lab, that partitions the neural network itself between clients and the server rather than keeping the full model on each client. The neural network is divided at a designated "cut layer." Each client holds the layers up to the cut layer and processes its raw data through the first few layers to produce intermediate activations (called "smashed data"). These activations are sent to the server, which holds the remaining layers and completes the forward pass, computes the loss, and initiates backpropagation. The server returns gradients for the split layer back to the client for backpropagation through the client-side layers. The raw data never leaves the client, and the server never sees it.
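The handoff at the cut layer can be sketched as follows; the client and server sub-networks, optimizers, and data are toy placeholders, and the network communication is reduced to ordinary function calls.

```python
import torch

def split_training_step(client_net, server_net, batch, loss_fn,
                        client_opt, server_opt):
    """One split-learning step: the client runs layers up to the cut layer,
    the server finishes the forward pass and starts backpropagation, and the
    gradient at the cut layer is handed back to the client."""
    inputs, targets = batch

    # Client side: forward pass up to the cut layer ("smashed data").
    client_opt.zero_grad()
    smashed = client_net(inputs)
    smashed_sent = smashed.detach().requires_grad_(True)  # what gets transmitted

    # Server side: complete the forward pass and backpropagate to the cut layer.
    server_opt.zero_grad()
    loss = loss_fn(server_net(smashed_sent), targets)
    loss.backward()
    server_opt.step()

    # Client side: continue backpropagation using the returned gradient.
    smashed.backward(smashed_sent.grad)
    client_opt.step()
    return loss.item()

# Toy usage: a small MLP split at the cut layer.
client_net = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU())
server_net = torch.nn.Sequential(torch.nn.Linear(32, 2))
client_opt = torch.optim.SGD(client_net.parameters(), lr=0.1)
server_opt = torch.optim.SGD(server_net.parameters(), lr=0.1)
batch = (torch.randn(16, 20), torch.randint(0, 2, (16,)))
split_training_step(client_net, server_net, batch,
                    torch.nn.CrossEntropyLoss(), client_opt, server_opt)
```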
Key differences from standard federated learning include: no client ever holds the full model, so per-client memory and compute requirements are much lower; the server actively participates in every forward and backward pass rather than only aggregating finished updates; communication happens per batch (activations and gradients) rather than per round (model updates); and in its basic form clients are processed sequentially, which limits parallelism.
Smashed data can leak information about the input, so split learning does not provide full privacy on its own. Hybrid approaches, such as SplitFed (Thapa et al., 2020), combine split learning and federated learning to get the parallelism benefits of FL with the reduced per-client computation of split learning.
Federated analytics extends the federated learning paradigm from model training to data analysis. Instead of training a model, federated analytics computes aggregate statistics (counts, histograms, heavy hitters, quantiles) over decentralized data without collecting raw data on the server [9].
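A toy illustration: each client reports only a local histogram (optionally noised for a local-DP flavor), and the server sums the histograms without ever seeing raw values. The bucket edges and noise scale below are arbitrary example choices.

```python
import numpy as np

BUCKETS = np.linspace(0.0, 10.0, num=11)   # 10 histogram buckets

def local_histogram(values, noise_scale=0.0):
    """Compute a client's histogram over fixed buckets; optionally add
    Laplace noise before sharing, for a local-DP flavor."""
    counts, _ = np.histogram(values, bins=BUCKETS)
    if noise_scale > 0:
        counts = counts + np.random.laplace(0.0, noise_scale, size=counts.shape)
    return counts

def aggregate_histograms(client_histograms):
    """The server only ever sees (and sums) per-client histograms."""
    return np.sum(client_histograms, axis=0)

clients = [np.random.uniform(0, 10, size=100) for _ in range(5)]
global_hist = aggregate_histograms(
    [local_histogram(c, noise_scale=1.0) for c in clients])
```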
Google introduced the concept in 2020 and has deployed it in several products. In Gboard, Confidential Federated Analytics (CFA) is used to discover new words across more than 900 languages; in one deployment, the system identified 3,600 missing Indonesian words in two days [10]. The Now Playing feature on Pixel phones uses federated analytics to maintain an on-device database of song fingerprints.
Federated analytics uses many of the same privacy primitives as federated learning (secure aggregation, differential privacy) but applies them to descriptive statistics rather than model parameters.
Despite its privacy advantages, federated learning introduces unique attack surfaces because the server and clients must exchange model updates.
| Attack | Threat model | Description |
|---|---|---|
| Model poisoning | Malicious client | A compromised client sends crafted model updates that degrade the global model's accuracy or inject a backdoor |
| Data poisoning | Malicious client | A client intentionally mislabels its training data to corrupt the global model and cause it to learn incorrect patterns |
| Gradient inversion | Malicious server | The server reconstructs private training data from observed gradient updates. Attacks like DLG and Inverting Gradients can recover images pixel by pixel [11]. Research has shown that words typed on a mobile keyboard can be recovered with high accuracy from Gboard-style federated learning updates |
| Inference attack | Malicious server or client | An adversary infers properties of a client's training data (e.g., membership inference, attribute inference) from model updates |
| Free-riding | Lazy client | A client sends random or minimal updates while still receiving the benefits of the global model without contributing genuine local training |

Common defenses against these attacks include the following:

| Defense | Protects against | How it works |
|---|---|---|
| Robust aggregation (Krum, trimmed mean, coordinate-wise median) | Model and data poisoning | Filters out outlier updates before averaging |
| Differential privacy | Gradient inversion, inference attacks | Adds noise to updates, bounding information leakage |
| Secure aggregation | Gradient inversion by the server | Cryptographically hides individual updates from the server |
| Anomaly detection | Model poisoning, free-riding | Monitors update statistics to flag suspicious clients |
| Gradient clipping | Gradient inversion | Limits the magnitude of updates, reducing reconstruction fidelity |
| Certified defenses | Poisoning attacks | Provides formal guarantees on model robustness within a certified radius |
In practice, deploying multiple defenses in combination (e.g., secure aggregation with differential privacy and robust aggregation) provides the strongest protection, though each layer introduces trade-offs in model accuracy, computational cost, or communication overhead.
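Two of the robust aggregation rules mentioned above, coordinate-wise median and trimmed mean, are simple to express; the sketch below assumes client updates arrive as flat NumPy vectors and uses an illustrative trimming fraction.

```python
import numpy as np

def coordinate_wise_median(updates):
    """Aggregate by taking the median of each coordinate across clients,
    which limits the influence of any single poisoned update."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean(updates, trim_fraction=0.1):
    """Drop the largest and smallest `trim_fraction` of values in each
    coordinate before averaging the rest."""
    stacked = np.sort(np.stack(updates), axis=0)
    k = int(trim_fraction * len(updates))
    if k > 0:
        stacked = stacked[k:-k]
    return np.mean(stacked, axis=0)
```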
Google has been the primary driver of federated learning research and deployment since introducing the concept in 2016.
Gboard (mobile keyboard): Google's Gboard was the first large-scale production deployment of federated learning. The system trains recurrent neural network language models for next-word prediction, autocorrection, smart compose, slide-to-type, and emoji suggestion on user devices without uploading typing data to servers. Training data (the text a user types) never leaves the device. Google reported a 24% improvement in next-word prediction accuracy compared to server-trained baselines [12]. The system operates across millions of Android devices with differential privacy guarantees. As of 2023, all Gboard neural network language models carry differential privacy guarantees.
"Hey Google" hotword detection: Google uses federated learning to improve the accuracy of the "Hey Google" wake-word model in Google Assistant. The system collects two types of on-device signals: misactivations (the assistant triggered when the user did not say "Hey Google") and near-activations (audio that almost triggered the assistant). Audio recordings stay on the device and are processed only when the phone is idle, charging, and connected to Wi-Fi. Recordings are encrypted and deleted after 63 days or when no longer needed.
Other applications: Google has also applied federated learning to suggested replies in Google Messages, text selection predictions, and out-of-vocabulary word discovery.
Apple adopted federated learning as part of its broader on-device intelligence strategy, combining it with differential privacy. Starting with iOS 13 (September 2019), Apple has applied federated learning and differential privacy to on-device personalization features, most prominently improving "Hey Siri" speaker recognition without uploading users' voice recordings.
Federated learning enables hospitals and research institutions to collaboratively train diagnostic models without sharing patient records, addressing regulations such as HIPAA and GDPR. Healthcare is one of the most active domains for federated learning because medical data is highly sensitive, subject to strict regulations, and siloed across institutions.
Banks, insurers, and payment processors use federated learning for cross-institutional fraud detection, anti-money laundering (AML), and credit risk assessment without sharing customer transaction data. Because financial data is subject to regulations like GDPR and national banking secrecy laws, institutions cannot easily pool their transaction records. Federated learning allows them to train shared fraud detection models across institutions without exposing individual customer data. Vertical federated learning is particularly relevant here, as different financial institutions hold different features (credit history, transaction records, insurance claims) for overlapping customer populations. Banking Circle, a global payments bank, adopted Flower's federated learning platform to train AML models across regions.
Autonomous vehicle fleets use federated learning to improve perception and planning models across vehicles while respecting data localization laws. Each vehicle trains on its local driving data and contributes model updates to a shared fleet model, enabling learning from diverse driving conditions without centralizing potentially sensitive location and camera data. NVIDIA's AV Federated Learning platform enables country-specific local training while contributing to a global model, respecting data localization laws that restrict cross-border transfer of driving data.
| Industry | Application | Data Type | Key Benefit |
|---|---|---|---|
| Mobile technology | Next-word prediction, autocorrect, emoji suggestion, voice models | Keystroke data, typing patterns, audio | Personalization without centralizing text or audio |
| Voice assistants | Hotword detection, speaker recognition | Audio recordings | Accuracy improvement without centralizing voice data |
| Healthcare | Disease prediction, medical image segmentation, drug discovery | Patient records, imaging data (CT, MRI, X-ray) | Multi-institutional collaboration under HIPAA/GDPR |
| Finance | Fraud detection, AML, credit scoring | Transaction logs, account histories | Cross-bank pattern detection without exposing customer records |
| Automotive | Object detection, path planning | Camera feeds, LiDAR scans, driving logs | Fleet-wide learning under data localization laws |
| Telecommunications | Network optimization, predictive maintenance, anomaly detection | Network traffic logs, usage patterns | Improved service without transmitting customer behavior data |
| Manufacturing | Predictive maintenance, quality control | Sensor data, production logs | Cross-factory collaboration without exposing proprietary processes |
| Retail and advertising | Recommendation systems, ad targeting | Purchase history, browsing data | Collaborative models across retailers while preserving user privacy |
| Pharmaceuticals | Drug interaction modeling, clinical trials | Clinical trial data, molecular data | Accelerated development without pooling proprietary data |
The European Union's General Data Protection Regulation (GDPR), which took effect in May 2018, imposes strict requirements on the collection, processing, and transfer of personal data. It aligns naturally with federated learning through several principles: data minimization (only model updates, not raw personal data, leave the device), purpose limitation (data is processed locally for a specific training task), storage limitation (no central copy of personal data needs to be retained), and privacy by design and by default (Article 25). Keeping personal data within its original jurisdiction also eases compliance with restrictions on cross-border transfers.
In the United States, the Health Insurance Portability and Accountability Act (HIPAA) requires strict safeguards for protected health information (PHI) and restricts sharing of such data. Federated learning allows healthcare institutions to benefit from collaborative model training while keeping patient data within each provider's infrastructure, reducing the need for complex Business Associate Agreements and data use agreements, minimizing breach risks, and avoiding a central server that could be breached or subpoenaed. However, HIPAA does not explicitly address federated learning, and organizations must still ensure that model updates themselves do not inadvertently leak PHI.
China's Personal Information Protection Law (PIPL), Brazil's Lei Geral de Proteção de Dados (LGPD), and India's Digital Personal Data Protection Act all impose data localization or data minimization requirements that federated learning can help satisfy. The IEEE 3652.1-2020 standard provides a formal specification for federated machine learning, covering reference architecture, data usage, and performance evaluation.
Several open-source frameworks support federated learning research and deployment. The following table compares the most widely used options:
| Framework | Developer | Language | Backend support | Key strengths | License |
|---|---|---|---|---|---|
| TensorFlow Federated (TFF) | Google | Python | TensorFlow | Fine-grained algorithm control, strong simulation support, built-in DP and secure aggregation | Apache 2.0 |
| Flower | Flower Labs | Python (with mobile SDKs for iOS/Android) | Framework-agnostic (PyTorch, TensorFlow, JAX, scikit-learn, XGBoost, etc.) | Highest overall usability score in comparative studies, production-ready, easy onboarding | Apache 2.0 |
| PySyft | OpenMined | Python | PyTorch, TensorFlow | Broadest privacy toolkit (FL + DP + SMPC + HE), remote data science | Apache 2.0 |
| NVIDIA FLARE | NVIDIA | Python | Framework-agnostic | Enterprise-grade security, healthcare focus, homomorphic encryption support, scalable | Apache 2.0 |
| FedML | TensorOpera (formerly FedML Inc.) | Python | Framework-agnostic | MLOps platform, cross-cloud GPU scheduling, integrated LLM fine-tuning | Apache 2.0 |
| Intel OpenFL | Intel | Python | PyTorch, TensorFlow | Trusted execution environment (Intel SGX) support, validated in large-scale healthcare federations | Apache 2.0 |
| FATE | WeBank | Python | PyTorch, TensorFlow | Strong vertical FL support, first industrial-grade FL framework, widely adopted in China | Apache 2.0 |
TensorFlow Federated (TFF) is developed by Google and provides two API layers: a high-level Federated Learning API for applying standard algorithms (FedAvg, FedSGD) to existing TensorFlow models, and a lower-level Federated Core API for expressing custom federated computations. TFF is primarily designed for simulation and research, with high-performance simulation deployable on Kubernetes. It includes built-in support for differential privacy and secure aggregation.
Flower has emerged as the most popular general-purpose federated learning framework as of 2025. In comparative evaluations, Flower achieved an overall score of 84.75%, ranking first among open-source FL frameworks for framework agnosticism, ease of integration, and overall performance. Its design philosophy emphasizes framework agnosticism: developers can federate existing code written in PyTorch, TensorFlow, JAX, scikit-learn, Hugging Face Transformers, and many other libraries with minimal modifications. Flower provides mobile SDKs for on-device training on iOS and Android, a dataset partitioning library (flwr-datasets), a hub for sharing federated apps (Flower Hub), and reproducible baselines for well-known FL research results.
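To give a sense of what Flower code looks like, the sketch below defines a minimal client against Flower's NumPyClient interface; the model, data loaders, and helper functions (`get_weights`, `set_weights`, `train`, `evaluate_model`) are hypothetical placeholders, and the exact entry points for starting clients differ between Flower versions.

```python
import flwr as fl

class SketchClient(fl.client.NumPyClient):
    """Minimal Flower client: the three methods below map directly onto the
    federated round (report parameters, train locally, evaluate)."""

    def get_parameters(self, config):
        return get_weights(model)                 # placeholder helper

    def fit(self, parameters, config):
        set_weights(model, parameters)            # load the global model
        train(model, train_loader, epochs=1)      # placeholder local training
        return get_weights(model), len(train_loader.dataset), {}

    def evaluate(self, parameters, config):
        set_weights(model, parameters)
        loss, accuracy = evaluate_model(model, test_loader)   # placeholder
        return loss, len(test_loader.dataset), {"accuracy": accuracy}

# In older Flower releases, a client connects to a running server with, e.g.:
# fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=SketchClient())
```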
PySyft, developed by the OpenMined community, is unique among FL frameworks in its breadth of privacy-enhancing technologies. Beyond federated learning, PySyft supports differential privacy, secure multi-party computation, and homomorphic encryption. It enables a "remote data science" paradigm where researchers can run experiments on data they cannot see. PySyft requires manual implementation of FL strategies (even FedAvg), giving researchers maximum flexibility but demanding more boilerplate code and a steeper learning curve.
NVIDIA FLARE (Federated Learning Application Runtime Environment) is designed for enterprise and healthcare deployments. It supports homomorphic encryption, differential privacy, and a server-client architecture with gRPC-based secure communication. FLARE has been used in landmark medical studies, including the 20-institution COVID-19 outcome prediction study and various radiology AI collaborations with the American College of Radiology.
FedML (Foundational Ecosystem Design for Machine Learning) supports three computing paradigms: on-device training for edge devices, distributed computing across GPU clusters, and single-machine simulation. Its MLOps platform, now branded TensorOpera AI, enables users to deploy federated learning across multi-cloud and on-premise environments. FedML was among the first frameworks to offer integrated support for federated LLM fine-tuning through FedLLM.
Intel's OpenFL is notable for its integration with hardware-based security through Intel SGX (Software Guard Extensions), which provides trusted execution environments for federated learning. OpenFL makes TLS the default transport security and includes a built-in PKI mechanism. The framework was used in the 71-site brain tumor segmentation study, one of the largest federated healthcare collaborations to date.
FATE (Federated AI Technology Enabler), developed by WeBank, was the first industrial-grade federated learning framework and is widely adopted in China. It emphasizes strong vertical federated learning support, making it particularly well-suited for financial applications where different institutions hold different features about overlapping customer populations, and for compliance with Chinese regulatory requirements.