See also: Machine learning terms, Differential privacy, Deep learning
Federated learning is a machine learning paradigm in which multiple participants collaboratively train a shared model while keeping their raw data on local devices or servers. Rather than centralizing training data in one location, each participant (called a "client") trains a model locally and shares only the learned model updates, such as gradient updates or weight differences, with a coordinating server. The server aggregates these updates to produce an improved global model, which is then distributed back to clients for the next round of training. This cycle repeats over many rounds until the model converges. This approach was introduced by researchers at Google in 2016 and has since become a foundational technique for privacy-preserving distributed model training.
The approach has attracted significant attention across industry and academia because it addresses fundamental tensions between the need for large, diverse training datasets and the legal, ethical, and practical constraints on sharing sensitive data. Federated learning is now used in consumer products (smartphone keyboards, voice assistants), healthcare (multi-hospital clinical studies), financial services (fraud detection across banks), and an expanding set of other domains.
Imagine a group of friends who all want to learn how to sort colored blocks, but nobody wants to show their block collection to anyone else. Instead of putting all the blocks in one pile, each friend practices sorting at home and writes down the "sorting tips" they discovered. Then they share only the tips (not the actual blocks) with a teacher, who combines everyone's tips into one super-helpful set of instructions. The teacher sends these combined instructions back to all the friends, so everyone gets better at sorting without ever seeing each other's blocks.
The concept of federated learning originated at Google in 2016. Jakub Konecny, H. Brendan McMahan, Daniel Ramage, and collaborators published two foundational papers in October 2016. The first, "Federated Optimization: Distributed Machine Learning for On-Device Intelligence," outlined the general optimization framework. The second, "Federated Learning: Strategies for Improving Communication Efficiency," introduced key techniques for reducing the communication overhead inherent in distributed training.
The landmark paper that formalized the approach and introduced the widely used Federated Averaging (FedAvg) algorithm was "Communication-Efficient Learning of Deep Networks from Decentralized Data" by McMahan, Moore, Ramage, Hampson, and y Arcas. Although the preprint appeared on arXiv in early 2016 (arXiv:1602.05629), the paper was formally published at the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) in April 2017 [1]. This paper demonstrated that FedAvg could reduce communication costs by 10 to 100 times compared to synchronized stochastic gradient descent (SGD), making on-device training practical.
Google deployed federated learning in production for the first time in 2017, using it to improve the next-word prediction model in Gboard, the company's mobile keyboard application for Android. In 2019, Google published "Towards Federated Learning at Scale: A System Design" (Bonawitz et al., 2019), describing the production system architecture and the engineering challenges of running federated learning across millions of mobile devices. By 2019, the system had scaled to operate across millions of Android devices. In 2023, Google announced that all Gboard neural network language models trained with federated learning carry formal differential privacy guarantees [2].
The field expanded rapidly after 2017. WeBank's AI research group in China proposed the FATE (Federated AI Technology Enabler) framework in 2019, which emphasized vertical federated learning for financial applications. The IEEE published Standard 3652.1-2020, the first industry standard for federated machine learning, in 2020. By the mid-2020s, dozens of open-source frameworks and hundreds of research papers per year had established federated learning as a mainstream area of machine learning research, with adoption across industries including healthcare, finance, telecommunications, and autonomous driving.
Federated Averaging (FedAvg) is the foundational algorithm for federated learning, introduced by McMahan et al. in 2017. It extends stochastic gradient descent to a distributed setting by allowing clients to perform multiple local training steps before communicating with the server, which substantially reduces communication overhead [1].
Each communication round proceeds as follows:

1. The server selects a random fraction C of the available clients and sends them the current global model weights.
2. Each selected client trains the model on its local data for E epochs of mini-batch SGD with batch size B and learning rate eta.
3. Each client sends its updated weights (or the weight difference) back to the server; raw data never leaves the client.
4. The server computes a weighted average of the received models, weighting each client by the size of its local dataset, and uses the result as the global model for the next round.

The behavior of the algorithm is governed by a small set of hyperparameters:
| Hyperparameter | Symbol | Description |
|---|---|---|
| Client fraction | C | Fraction of clients selected to participate each round. A value of C = 0.1 means 10% of available clients are chosen per round. |
| Local epochs | E | Number of complete passes over local data before sending updates. Increasing E reduces communication rounds but may hurt convergence on heterogeneous data. |
| Batch size | B | Mini-batch size for local SGD. The special case B = infinity means the client uses its entire local dataset as a single batch. |
| Learning rate | eta | Step size for local gradient updates |
| Communication rounds | T | Total number of server-client communication rounds |
The key insight of FedAvg is that increasing E (local epochs) allows clients to make more progress locally before communicating, thereby reducing the total number of communication rounds needed. However, too many local epochs can cause client drift, where individual models diverge from each other due to differences in local data distributions. On IID (independent and identically distributed) data, FedAvg converges reliably. On non-IID data, convergence can be less stable, motivating later algorithmic improvements.
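The weighted-averaging step at the heart of FedAvg can be illustrated with a short, self-contained sketch. The example below uses a toy linear-regression objective and synthetic data purely for illustration; the local training routine, the number of rounds, and all hyperparameter values are assumptions chosen for brevity, not the setup from the original paper.

```python
import numpy as np

def local_update(global_weights, X, y, epochs=5, lr=0.01):
    """Local training on one client: a few epochs of full-batch gradient
    descent on a toy linear-regression objective (illustrative only)."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of 0.5 * ||Xw - y||^2 / n
        w -= lr * grad
    return w

def fedavg_round(global_weights, clients, client_fraction=0.5, epochs=5, lr=0.01):
    """One FedAvg round: sample a fraction C of clients, train locally for E
    epochs, then average the returned weights in proportion to dataset size."""
    m = max(1, int(client_fraction * len(clients)))
    selected = np.random.choice(len(clients), size=m, replace=False)

    updates, sizes = [], []
    for i in selected:
        X, y = clients[i]
        updates.append(local_update(global_weights, X, y, epochs, lr))
        sizes.append(len(y))

    total = sum(sizes)
    # Weighted average: clients with more data contribute proportionally more.
    return sum((n / total) * w for n, w in zip(sizes, updates))

# Toy federation: 10 clients, each holding 50 samples of a shared linear task.
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(10):
    X = np.random.randn(50, 3)
    clients.append((X, X @ true_w + 0.1 * np.random.randn(50)))

w = np.zeros(3)
for round_idx in range(20):
    w = fedavg_round(w, clients)
```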
While FedAvg remains widely used, researchers have developed numerous alternative aggregation strategies to address its limitations, particularly when client data distributions are non-IID.
| Algorithm | Key innovation | Primary benefit | Limitation |
|---|---|---|---|
| FedAvg [1] | Multiple local SGD steps before averaging | Reduces communication rounds by 10-100x | Struggles with highly non-IID data |
| FedProx [3] | Adds a proximal regularization term to the local objective | Stable convergence on heterogeneous data; up to 22% accuracy improvement in highly heterogeneous settings | Requires tuning the proximal coefficient |
| SCAFFOLD [4] | Uses control variates to correct for client drift | Faster convergence on heterogeneous data; consistently outperforms FedAvg and FedProx | Doubles upload cost; requires stateful clients |
| FedMA [5] | Layer-wise neuron matching and averaging of hidden neurons with similar feature extraction signatures | Reduces communication overhead; handles architecture differences; performance improves with more local epochs | Computationally expensive matching step |
| FedAdam / FedYogi [6] | Server-side adaptive optimizers (Adam, Yogi) applied to aggregated pseudo-gradients | Better convergence on difficult optimization landscapes where SGD-based aggregation is slow | Additional server-side state |
| FedDyn [7] | Dynamic regularizer that aligns local and global objectives | Strong theoretical convergence guarantees | More complex implementation; higher computation overhead |
| FedNova [8] | Normalizes updates by local training steps | Handles heterogeneous local computation | Modest improvement over FedAvg in some settings |
FedProx addresses non-IID challenges by penalizing large deviations of local model weights from the global model, keeping local updates closer to the global optimum. SCAFFOLD takes a different approach: it maintains control variates on each client that estimate and correct for the direction of client drift, consistently outperforming both FedAvg and FedProx in highly heterogeneous settings [4]. FedMA matches neurons across client models layer by layer before averaging, producing a more coherent global model when client architectures or learned representations differ.
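To make the proximal term concrete, the following PyTorch sketch shows one local training step with the FedProx penalty added to the client's loss. The model, optimizer, loss function, and the value of the proximal coefficient mu are placeholders for illustration; this is a minimal sketch of the idea, not a reference implementation.

```python
import torch

def fedprox_local_step(model, global_params, batch, loss_fn, optimizer, mu=0.01):
    """One local SGD step with the FedProx proximal term.
    The penalty (mu/2) * ||w - w_global||^2 discourages the local model
    from drifting far from the current global model."""
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)

    # Proximal term: squared L2 distance between local and global parameters.
    prox = 0.0
    for p, g in zip(model.parameters(), global_params):
        prox = prox + torch.sum((p - g.detach()) ** 2)
    loss = loss + (mu / 2.0) * prox

    loss.backward()
    optimizer.step()
    return loss.item()
```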
Federated learning systems are categorized along two independent dimensions: the scale of the federation (who participates) and the way data is partitioned across participants (what each party holds).
Federated learning deployments fall into two broad categories depending on the nature and number of participating clients.
| Characteristic | Cross-device | Cross-silo |
|---|---|---|
| Number of clients | Very large (millions to billions of smartphones, IoT devices, wearables) | Small to moderate (2 to 100 organizations) |
| Typical clients | Smartphones, tablets, IoT sensors, wearables | Hospitals, banks, research labs, enterprises |
| Compute per client | Low (mobile CPUs, limited memory) | High (GPU clusters, data center servers) |
| Data per client | Small (kilobytes to megabytes of local usage data) | Large (gigabytes to terabytes of institutional records) |
| Client availability | Intermittent, unreliable (devices go offline, network drops) | Stable, always online |
| Client identity | Anonymous | Known, contractual |
| Client selection | Random sampling each round | All or most clients participate every round |
| Example | Google Gboard on Android phones, Apple Siri, Hey Google detection | Multi-hospital clinical studies, cross-bank fraud detection |
| Typical privacy mechanism | Differential privacy, secure aggregation | Secure multi-party computation, homomorphic encryption |
| Dominant challenges | Communication efficiency, stragglers | Institutional incentives, data governance |
Cross-device federated learning was the original setting envisioned by McMahan et al. It involves millions of edge devices such as smartphones, tablets, and IoT sensors. Each device contributes a small amount of data, and participation is intermittent because devices may go offline at any time. The server samples a few hundred or thousand devices per round, collects their updates, and aggregates them. Devices only participate when idle, charging, and connected to Wi-Fi. Google's Gboard deployment is the canonical example, along with Apple's on-device learning features.
Cross-silo federated learning involves a smaller number of organizations (typically 2 to 100) that each hold substantial datasets. Participants are reliably available and often bound by contractual agreements. In healthcare, for example, ten hospitals might collaboratively train a diagnostic model. Each hospital runs the training on its own server infrastructure and reliably participates in every round. Cross-silo settings often involve stricter contractual and regulatory requirements, but the engineering is simpler because clients are stable and well-resourced. Multi-hospital medical research collaborations and cross-bank fraud detection systems are common examples.
| Partitioning type | Feature space | Sample space | Example |
|---|---|---|---|
| Horizontal (sample-based) | Same features across clients | Different samples per client | Multiple hospitals with identical EHR fields but different patients |
| Vertical (feature-based) | Different features across clients | Same or overlapping samples | A bank and an e-commerce company sharing overlapping customers but holding different data types |
| Hybrid (federated transfer) | Partially overlapping features | Partially overlapping samples | Organizations with limited overlap in both users and data types |
Horizontal federated learning (also called sample-partitioned or homogeneous FL) is the most common setting. All clients share the same feature space but hold different data samples. For instance, multiple hospitals may record the same types of medical data (blood tests, imaging, diagnoses) but for different patient populations. FedAvg and most standard aggregation algorithms assume horizontal partitioning.
Vertical federated learning (also called feature-partitioned or heterogeneous FL) applies when clients hold different features for the same (or overlapping) set of samples. A bank and a retail company might serve many of the same customers but hold very different data about them (financial transactions vs. purchase history). Entity alignment techniques are used to identify overlapping samples, and the training process is designed so that neither party can see the other's features. Vertical FL typically requires more complex protocols, including secure entity alignment to match shared samples across institutions. It is particularly relevant in financial services and advertising, and was a major focus of WeBank's FATE framework.
Hybrid (federated transfer learning) addresses the case where both samples and features differ partially across clients. Transfer learning techniques bridge the gaps in data coverage. This is the least common configuration.
Federated learning provides a degree of privacy by design because raw data never leaves client devices. However, research has shown that model updates alone can leak information about training data through gradient inversion attacks. Several additional cryptographic and statistical privacy mechanisms are used to strengthen guarantees.
Differential privacy (DP) provides a formal mathematical guarantee that the output of a computation does not reveal whether any single individual's data was included in the input. In federated learning, DP is applied in two main ways [2]:

- Central DP: the server clips each client's update and adds calibrated noise to the aggregate before applying it to the global model, so that the released model does not reveal any individual's contribution.
- Local DP: each client adds noise to its own update before sharing it, so the privacy guarantee does not depend on trusting the server.
The strength of the privacy guarantee is controlled by the parameter epsilon. Smaller values of epsilon mean stronger privacy but require more noise, which can reduce model quality. Google announced in 2023 that all Gboard neural network language models trained with federated learning carry formal differential privacy guarantees, using the DP-FTRL (Follow-the-Regularized-Leader) algorithm. Google's production deployment for Gboard achieved a formal differential privacy guarantee of rho = 0.81 zero-concentrated differential privacy (zCDP) [2]. The privacy-utility tradeoff is a core challenge: stronger privacy (more noise) reduces model accuracy.
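A minimal sketch of the central-DP recipe (clip each client update, average, add Gaussian noise) is shown below. The clipping norm and noise multiplier are arbitrary example values, updates are represented as flat NumPy vectors, and proper privacy accounting is omitted.

```python
import numpy as np

def dp_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.1):
    """Clip each update to a maximum L2 norm, average, then add Gaussian
    noise calibrated to the clipping norm (a simplified DP-SGD-style step)."""
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))

    avg = np.mean(clipped, axis=0)
    # Noise standard deviation scales with the sensitivity (clip_norm / n).
    sigma = noise_multiplier * clip_norm / len(client_updates)
    return avg + np.random.normal(0.0, sigma, size=avg.shape)
```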
Secure aggregation is a cryptographic protocol that allows the server to compute the aggregate (sum or average) of client updates without being able to inspect any individual client's update. The protocol uses pairwise masking and secret sharing so that individual contributions cancel out, revealing only the combined result. This prevents the server from performing gradient inversion attacks on individual updates. Google's production federated learning system uses secure aggregation to protect client updates during Gboard training. Secure aggregation adds computational and communication overhead but does not degrade model quality because the aggregate itself is exact (no noise is added).
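The pairwise-masking idea can be shown in toy form: every pair of clients agrees on a random mask that one adds and the other subtracts, so the masks cancel in the server's sum while each individual update stays hidden. The sketch below omits the key agreement, secret sharing, and dropout-recovery machinery of the real protocol.

```python
import numpy as np

def mask_updates(updates, seed=0):
    """Toy pairwise masking: client i adds a shared random mask for every j > i
    and subtracts the mask for every j < i. The server only sees masked vectors."""
    rng = np.random.default_rng(seed)
    n = len(updates)
    dim = updates[0].shape[0]
    masks = {(i, j): rng.normal(size=dim) for i in range(n) for j in range(i + 1, n)}

    masked = []
    for i, u in enumerate(updates):
        v = u.copy()
        for j in range(n):
            if i < j:
                v += masks[(i, j)]
            elif i > j:
                v -= masks[(j, i)]
        masked.append(v)
    return masked

# The server sums the masked updates; the pairwise masks cancel,
# revealing only the true aggregate.
updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
assert np.allclose(sum(mask_updates(updates)), sum(updates))
```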
Homomorphic encryption (HE) allows computation on encrypted data. Clients encrypt their model updates, and the server aggregates the encrypted values without ever decrypting them. The result, once decrypted, is mathematically identical to aggregating the plaintext updates. HE provides strong privacy guarantees but carries significant computational overhead, making it more practical for cross-silo settings with fewer participants. Fully homomorphic encryption supports arbitrary computations on ciphertexts but is computationally expensive, increasing computation time by roughly 3 to 5 times. Partially homomorphic schemes that support only addition (which is sufficient for weighted averaging) are more practical and are supported by frameworks like NVIDIA FLARE.
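As an illustration of additively homomorphic aggregation, the sketch below uses the open-source python-paillier (`phe`) package to sum encrypted updates; treating each update as a single scalar and generating the keypair in the same script are simplifications made for the example.

```python
from phe import paillier  # python-paillier: additively homomorphic encryption

# In a real deployment the keypair would be generated by a party other than
# the aggregation server, so the server cannot decrypt individual updates.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client encrypts its (scalar, for simplicity) model update.
client_updates = [0.12, -0.05, 0.33]
encrypted = [public_key.encrypt(u) for u in client_updates]

# The server adds ciphertexts without seeing any plaintext update.
encrypted_sum = encrypted[0]
for c in encrypted[1:]:
    encrypted_sum = encrypted_sum + c

# Decryption (by the key holder) yields the exact average of the updates.
average = private_key.decrypt(encrypted_sum) / len(client_updates)
print(average)
```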
Secure multi-party computation (SMPC) enables multiple parties to jointly compute a function over their inputs while keeping those inputs private. In federated learning, SMPC protocols based on secret sharing, garbled circuits, or oblivious transfer can be used to ensure that no single party (including the server) learns anything beyond the final aggregate. SMPC is often combined with differential privacy for layered defense. PySyft from OpenMined implements SMPC primitives alongside federated learning.
In practice, production systems often combine multiple techniques. For example, Google's Gboard system uses both secure aggregation and differential privacy together.
Although federated learning avoids transferring raw data, the repeated exchange of model parameters between clients and the server can be costly, especially for large deep learning models with millions or billions of parameters. Several techniques address this bottleneck.
Quantization reduces the number of bits used to represent each gradient value. Approaches range from 1-bit quantization (retaining only the sign of each gradient) to adaptive schemes like QSGD, which trades off between convergence speed and compression ratio. TernGrad quantizes gradients to three values: {-1, 0, +1}.
Sparsification transmits only the most significant gradient components (those exceeding a threshold), discarding the rest. Top-k sparsification selects the k largest gradient values, sometimes achieving 99% compression with minimal accuracy loss.
Sketching uses randomized data structures (such as count sketches) to compress gradient vectors into fixed-size summaries that can be aggregated on the server.
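The following sketch shows top-k sparsification and 1-bit (sign) quantization applied to a flat gradient vector; the choice of k and the use of the mean absolute value as the quantization scale are illustrative choices, not prescriptions from any particular paper.

```python
import numpy as np

def top_k_sparsify(grad, k):
    """Keep only the k largest-magnitude components; zero out the rest.
    Only the surviving (index, value) pairs need to be transmitted."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse

def sign_quantize(grad):
    """1-bit quantization: transmit the sign of each component plus a single
    scale factor (the mean absolute value) so magnitudes can be approximated."""
    scale = np.mean(np.abs(grad))
    return np.sign(grad), scale

grad = np.random.randn(1000)
sparse_grad = top_k_sparsify(grad, k=10)   # ~99% of entries dropped
signs, scale = sign_quantize(grad)         # reconstructed as signs * scale
```

Further techniques that reduce communication in other ways are summarized in the table below.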
| Technique | How it works | Trade-off |
|---|---|---|
| Increasing local epochs | Clients train more locally, reducing communication rounds | Risk of client drift |
| Model distillation | Clients send predictions (logits) instead of full model updates | Requires public proxy dataset |
| Federated dropout | Clients train random subsets of the model | Reduced per-round communication at cost of convergence speed |
| Model pruning | Removing low-magnitude weights before transmission | Sparse updates need compatible aggregation |
| Low-rank decomposition | Decomposing weight matrices into smaller factors | Approximation error |
One of the central challenges in federated learning is statistical heterogeneity: the data on different clients is rarely independently and identically distributed (IID). Different hospitals see different patient populations, different users type in different languages, and different factories produce different products. Real-world federated settings exhibit several types of heterogeneity:

- Label skew: clients hold different proportions of each class (one hospital sees mostly mild cases, another mostly severe ones).
- Feature skew: the same labels are expressed differently across clients (handwriting styles, sensor calibrations, dialects).
- Quantity skew: some clients hold far more data than others.
- Temporal skew: data is collected at different times and may drift.

Non-IID data causes several problems: local models drift toward their own data distributions (client drift), the averaged global model converges more slowly or to a worse solution, accuracy varies widely across clients, and hyperparameters tuned on IID benchmarks transfer poorly.
Algorithms like FedProx, SCAFFOLD, and clustered federated learning specifically target this problem.
In cross-device settings, clients have widely varying hardware capabilities and network connections. Some devices may be slow to complete their local training ("stragglers"), delaying the entire round if the server waits for all participants. Strategies include over-selecting clients each round and proceeding once enough responses arrive, setting per-round deadlines and aggregating whatever updates are available, asynchronous aggregation that applies updates as they arrive, and adapting the amount of local work to each device's capabilities.
Because clients have different data distributions, the global model may perform well on average but poorly for underrepresented groups or clients with atypical data. Ensuring fairness across all participants is an active area of research in federated learning.
A single global model may perform well on average but poorly for individual clients whose data distributions deviate from the aggregate. Personalized federated learning addresses this by producing models tailored to each client's local data.
| Strategy | Description | Example methods |
|---|---|---|
| Local fine-tuning | Train a global model, then fine-tune locally on each client | FedAvg + local SGD, Per-FedAvg |
| Meta-learning | Train the global model as a good initialization for rapid local adaptation | Per-FedAvg (MAML-based), FedMeta |
| Clustering | Group clients with similar data distributions and train separate models per cluster | ClusteredFL, IFCA |
| Multi-task learning | Treat each client as a separate but related task | MOCHA, FedEM |
| Mixture models | Combine global and local model components with learned mixing weights | FedMix, LG-FedAvg |
| Knowledge distillation | Clients distill the global model into personalized local architectures | FedDF, FedMD |
Meta-learning approaches, particularly those based on Model-Agnostic Meta-Learning (MAML), treat the global model as an initialization that can be quickly adapted to any client's distribution with just a few gradient steps. Clustering approaches group clients with similar data characteristics and maintain separate models for each cluster, achieving a middle ground between a single global model and fully individual models.
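The simplest strategy in the table, local fine-tuning, can be sketched in a few lines of PyTorch; the model, data loader, loss function, and number of fine-tuning steps are placeholders chosen for illustration.

```python
import copy
import torch

def personalize_by_finetuning(global_model, local_loader, loss_fn, steps=50, lr=1e-3):
    """Start from the converged global model and take a few local gradient
    steps on the client's own data to produce a personalized model."""
    model = copy.deepcopy(global_model)   # keep the global model untouched
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    it = iter(local_loader)
    for _ in range(steps):
        try:
            inputs, targets = next(it)
        except StopIteration:              # restart the loader if it runs out
            it = iter(local_loader)
            inputs, targets = next(it)
        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()
        optimizer.step()
    return model
```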
The rise of large language models (LLMs) has created new demand for federated learning. Organizations increasingly want to fine-tune foundation models like GPT-4, LLaMA, or Mistral on their proprietary data without centralizing it or sending that data to a third party. Federated fine-tuning enables this by distributing the fine-tuning process across participants.
However, full fine-tuning of models with billions of parameters is impractical in federated settings, because every round would require transmitting and updating the full parameter set on each client. Parameter-efficient fine-tuning (PEFT) methods, especially LoRA (Low-Rank Adaptation), have become the standard approach: the pretrained backbone is frozen, each client trains only small low-rank adapter matrices inserted into selected layers, and only these adapter weights (a small fraction of the total parameter count) are exchanged and aggregated, as sketched below.
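A rough sketch of the client-side mechanics is shown below, assuming the Hugging Face `transformers` and `peft` libraries; the base model (gpt2, purely for illustration), the LoRA hyperparameters, and the target modules are example choices, and the local training loop and server-side aggregation are omitted.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

def build_lora_client_model(model_name="gpt2"):
    """Wrap a frozen base model with small trainable LoRA adapters."""
    base = AutoModelForCausalLM.from_pretrained(model_name)
    config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                        target_modules=["c_attn"],   # illustrative choice for GPT-2
                        task_type="CAUSAL_LM")
    return get_peft_model(base, config)

def extract_adapter_state(model):
    """Only the trainable LoRA parameters are sent to the server,
    a tiny fraction of the full model."""
    return {name: p.detach().cpu() for name, p in model.named_parameters()
            if p.requires_grad}

def load_adapter_state(model, adapter_state):
    """Clients load the aggregated adapter weights back into their local copy."""
    model.load_state_dict(adapter_state, strict=False)
```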
FedML released FedLLM, a platform for building fine-tuned LLMs on proprietary data using federated learning. Flower also supports federated LLM fine-tuning through its framework-agnostic design, compatible with Hugging Face Transformers, PyTorch, and other ML libraries.
Key challenges for federated LLM fine-tuning include high communication bandwidth requirements even with PEFT, convergence difficulties across heterogeneous data distributions, memory constraints on client devices, and the need to protect both the model (which may be proprietary) and the data.
Split learning is a related but distinct technique, first proposed by researchers at MIT Media Lab, that partitions the neural network itself between clients and the server rather than keeping the full model on each client. The neural network is divided at a designated "cut layer." Each client holds the layers up to the cut layer and processes its raw data through the first few layers to produce intermediate activations (called "smashed data"). These activations are sent to the server, which holds the remaining layers and completes the forward pass, computes the loss, and initiates backpropagation. The server returns gradients for the split layer back to the client for backpropagation through the client-side layers. The raw data never leaves the client, and the server never sees it.
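The handoff at the cut layer can be sketched as follows; the client and server sub-networks, optimizers, and data are toy placeholders, and the network communication is reduced to ordinary function calls.

```python
import torch

def split_training_step(client_net, server_net, batch, loss_fn,
                        client_opt, server_opt):
    """One split-learning step: the client runs layers up to the cut layer,
    the server finishes the forward pass and starts backpropagation, and the
    gradient at the cut layer is handed back to the client."""
    inputs, targets = batch

    # Client side: forward pass up to the cut layer ("smashed data").
    client_opt.zero_grad()
    smashed = client_net(inputs)
    smashed_sent = smashed.detach().requires_grad_(True)  # what gets transmitted

    # Server side: complete the forward pass and backpropagate to the cut layer.
    server_opt.zero_grad()
    loss = loss_fn(server_net(smashed_sent), targets)
    loss.backward()
    server_opt.step()

    # Client side: continue backpropagation using the returned gradient.
    smashed.backward(smashed_sent.grad)
    client_opt.step()
    return loss.item()

# Toy usage: a small MLP split at the cut layer.
client_net = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU())
server_net = torch.nn.Sequential(torch.nn.Linear(32, 2))
client_opt = torch.optim.SGD(client_net.parameters(), lr=0.1)
server_opt = torch.optim.SGD(server_net.parameters(), lr=0.1)
batch = (torch.randn(16, 20), torch.randint(0, 2, (16,)))
split_training_step(client_net, server_net, batch,
                    torch.nn.CrossEntropyLoss(), client_opt, server_opt)
```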
Key differences from standard federated learning include: no client ever holds the full model, so per-client memory and compute requirements are much lower; the server actively participates in every forward and backward pass rather than only aggregating finished updates; communication happens per batch (activations and gradients) rather than per round (model updates); and in its basic form clients are processed sequentially, which limits parallelism.
Smashed data can leak information about the input, so split learning does not provide full privacy on its own. Hybrid approaches, such as SplitFed (Thapa et al., 2020), combine split learning and federated learning to get the parallelism benefits of FL with the reduced per-client computation of split learning.
Federated analytics extends the federated learning paradigm from model training to data analysis. Instead of training a model, federated analytics computes aggregate statistics (counts, histograms, heavy hitters, quantiles) over decentralized data without collecting raw data on the server [9].
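A toy illustration: each client reports only a local histogram (optionally noised for a local-DP flavor), and the server sums the histograms without ever seeing raw values. The bucket edges and noise scale below are arbitrary example choices.

```python
import numpy as np

BUCKETS = np.linspace(0.0, 10.0, num=11)   # 10 histogram buckets

def local_histogram(values, noise_scale=0.0):
    """Compute a client's histogram over fixed buckets; optionally add
    Laplace noise before sharing, for a local-DP flavor."""
    counts, _ = np.histogram(values, bins=BUCKETS)
    if noise_scale > 0:
        counts = counts + np.random.laplace(0.0, noise_scale, size=counts.shape)
    return counts

def aggregate_histograms(client_histograms):
    """The server only ever sees (and sums) per-client histograms."""
    return np.sum(client_histograms, axis=0)

clients = [np.random.uniform(0, 10, size=100) for _ in range(5)]
global_hist = aggregate_histograms(
    [local_histogram(c, noise_scale=1.0) for c in clients])
```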
Google introduced the concept in 2020 and has deployed it in several products. In Gboard, Confidential Federated Analytics (CFA) is used to discover new words across more than 900 languages; in one deployment, the system identified 3,600 missing Indonesian words in two days [10]. The Now Playing feature on Pixel phones uses federated analytics to maintain an on-device database of song fingerprints.
Federated analytics uses many of the same privacy primitives as federated learning (secure aggregation, differential privacy) but applies them to descriptive statistics rather than model parameters.
Despite its privacy advantages, federated learning introduces unique attack surfaces because the server and clients must exchange model updates.
| Attack | Threat model | Description |
|---|---|---|
| Model poisoning | Malicious client | A compromised client sends crafted model updates that degrade the global model's accuracy or inject a backdoor |
| Data poisoning | Malicious client | A client intentionally mislabels its training data to corrupt the global model and cause it to learn incorrect patterns |
| Gradient inversion | Malicious server | The server reconstructs private training data from observed gradient updates. Attacks like DLG and Inverting Gradients can recover images pixel by pixel [11]. Research has shown that words typed on a mobile keyboard can be recovered with high accuracy from Gboard-style federated learning updates |
| Inference attack | Malicious server or client | An adversary infers properties of a client's training data (e.g., membership inference, attribute inference) from model updates |
| Free-riding | Lazy client | A client sends random or minimal updates while still receiving the benefits of the global model without contributing genuine local training |

Common defenses against these attacks include the following:

| Defense | Protects against | How it works |
|---|---|---|
| Robust aggregation (Krum, trimmed mean, coordinate-wise median) | Model and data poisoning | Filters out outlier updates before averaging |
| Differential privacy | Gradient inversion, inference attacks | Adds noise to updates, bounding information leakage |
| Secure aggregation | Gradient inversion by the server | Cryptographically hides individual updates from the server |
| Anomaly detection | Model poisoning, free-riding | Monitors update statistics to flag suspicious clients |
| Gradient clipping | Gradient inversion | Limits the magnitude of updates, reducing reconstruction fidelity |
| Certified defenses | Poisoning attacks | Provides formal guarantees on model robustness within a certified radius |
In practice, deploying multiple defenses in combination (e.g., secure aggregation with differential privacy and robust aggregation) provides the strongest protection, though each layer introduces trade-offs in model accuracy, computational cost, or communication overhead.
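Two of the robust aggregation rules mentioned above, coordinate-wise median and trimmed mean, are simple to express; the sketch below assumes client updates arrive as flat NumPy vectors and uses an illustrative trimming fraction.

```python
import numpy as np

def coordinate_wise_median(updates):
    """Aggregate by taking the median of each coordinate across clients,
    which limits the influence of any single poisoned update."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean(updates, trim_fraction=0.1):
    """Drop the largest and smallest `trim_fraction` of values in each
    coordinate before averaging the rest."""
    stacked = np.sort(np.stack(updates), axis=0)
    k = int(trim_fraction * len(updates))
    if k > 0:
        stacked = stacked[k:-k]
    return np.mean(stacked, axis=0)
```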
Google has been the primary driver of federated learning research and deployment since introducing the concept in 2016.
Gboard (mobile keyboard): Google's Gboard was the first large-scale production deployment of federated learning. The system trains recurrent neural network language models for next-word prediction, autocorrection, smart compose, slide-to-type, and emoji suggestion on user devices without uploading typing data to servers. Training data (the text a user types) never leaves the device. Google reported a 24% improvement in next-word prediction accuracy compared to server-trained baselines [12]. The system operates across millions of Android devices with differential privacy guarantees. As of 2023, all Gboard neural network language models carry differential privacy guarantees.
"Hey Google" hotword detection: Google uses federated learning to improve the accuracy of the "Hey Google" wake-word model in Google Assistant. The system collects two types of on-device signals: misactivations (the assistant triggered when the user did not say "Hey Google") and near-activations (audio that almost triggered the assistant). Audio recordings stay on the device and are processed only when the phone is idle, charging, and connected to Wi-Fi. Recordings are encrypted and deleted after 63 days or when no longer needed.
Other applications: Google has also applied federated learning to suggested replies in Google Messages, text selection predictions, and out-of-vocabulary word discovery.
Apple adopted federated learning as part of its broader on-device intelligence strategy, combining it with differential privacy. Starting with iOS 13 (September 2019), Apple has applied federated learning and differential privacy to on-device personalization features, most prominently improving "Hey Siri" speaker recognition without uploading users' voice recordings.
Federated learning enables hospitals and research institutions to collaboratively train diagnostic models without sharing patient records, addressing regulations such as HIPAA and GDPR. Healthcare is one of the most active domains for federated learning because medical data is highly sensitive, subject to strict regulations, and siloed across institutions.
Banks, insurers, and payment processors use federated learning for cross-institutional fraud detection, anti-money laundering (AML), and credit risk assessment without sharing customer transaction data. Because financial data is subject to regulations like GDPR and national banking secrecy laws, institutions cannot easily pool their transaction records. Federated learning allows them to train shared fraud detection models across institutions without exposing individual customer data. Vertical federated learning is particularly relevant here, as different financial institutions hold different features (credit history, transaction records, insurance claims) for overlapping customer populations. Banking Circle, a global payments bank, adopted Flower's federated learning platform to train AML models across regions.
Autonomous vehicle fleets use federated learning to improve perception and planning models across vehicles while respecting data localization laws. Each vehicle trains on its local driving data and contributes model updates to a shared fleet model, enabling learning from diverse driving conditions without centralizing potentially sensitive location and camera data. NVIDIA's AV Federated Learning platform enables country-specific local training while contributing to a global model, respecting data localization laws that restrict cross-border transfer of driving data.
| Industry | Application | Data Type | Key Benefit |
|---|---|---|---|
| Mobile technology | Next-word prediction, autocorrect, emoji suggestion, voice models | Keystroke data, typing patterns, audio | Personalization without centralizing text or audio |
| Voice assistants | Hotword detection, speaker recognition | Audio recordings | Accuracy improvement without centralizing voice data |
| Healthcare | Disease prediction, medical image segmentation, drug discovery | Patient records, imaging data (CT, MRI, X-ray) | Multi-institutional collaboration under HIPAA/GDPR |
| Finance | Fraud detection, AML, credit scoring | Transaction logs, account histories | Cross-bank pattern detection without exposing customer records |
| Automotive | Object detection, path planning | Camera feeds, LiDAR scans, driving logs | Fleet-wide learning under data localization laws |
| Telecommunications | Network optimization, predictive maintenance, anomaly detection | Network traffic logs, usage patterns | Improved service without transmitting customer behavior data |
| Manufacturing | Predictive maintenance, quality control | Sensor data, production logs | Cross-factory collaboration without exposing proprietary processes |
| Retail and advertising | Recommendation systems, ad targeting | Purchase history, browsing data | Collaborative models across retailers while preserving user privacy |
| Pharmaceuticals | Drug interaction modeling, clinical trials | Clinical trial data, molecular data | Accelerated development without pooling proprietary data |
The European Union's General Data Protection Regulation (GDPR), which took effect in May 2018, imposes strict requirements on the collection, processing, and transfer of personal data. It aligns naturally with federated learning through several principles: data minimization (only model updates, not raw personal data, leave the device), purpose limitation (data is processed locally for a specific training task), storage limitation (no central copy of personal data needs to be retained), and privacy by design and by default (Article 25). Keeping personal data within its original jurisdiction also eases compliance with restrictions on cross-border transfers.
In the United States, the Health Insurance Portability and Accountability Act (HIPAA) requires strict safeguards for protected health information (PHI) and restricts sharing of such data. Federated learning allows healthcare institutions to benefit from collaborative model training while keeping patient data within each provider's infrastructure, reducing the need for complex Business Associate Agreements and data use agreements, minimizing breach risks, and avoiding a central server that could be breached or subpoenaed. However, HIPAA does not explicitly address federated learning, and organizations must still ensure that model updates themselves do not inadvertently leak PHI.
China's Personal Information Protection Law (PIPL), Brazil's Lei Geral de Proteção de Dados (LGPD), and India's Digital Personal Data Protection Act all impose data localization or data minimization requirements that federated learning can help satisfy. The IEEE 3652.1-2020 standard provides a formal specification for federated machine learning, covering reference architecture, data usage, and performance evaluation.
Several open-source frameworks support federated learning research and deployment. The following table compares the most widely used options:
| Framework | Developer | Language | Backend support | Key strengths | License |
|---|---|---|---|---|---|
| TensorFlow Federated (TFF) | Google | Python | TensorFlow | Fine-grained algorithm control, strong simulation support, built-in DP and secure aggregation | Apache 2.0 |
| Flower | Flower Labs | Python (with mobile SDKs for iOS/Android) | Framework-agnostic (PyTorch, TensorFlow, JAX, scikit-learn, XGBoost, etc.) | Highest overall usability score in comparative studies, production-ready, easy onboarding | Apache 2.0 |
| PySyft | OpenMined | Python | PyTorch, TensorFlow | Broadest privacy toolkit (FL + DP + SMPC + HE), remote data science | Apache 2.0 |
| NVIDIA FLARE | NVIDIA | Python | Framework-agnostic | Enterprise-grade security, healthcare focus, homomorphic encryption support, scalable | Apache 2.0 |
| FedML | TensorOpera (formerly FedML Inc.) | Python | Framework-agnostic | MLOps platform, cross-cloud GPU scheduling, integrated LLM fine-tuning | Apache 2.0 |
| Intel OpenFL | Intel | Python | PyTorch, TensorFlow | Trusted execution environment (Intel SGX) support, validated in large-scale healthcare federations | Apache 2.0 |
| FATE | WeBank | Python | PyTorch, TensorFlow | Strong vertical FL support, first industrial-grade FL framework, widely adopted in China | Apache 2.0 |
TensorFlow Federated (TFF) is developed by Google and provides two API layers: a high-level Federated Learning API for applying standard algorithms (FedAvg, FedSGD) to existing TensorFlow models, and a lower-level Federated Core API for expressing custom federated computations. TFF is primarily designed for simulation and research, with high-performance simulation deployable on Kubernetes. It includes built-in support for differential privacy and secure aggregation.
Flower has emerged as the most popular general-purpose federated learning framework as of 2025. In comparative evaluations, Flower achieved an overall score of 84.75%, ranking first among open-source FL frameworks for framework agnosticism, ease of integration, and overall performance. Its design philosophy emphasizes framework agnosticism: developers can federate existing code written in PyTorch, TensorFlow, JAX, scikit-learn, Hugging Face Transformers, and many other libraries with minimal modifications. Flower provides mobile SDKs for on-device training on iOS and Android, a dataset partitioning library (flwr-datasets), a hub for sharing federated apps (Flower Hub), and reproducible baselines for well-known FL research results.
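To give a sense of what Flower code looks like, the sketch below defines a minimal client against Flower's NumPyClient interface; the model, data loaders, and helper functions (`get_weights`, `set_weights`, `train`, `evaluate_model`) are hypothetical placeholders, and the exact entry points for starting clients differ between Flower versions.

```python
import flwr as fl

class SketchClient(fl.client.NumPyClient):
    """Minimal Flower client: the three methods below map directly onto the
    federated round (report parameters, train locally, evaluate)."""

    def get_parameters(self, config):
        return get_weights(model)                 # placeholder helper

    def fit(self, parameters, config):
        set_weights(model, parameters)            # load the global model
        train(model, train_loader, epochs=1)      # placeholder local training
        return get_weights(model), len(train_loader.dataset), {}

    def evaluate(self, parameters, config):
        set_weights(model, parameters)
        loss, accuracy = evaluate_model(model, test_loader)   # placeholder
        return loss, len(test_loader.dataset), {"accuracy": accuracy}

# In older Flower releases, a client connects to a running server with, e.g.:
# fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=SketchClient())
```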
PySyft, developed by the OpenMined community, is unique among FL frameworks in its breadth of privacy-enhancing technologies. Beyond federated learning, PySyft supports differential privacy, secure multi-party computation, and homomorphic encryption. It enables a "remote data science" paradigm where researchers can run experiments on data they cannot see. PySyft requires manual implementation of FL strategies (even FedAvg), giving researchers maximum flexibility but demanding more boilerplate code and a steeper learning curve.
NVIDIA FLARE (Federated Learning Application Runtime Environment) is designed for enterprise and healthcare deployments. It supports homomorphic encryption, differential privacy, and a server-client architecture with gRPC-based secure communication. FLARE has been used in landmark medical studies, including the 20-institution COVID-19 outcome prediction study and various radiology AI collaborations with the American College of Radiology.
FedML (Foundational Ecosystem Design for Machine Learning) supports three computing paradigms: on-device training for edge devices, distributed computing across GPU clusters, and single-machine simulation. Its MLOps platform, now branded TensorOpera AI, enables users to deploy federated learning across multi-cloud and on-premise environments. FedML was among the first frameworks to offer integrated support for federated LLM fine-tuning through FedLLM.
Intel's OpenFL is notable for its integration with hardware-based security through Intel SGX (Software Guard Extensions), which provides trusted execution environments for federated learning. OpenFL makes TLS the default transport security and includes a built-in PKI mechanism. The framework was used in the 71-site brain tumor segmentation study, one of the largest federated healthcare collaborations to date.
FATE (Federated AI Technology Enabler), developed by WeBank, was the first industrial-grade federated learning framework and is widely adopted in China. It emphasizes strong vertical federated learning support, making it particularly well-suited for financial applications where different institutions hold different features about overlapping customer populations, and for compliance with Chinese regulatory requirements.