# PyTorch Lightning

> Source: https://aiwiki.ai/wiki/pytorch_lightning
> Updated: 2026-06-23
> Categories: Developer Tools
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

PyTorch Lightning is an open-source deep-learning framework, created by William Falcon in 2019, that wraps [PyTorch](/wiki/pytorch) to abstract away the engineering boilerplate of training loops, [distributed training](/wiki/distributed_training), mixed precision, checkpointing, and logging, while keeping the research code (model definition, loss, optimization) explicit and modular.[2] The user writes a `LightningModule` describing what the model does, and a `Trainer` decides how it runs on the available hardware, so the same code can scale from one GPU to thousands without rewrites. The framework is published under the Apache License 2.0 by [Lightning AI](/wiki/lightning_ai), the company formerly known as Grid AI.[1] Its official tagline, "Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes," sums up that design goal.[1] Lightning AI reported more than 160 million cumulative downloads of the framework by late 2024,[10] and at the 2.0 launch in March 2023 said it was used by more than 10,000 organizations.[4]

The project was created by William Falcon in 2019 while he was a PhD student at NYU's CILVR Lab, advised by Kyunghyun Cho and Yann LeCun, and concurrently interning at Facebook AI Research.[7][15] It quickly gained traction in academic labs, became part of the official PyTorch ecosystem in 2020, and now anchors a broader product family that includes Lightning Fabric, Lightning Studios, and the metrics library TorchMetrics.[15] As of early 2026 the GitHub repository has roughly 31,000 stars, the package sees several million downloads a month on PyPI, and a new minor version ships roughly every two to three months.[1] PyTorch Lightning is widely used as a building block in [MLOps](/wiki/mlops) pipelines for reproducible, scalable model training.

## What problem does PyTorch Lightning solve?

Raw PyTorch gives the user a programming interface for tensors, autograd, and `nn.Module`, but it does not prescribe a training loop. Every project tends to reinvent the same scaffolding: a `for epoch in ...` outer loop, a manual `optimizer.zero_grad()` / `loss.backward()` / `optimizer.step()` sequence, calls to `.to(device)`, manual gradient accumulation, mixed precision with `torch.cuda.amp.autocast`, distributed wrappers such as `DistributedDataParallel`, checkpoint saving, learning-rate scheduling, validation hooks, and integration with logging tools like TensorBoard or Weights & Biases. In a small research script this is annoying. In a production training run on 64 GPUs across 8 nodes with bf16 precision and a sharded optimizer, it is a serious source of bugs.

PyTorch Lightning separates that scaffolding from the model. The user defines a `LightningModule` that contains the model layers, the forward pass, and a few small methods that describe a single training step, validation step, test step, and optimizer configuration.[2] Everything else, including device placement, distributed launch, mixed precision casting, gradient accumulation, checkpoint serialization, and metric logging, is the responsibility of the `Trainer`.[2] Lightning AI's marketing claims this cuts typical training-script length by around 70 percent; the real reduction depends on how organized the original code was, but it is usually substantial.
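For a sense of what the `Trainer` absorbs, a minimal hand-written PyTorch loop with device placement, optional bf16 autocast, gradient accumulation, and gradient clipping might look like the sketch below. It runs on synthetic data; the `train_raw` helper, the model, and every hyperparameter are hypothetical, and distributed training, checkpointing, validation, and logging are deliberately left out, each of which would add more code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_raw(epochs: int = 2, accumulate: int = 2) -> list:
    """Hypothetical raw-PyTorch loop; returns the mean loss of each epoch."""
    torch.manual_seed(0)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Use bf16 autocast only where the GPU supports it; plain FP32 otherwise.
    use_amp = device.type == "cuda" and torch.cuda.is_bf16_supported()

    # Synthetic regression data standing in for a real dataset.
    data = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
    loader = DataLoader(data, batch_size=16, shuffle=True)

    model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    epoch_losses = []
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        total = 0.0
        for i, (x, y) in enumerate(loader):
            x, y = x.to(device), y.to(device)  # manual device placement
            with torch.autocast(device_type=device.type, dtype=torch.bfloat16, enabled=use_amp):
                loss = nn.functional.mse_loss(model(x), y)
            # Scale so the accumulated gradient averages over `accumulate` batches
            # (this sketch assumes the batch count is a multiple of `accumulate`).
            (loss / accumulate).backward()
            if (i + 1) % accumulate == 0:
                nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
                optimizer.step()
                optimizer.zero_grad()
            total += loss.item()
        epoch_losses.append(total / len(loader))
    return epoch_losses
```

Every line here that is not the model or the loss function is the kind of code the `Trainer` takes over.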

## When was PyTorch Lightning released?

Falcon began prototyping the abstractions that would become Lightning around 2018. The library was launched in March 2019 and made public in July of that year; it spread among PhD students looking for a way to share reproducible training code and was then adopted as the recommended submission format for the NeurIPS 2019 Reproducibility Challenge.[15] By 2020 Facebook AI had partnered with the Lightning team, and the project was admitted to the official PyTorch ecosystem. The 1.0 release in October 2020 marked the first stable API and coincided with Grid AI, the commercial company behind the framework, emerging from stealth.[6]

| Year | Event |
|------|-------|
| 2018 | Falcon begins prototyping Lightning at NYU CILVR Lab and Facebook AI Research. |
| 2019 | Initial public commits to GitHub; first PyPI release in mid-2019. |
| 2019 | Adopted by the NeurIPS 2019 Reproducibility Challenge as a recommended submission format. |
| 2020 | Joins the official PyTorch ecosystem; Facebook AI partners with the Lightning team. |
| 2020 | Version 1.0 released on October 13, 2020; Grid AI, co-founded by William Falcon and Luis Capelo, emerges from stealth with an $18.6M Series A led by Index Ventures. |
| 2022 | Grid AI rebrands to Lightning AI; closes a $40M Series B led by Coatue with participation from Index Ventures. |
| 2023 | PyTorch Lightning 2.0 released on March 15, 2023, alongside Lightning Fabric, the lightweight opt-in alternative. |
| 2023 | Lightning AI Studios launched in December 2023 as the company's enterprise cloud platform. |
| 2024 | Lightning Studio reaches AWS Marketplace; company reports 240,000 users across 2,000 organizations and raises a further $50M from Cisco Investments, J.P. Morgan, K5 Global, and NVIDIA, bringing total funding to $103M. |
| 2024 | Lightning AI joins the PyTorch Foundation as a Premier Member. |
| 2025 | Continued 2.x releases adding FP8 support, improved FSDP integration, and tighter integration with Lightning Studios. |
| 2026 | Version 2.6.1 released January 30, 2026; repository at roughly 31,000 GitHub stars. |

The rebrand from Grid AI to Lightning AI in June 2022 reflected a strategic shift.[5] Grid had focused on a managed cloud product for distributed training; Lightning AI broadened the scope to a general AI development platform,[5] which today groups the open-source framework, Fabric, Studios, and the since-deemphasized Lightning Apps under one umbrella. The 2.0 release in March 2023 was the most significant inflection point on the open-source side.[3] It cleaned up the Trainer's internal architecture, removed a long backlog of 1.x deprecations, declared the API stable, and introduced Lightning Fabric as a sibling library rather than a hidden internal layer.[3][4] At the time of the 2.0 launch, Lightning AI reported that the framework was "used by more than 10,000 organizations to quickly and cost-efficiently train and scale machine learning models."[4]

## Core concepts

PyTorch Lightning is built around a small set of abstractions that the user composes to describe a training run. The two most important are the `LightningModule` and the `Trainer`.[2]

### LightningModule

A `LightningModule` is a subclass of `torch.nn.Module` with extra hooks. It defines:

- The model architecture, usually inside `__init__` and `forward`.
- A `training_step(batch, batch_idx)` method returning the loss for one batch.
- Optional `validation_step`, `test_step`, and `predict_step` methods.
- A `configure_optimizers` method that returns optimizers and learning-rate schedulers.

Logging is done through `self.log("name", value)` inside any step.[2] The log call is dispatched to whichever logger backend is configured on the Trainer, so the same code works whether you are sending metrics to TensorBoard, Weights & Biases, MLflow, Comet, or several at once. The module knows nothing about devices, ranks, or precision; those are decided by the Trainer.

### Trainer

The `Trainer` is the orchestration engine. It takes a `LightningModule` and runs training, validation, testing, or prediction loops on the requested hardware.[2] A typical instantiation looks like `Trainer(accelerator="gpu", devices=8, strategy="ddp", precision="bf16-mixed", max_epochs=50)`. The Trainer is responsible for backpropagation, optimizer stepping, gradient accumulation, gradient clipping, mixed precision casting, checkpointing, callback invocation, logging, and the launch of distributed processes.[2]

### Strategies

A strategy controls how the model is distributed across workers. Lightning ships strategies for single-device training, single-node and multi-node DDP (Distributed Data Parallel), DDP Spawn, [DeepSpeed](/wiki/deepspeed) stages 1, 2, and 3, [FSDP](/wiki/fsdp) (Fully Sharded Data Parallel, Meta's sharding approach, integrated in collaboration with Meta), and TPU strategies for Google Cloud TPU pods; cluster environments such as SLURM are detected automatically at launch.[13] The strategy interface handles process launch, NCCL or Gloo communication setup, parameter sharding, and gradient reduction. Switching from single-GPU to 64-GPU multi-node training is typically a matter of changing two arguments, `devices` and `num_nodes` (for example eight GPUs on each of eight nodes), and starting the script with the cluster's launcher.

### Callbacks

Callbacks are objects that hook into well-defined points of the training loop, such as `on_train_start`, `on_train_batch_end`, `on_validation_epoch_end`, and `on_save_checkpoint`.[2] Built-in callbacks include `ModelCheckpoint`, `EarlyStopping`, `LearningRateMonitor`, `GradientAccumulationScheduler`, `StochasticWeightAveraging`, `BatchSizeFinder`, `LearningRateFinder`, and `RichProgressBar`.[2] Users write their own callbacks for custom behavior such as exporting ONNX after training, sending Slack notifications on failure, or saving sample predictions to disk. Because callbacks are first-class, advanced training tricks tend to ship as small reusable callback packages.

### Loggers

Lightning ships built-in loggers for TensorBoard, [Weights & Biases](/wiki/wandb), [MLflow](/wiki/mlflow), Comet, Neptune, and plain CSV files.[2] Multiple loggers can be passed to a single Trainer, and `self.log(...)` is broadcast to all of them. The logger interface is small, so integrations for other systems such as ClearML or Aim are maintained by third parties.

### Precision and gradient handling

PyTorch Lightning supports FP32, FP16 mixed, BF16 mixed, FP8 (on supported hardware such as NVIDIA H100), and 8-bit and 4-bit quantized inference.[2] Gradient clipping by norm or value is enabled with one Trainer flag, gradient accumulation is a single argument, and gradient checkpointing is exposed through PyTorch's standard mechanism with helpers for FSDP-style activation checkpointing. Reproducibility helpers include `pl.seed_everything(42)`, which seeds the Python, NumPy, and PyTorch random number generators, and the `Trainer(deterministic=True)` flag, which turns on PyTorch's deterministic algorithms.

### LightningDataModule

A `LightningDataModule` packages dataset download, preparation, and DataLoader construction into a single class that can be shared across projects.[2] It defines `prepare_data`, `setup`, `train_dataloader`, `val_dataloader`, and `test_dataloader`. The pattern is optional; a Trainer can also accept raw DataLoaders.

## How does PyTorch Lightning differ from raw PyTorch and other frameworks?

The deep-learning training-framework space is crowded. Each option below makes different trade-offs along the spectrum from "rewrite the training loop yourself" to "call `.fit()` and trust the defaults."

| Framework | Organization | Base | First release | Focus | Philosophy |
|-----------|--------------|------|---------------|-------|------------|
| PyTorch Lightning | Lightning AI | [PyTorch](/wiki/pytorch) | 2019 | General research and production training | Decouple research code from engineering; explicit hooks |
| Plain [PyTorch](/wiki/pytorch) | Meta / PyTorch Foundation | C++/CUDA | 2016 | Research and production tensor library | Maximum flexibility, minimum opinion |
| Hugging Face Trainer | Hugging Face | [PyTorch](/wiki/pytorch) | 2020 | Transformer fine-tuning, LLMs | Tight coupling with the `transformers` library and the Hub |
| FastAI | fast.ai | [PyTorch](/wiki/pytorch) | 2018 | Education, rapid prototyping | High-level, opinionated, layered API |
| [Keras](/wiki/keras) | Google / Keras team | TensorFlow / [JAX](/wiki/jax) / [PyTorch](/wiki/pytorch) | 2015 | General DL, multi-backend (Keras 3) | Concise model APIs, `model.fit()` |
| [tf.keras](/wiki/tf_keras) | Google | TensorFlow | 2017 | TensorFlow-native Keras | Tight TensorFlow integration |
| PyTorch Ignite | PyTorch Foundation | [PyTorch](/wiki/pytorch) | 2018 | Research training loops | Event-driven Engine; very small core |
| Composer | MosaicML / Databricks | [PyTorch](/wiki/pytorch) | 2021 | Performance-tuned training algorithms | Algorithmic recipes (Selective Backprop, BlurPool, etc.) |
| [JAX](/wiki/jax) + Flax + Optax | Google | XLA | 2020 | Functional research, TPU pods | Pure functions, jit/pmap, manual training loops |
| [DeepSpeed](/wiki/deepspeed) | Microsoft | [PyTorch](/wiki/pytorch) | 2020 | Large-model training optimization | ZeRO sharding, MoE, often used inside other frameworks |
| [Ray](/wiki/ray) Train | Anyscale | Multiple | 2021 | Distributed training orchestration | Cluster scheduling and Ray-native scaling |

Against raw PyTorch, Lightning is a productivity and reliability win for almost any training run that exceeds a single GPU, at the cost of a learning curve and some abstraction overhead. Against FastAI, Lightning is less opinionated about data pipelines and architecture choices but exposes more of the underlying PyTorch primitives. Against the Hugging Face Trainer, Lightning is broader; the HF Trainer is excellent for fine-tuning models from the `transformers` library but less natural for non-Transformer architectures like graph networks, diffusion models, or reinforcement-learning agents. Against PyTorch Ignite, Lightning is heavier and more prescriptive; Ignite gives you an `Engine` and asks you to compose handlers yourself. Against Composer, the question is whether you want algorithmic optimizations baked in. Against [JAX](/wiki/jax) plus Flax, the question is whether you want PyTorch's eager-by-default ergonomics or JAX's functional purity and TPU performance.

## Key features

The feature surface is broad; teams often keep discovering capabilities months after adopting the framework. Some of the most load-bearing:

- Multi-GPU and multi-node training without writing distributed code: change two arguments, run `srun` or `torchrun`, done.
- Mixed precision with one flag: `precision="16-mixed"` or `"bf16-mixed"`, plus FP8 on H100-class hardware through the NVIDIA Transformer Engine integration (`precision="transformer-engine"`).
- Automatic checkpointing with resume: `Trainer(...).fit(model, ckpt_path="path/to/ckpt")` restores model weights, optimizer state, scheduler state, RNG state, and epoch counter.
- Built-in logging integrations for TensorBoard, [Weights & Biases](/wiki/wandb), [MLflow](/wiki/mlflow), Comet, Neptune, and CSV files.
- Callbacks ecosystem: ModelCheckpoint, EarlyStopping, LearningRateMonitor, GradientAccumulationScheduler, StochasticWeightAveraging, RichProgressBar, plus user-defined callbacks.
- Strategies: DDP, DDP-Spawn, [FSDP](/wiki/fsdp), [DeepSpeed](/wiki/deepspeed) stages 1/2/3 and ZeRO-Infinity, TPU spawn, single-device.
- Gradient clipping (norm and value), gradient accumulation, gradient checkpointing.
- Learning-rate scheduler integration with monitor-based stepping.
- SLURM cluster awareness: detection of `SLURM_PROCID`, automatic re-submission on preemption, signal-based checkpointing.
- Reproducibility helpers: `pl.seed_everything`, deterministic flag, dataloader worker seeding.
- Profilers: simple, advanced, PyTorch profiler, XLA profiler.
- Hyperparameter saving via `self.save_hyperparameters()`, which serializes the constructor arguments into checkpoints.
- Export helpers: `to_torchscript()`, `to_onnx()`.

## Lightning AI ecosystem

The open-source framework is one piece of a larger product family.

### Lightning Fabric

Lightning Fabric, introduced with the 2.0 release in March 2023, is a lightweight alternative to the full Trainer for users who want the distributed-training and precision plumbing but want to keep their own training loop.[3] The pitch is that you can take a raw PyTorch script, replace a handful of lines (model wrapping, optimizer wrapping, the `loss.backward()` call) with Fabric equivalents, and gain multi-GPU, multi-node, FSDP, DeepSpeed, and SLURM support without giving up control over the loop.[3][4] Lightning AI describes Fabric as a tool that "creates a continuum between raw PyTorch and the fully-managed PyTorch Lightning experience," letting users supercharge PyTorch code with accelerators, distributed strategies, and mixed precision while still retaining full control of their training loop.[3] This matters for reinforcement learning, GAN training, and other settings where the loop logic is itself the research contribution. Fabric is opt-in; users do not have to migrate from the Trainer to use it.

### Lightning Studios

Lightning Studios, announced in December 2023 and now sold through cloud marketplaces including AWS, is the company's hosted cloud workspace.[9] A Studio is a cloud VM with persistent storage that runs in a browser-based VS Code or terminal session.[9] Users can switch the underlying GPU type without losing state, share Studios as templates, and run multi-node training jobs from inside a single workspace.[9] The platform integrates the open-source frameworks but is not required to use them.

### Lightning Apps

Lightning Apps was a Python-based application framework for ML pipelines that the company invested in heavily during 2021 and 2022. The idea was to express ML workflows as graphs of components that could run locally or in the cloud. Adoption was modest, and after the 2.0 launch the company deemphasized Apps in favor of Studios and Fabric. The library still exists in maintenance mode but is no longer the centerpiece of the platform.

### TorchMetrics

TorchMetrics started as the metrics module inside PyTorch Lightning and was spun out into a standalone library in 2021.[12] It now contains more than 100 metric implementations across classification, regression, image, audio, and text domains, with correct behavior for distributed training (per-rank state, sync on epoch end).[12] TorchMetrics is usable from raw PyTorch, Fabric, or the full Trainer.[12]

### Lightning Bolts and Lightning Flash

Lightning Bolts was a library of pre-built model components (encoders, GANs, self-supervised baselines) and Lightning Flash was a higher-level task-oriented API. Both were popular in the early ecosystem but have been deprecated; the company now points users at Hugging Face, the main Lightning library, and Lightning Studio templates instead.

## Who uses PyTorch Lightning?

PyTorch Lightning has spread well beyond academic research. NVIDIA's NeMo framework for conversational AI and large language models was built on top of PyTorch Lightning and used the Trainer as its default training loop through NeMo 2.0.[11] The Stable Diffusion training reference implementations released by Stability AI and CompVis used Lightning. Stanford's HyenaDNA project, OpenFold for protein structure prediction, and some MLPerf training submissions are also built on Lightning.

In industry, the framework is used at companies including NVIDIA, Meta, Microsoft, AWS, Stripe, Hugging Face, Stability AI, Cohere, and many others, often in combination with their internal experiment tracking and infrastructure. Kaggle solutions that involve PyTorch frequently use Lightning, especially for competitions where training stability and checkpoint management matter. PyPI statistics showed roughly 4 million monthly downloads in 2023, and Lightning AI reported more than 160 million cumulative downloads of the framework as of late 2024, alongside 240,000 users across 2,000 organizations.[10]

## Strengths

The most common reasons teams choose Lightning:

- Much less boilerplate; Lightning AI claims a typical research training script shrinks by around 70 percent compared with its raw PyTorch equivalent.
- Industry-standard distributed training patterns that are battle-tested across thousands of users.
- Strong community with regular releases, active GitHub issues, and a large pool of example projects.
- Reproducibility through structured code and seed helpers.
- Modular extension points (callbacks, strategies, loggers) that make it easy to add custom behavior without forking the framework.
- Tight collaboration with Microsoft on [DeepSpeed](/wiki/deepspeed) integration and with Meta on [FSDP](/wiki/fsdp) integration, so cutting-edge large-model training capabilities arrive quickly.
- Backed by a well-funded company and a Premier-member seat on the PyTorch Foundation.[8]

## Weaknesses and criticisms

No abstraction is free, and Lightning has its share of detractors:

- The abstraction can hide subtle errors. A bug in `training_step` that would be obvious in a flat script can be obscured by the Trainer's machinery.
- The Trainer API is large; the `Trainer` class accepts dozens of arguments and the LightningModule has many optional hooks. Reading the documentation for the first time is overwhelming.
- Performance overhead in some configurations. The Trainer adds a small amount of overhead per step, which can matter at very small batch sizes or on tiny models, though for production-scale training the overhead is negligible.
- The 2.0 release in March 2023 introduced breaking changes that required migration work. The team published an automated migration script and a 1.9.x maintenance branch, but some teams stayed on 1.x for longer than they wanted.
- Lightning Apps was promoted heavily and then deprioritized, leaving some early adopters with unsupported pipelines.
- Some power users prefer raw PyTorch, Fabric, or [DeepSpeed](/wiki/deepspeed)'s own runner because they want exact control over every line of the training loop.
- The Hugging Face Trainer has captured most of the LLM fine-tuning audience, and many of the Lightning patterns that researchers used to write are now wrapped inside higher-level libraries like the `transformers` `Trainer`, TRL, and Axolotl.

In the NeMo example noted earlier, NVIDIA itself moved its newer Megatron-Bridge, AutoModel, and RL projects off the Lightning Trainer to a custom PyTorch loop in 2024 and 2025, citing flexibility and ease of use; the older NeMo 2.0 still uses Lightning.[11] This is a useful illustration of when Lightning is and is not the right tool: for typical supervised training and for fine-tuning at moderate scale, Lightning is hard to beat. For frontier-scale custom training stacks, several teams have ended up writing their own loop on top of Fabric or even raw PyTorch.

## Software statistics

As of early 2026, the GitHub repository at `Lightning-AI/pytorch-lightning` has roughly 31,000 stars and 3,700 forks.[1] The project is licensed under Apache 2.0 and accepts contributions through a Contributor License Agreement.[1] There are dozens of maintainers and hundreds of contributors. Releases follow a roughly two-month cadence; the most recent release at the time of writing is 2.6.1, dated January 30, 2026.[1] The package is published on PyPI as both `pytorch-lightning` (the original name, still maintained) and `lightning` (the unified package introduced in 2022, which bundles PyTorch Lightning, Fabric, and supporting libraries).[14]

## Companion and related projects

The Lightning AI organization maintains a number of companion repositories beyond the core training framework:

| Project | Purpose | Status |
|---------|---------|--------|
| `pytorch-lightning` / `lightning` | Core framework | Actively developed |
| `lightning-fabric` | Lightweight Trainer alternative | Actively developed (since 2.0, March 2023) |
| `torchmetrics` | Distributed-aware metrics library | Actively developed |
| `lightning-bolts` | Reusable research components | Deprecated |
| `lightning-flash` | Task-oriented high-level API | Deprecated |
| `lightning-apps` | Python ML pipeline framework | Deemphasized (2023+) |
| `litgpt` | Hackable LLM training reference | Actively developed |
| `litserve` | Lightning-native model serving | Actively developed |
| `lit-llama` | Reference LLaMA training implementation | Maintenance |

Lightning AI also maintains the commercial Lightning Studios cloud product, which integrates with all of these libraries but is not required to use them.

## Is PyTorch Lightning open source?

The open-source framework is licensed under the Apache License 2.0, which permits commercial use, modification, and redistribution with attribution and patent grant.[1] Lightning AI is the corporate sponsor and primary maintainer; the company is headquartered in New York City and has raised roughly $103 million across its Seed, Series A, and Series B rounds plus the late-2024 strategic round from Cisco Investments, J.P. Morgan, K5 Global, and NVIDIA.[10] Lightning AI joined the PyTorch Foundation as a Premier Member, alongside companies like AMD, AWS, Google, Hugging Face, Intel, Meta, Microsoft, and NVIDIA, which gives the project a formal governance role in the broader PyTorch ecosystem.[8]

## References

1. PyTorch Lightning GitHub repository, Lightning-AI/pytorch-lightning. https://github.com/Lightning-AI/pytorch-lightning
2. PyTorch Lightning documentation. https://lightning.ai/docs/pytorch/stable/
3. "Introducing PyTorch Lightning 2.0 and Fabric," Lightning AI blog, March 15, 2023. https://lightning.ai/pages/blog/introducing-lightning-2-0/
4. "Lightning AI Releases PyTorch Lightning 2.0 and a New Open Source Library for Lightweight Scaling of Machine Learning Models," GlobeNewswire, March 15, 2023. https://www.globenewswire.com/news-release/2023/03/15/2627759/0/en/Lightning-AI-Releases-PyTorch-Lightning-2-0-and-a-New-Open-Source-Library-for-Lightweight-Scaling-of-Machine-Learning-Models.html
5. "Grid.ai rebrands as Lightning AI, raises $40M to expand its AI dev tools," TechCrunch, June 16, 2022. https://techcrunch.com/2022/06/16/grid-ai-rebrands-as-lightning-ai-raises-40m-to-build-ai-dev-tools/
6. PyTorch Lightning 1.0.0 GitHub release notes, October 13, 2020. https://github.com/Lightning-AI/pytorch-lightning/releases/tag/1.0.0
7. William Falcon personal website. https://www.williamfalcon.com/
8. "Lightning AI Joins the PyTorch Foundation as a Premier Member," PyTorch blog. https://pytorch.org/blog/lightning-ai-joins-pytorch/
9. "Lightning AI Introduces Lightning AI Studios; its Enterprise-Grade Platform for Rapid-prototyping, and Deploying AI Products," BusinessWire, December 13, 2023. https://www.businesswire.com/news/home/20231213860557/en/
10. "Lightning AI Raises $50M to Simplify and Scale AI Development For Enterprises and Developers," PR Newswire, November 2024. https://www.prnewswire.com/news-releases/lightning-ai-raises-50m-to-simplify-and-scale-ai-development-for-enterprises-and-developers-302313164.html
11. NVIDIA NeMo documentation, NeMo Models. https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/core/core.html
12. TorchMetrics GitHub repository, Lightning-AI/torchmetrics. https://github.com/Lightning-AI/torchmetrics
13. "Train models with billions of parameters using FSDP," PyTorch Lightning documentation. https://lightning.ai/docs/pytorch/stable/advanced/model_parallel/fsdp.html
14. PyPI listing for pytorch-lightning. https://pypi.org/project/pytorch-lightning/
15. PyTorch Lightning Wikipedia article. https://en.wikipedia.org/wiki/PyTorch_Lightning

