HuggingFace PEFT
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,843 words
PEFT (Parameter-Efficient Fine-Tuning) is an open-source Python library developed by Hugging Face that provides unified, modular implementations of parameter-efficient adaptation methods for large pretrained models. The library lets practitioners fine-tune a small set of additional or selected parameters while keeping most of a base model's weights frozen, dramatically lowering the compute and storage costs of customising large language models, vision models, and diffusion models.[^1][^2] PEFT is integrated tightly with the broader Hugging Face stack, including Transformers, Accelerate, Diffusers, and TRL, which has made it the de facto adapter library in the open-source ecosystem since its first public release in February 2023.[^1][^3][^4] The project is hosted at github.com/huggingface/peft, licensed under Apache 2.0, and supports a growing catalogue of methods that includes LoRA, QLoRA, IA3, prefix tuning, prompt tuning, P-tuning, AdaLoRA, X-LoRA, LoftQ, LoHa, LoKr, DoRA, BOFT, VeRA, and VB-LoRA, among many others.[^2][^5][^6]
Fine-tuning a pretrained transformer such as BERT, GPT-2, or T5 historically required updating every parameter of the model and storing a full copy of those parameters per task. As models scaled into the tens and hundreds of billions of parameters, this approach became prohibitive both in GPU memory during training and in storage cost for serving many task-specific copies. The original LoRA paper made this point starkly: deploying multiple independent fine-tuned copies of a 175B parameter GPT-3 model is operationally impractical, because each task copy is itself 175B parameters.[^7] A research literature on parameter-efficient methods emerged through 2019 to 2022, ranging from adapter layers and prefix tuning to LoRA and IA3, but most reference implementations lived in separate repositories with incompatible APIs.
Hugging Face's PEFT library was created to consolidate these methods behind a single API that plugs cleanly into the Transformers ecosystem. The first commit to the huggingface/peft repository on GitHub was made on 25 November 2022, and the project was first released to PyPI as version 0.0.1 on 19 January 2023.[^8] The launch blog post, titled "Parameter-Efficient Fine-Tuning using PEFT," was published on 10 February 2023 by Sourab Mangrulkar and Sayak Paul, who introduced LoRA, prefix tuning, P-tuning, and prompt tuning as the initial set of supported methods.[^1] The same date corresponds to the v0.1.0 tag on GitHub, marked as the "Initial release."[^8]
The motivation given in the launch announcement was twofold: PEFT methods produce checkpoints in the megabyte range rather than the gigabyte range of full fine-tunes, while attaining performance close to full fine-tuning; and they are robust to catastrophic forgetting because the base model's weights remain frozen.[^1] The blog post illustrated this with the bigscience/mt0-xxl model, a 40 GB checkpoint that produces 40 GB of per-task storage under full fine-tuning but only a few megabytes per task under LoRA.[^1]
The PEFT library has followed a steady release cadence since its initial launch, generally adding new adapter families, quantization integrations, and distributed training capabilities at each minor version. The table below summarises major releases as of May 2026.
| Version | Date | Key additions |
|---|---|---|
| v0.1.0 | 2023-02-10 | Initial public release with LoRA, prefix tuning, prompt tuning, P-tuning[^8] |
| v0.2.0 | 2023-03-10 | API stabilisation, broader task support[^8] |
| v0.3.0 | 2023-05-03 | Multi-adapter support, expanded testing suite, new examples[^8] |
| v0.4.0 | 2023-07-18 | QLoRA integration, IA3 method, AutoPeftModel classes, LoRA for custom (non-transformer) models[^9] |
| v0.5.0 | 2023-08-22 | GPTQ quantization support, low-level API[^8] |
| v0.6.0 | 2023-11-03 | Diffusers backend integration, LoHa, LoKr, multitask prompt tuning, 4-bit/8-bit LoRA merging[^10] |
| v0.7.0 | 2023-12-06 | Orthogonal Fine-Tuning (OFT), Megatron support, safetensors and better initialisation[^8] |
| v0.8.0 | 2024-01-30 | Poly PEFT method, LoRA improvements[^8] |
| v0.9.0 | 2024-02-28 | DoRA support, TIES/DARE/magnitude merging via add_weighted_adapter, AutoAWQ and AQLM quantization[^11] |
| v0.10.0 | 2024-03-21 | QLoRA with DeepSpeed ZeRO-3 and FSDP (70B LLaMA on 2x24GB), layer replication, mixed adapter batches, LoftQ initialisation helper[^12] |
| v0.11.0 | 2024-05-16 | BOFT, VeRA, PiSSA, HQQ and EETQ quantization backends[^8] |
| v0.12.0 | 2024-07-24 | OLoRA, X-LoRA, FourierFT, HRA, and additional methods[^8] |
| v0.13.0 | 2024-09-25 | LoRA+, VB-LoRA, and quality-of-life improvements[^8] |
| v0.14.0 | 2024-12-06 | EVA, Context-aware Prompt Tuning, Bone method[^8] |
| v0.15.0 | 2025-03-19 | CorDA initialisation, Trainable Tokens for selective embedding training[^8] |
| v0.16.0 | 2025-07-03 | LoRA-FA optimizer, RandLoRA, C3A, expanded Conv2d support[^8] |
| v0.17.0 | 2025-08-01 | SHiRA, MiSS, LoRA for nn.Parameter (enabling MoE LoRA), Bone deprecated in favour of MiSS[^8] |
| v0.18.0 | 2025-11-13 | RoAd, ALoRA, Arrow, WaveFT, DeLoRA, OSF, text generation benchmark framework[^8] |
| v0.19.0 | 2026-04-14 | Multiple additional methods, non-LoRA-to-LoRA checkpoint conversion, Tensor Parallelism support[^8] |
| v0.19.1 | 2026-04-16 | Patch release[^8] |
Version 0.4.0 in July 2023 is widely regarded as a turning point because it brought QLoRA into the standard PEFT API, making 4-bit quantized fine-tuning of large LLaMA-class models accessible without manual integration work.[^9] Version 0.6.0 in November 2023 marked another inflection: the Diffusers library replaced its bespoke LoRA loader with PEFT as its adapter backend, unifying text-to-image LoRA inference and training under PEFT and enabling features like instant adapter switching, scaling, and merge or unmerge for Stable Diffusion checkpoints.[^10]
PEFT exposes a wide collection of parameter-efficient methods, organised broadly into low-rank update methods, soft prompting methods, orthogonal methods, and a smaller set of structured or sparse approaches. Each method is implemented as a subclass of PeftConfig paired with adapter modules that wrap the relevant layers of a base model.
LoRA (Low-Rank Adaptation), introduced by Edward Hu and colleagues at Microsoft in 2021, freezes the pretrained weight matrix W and parameterises its update as a product of two trainable low-rank matrices B and A, giving an effective weight W + BA whose update BA has rank at most r, where r is much smaller than the dimensions of W. The original LoRA paper claims this can reduce trainable parameters by up to 10,000 times relative to full fine-tuning of GPT-3 175B while preserving downstream quality.[^7] LoRA is the most heavily used method in PEFT and serves as the basis for many derivative techniques.
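The low-rank update can be illustrated with a minimal NumPy sketch. The layer sizes, the rank, and the alpha / r scaling factor used here are illustrative assumptions, not values taken from PEFT; the sketch only shows that the adapter path and the merged dense weight compute the same output, and how the parameter counts compare.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 48, 4      # output dim, input dim, LoRA rank (illustrative sizes)
alpha = 8                # assumed scaling hyperparameter; the update is scaled by alpha / r

W = rng.normal(size=(d, k))   # frozen pretrained weight
A = rng.normal(size=(r, k))   # trainable low-rank factor
B = rng.normal(size=(d, r))   # trainable low-rank factor (zero at the start of training)


def lora_forward(x, W, A, B, alpha, r):
    """Frozen path plus low-rank path, without ever forming the d x k update."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


x = rng.normal(size=(3, k))

delta_W = (alpha / r) * (B @ A)   # the update itself, rank at most r
W_merged = W + delta_W            # folding the update in gives a plain dense layer

out_adapter = lora_forward(x, W, A, B, alpha, r)
out_merged = x @ W_merged.T

full_params = d * k          # parameters touched by full fine-tuning of this layer
lora_params = r * (d + k)    # parameters trained by LoRA for this layer
print(full_params, lora_params)
```

Because the two outputs agree, merging an adapter into the base weight removes the extra matrix multiplications at inference time without changing the result.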
QLoRA (Quantized LoRA), introduced by Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer in May 2023, combines 4-bit NormalFloat (NF4) quantization of the frozen base model with LoRA adapters trained in higher precision. The paper demonstrates fine-tuning a 65B parameter model on a single 48 GB GPU while preserving 16-bit fine-tuning task performance, and the resulting "Guanaco" family of models was widely cited as evidence that strong instruction-tuned LLaMA derivatives could be produced on consumer-grade hardware.[^13] QLoRA support landed in PEFT v0.4.0 alongside IA3 in July 2023.[^9]
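A rough back-of-the-envelope calculation suggests why 4-bit storage matters for the 65B case. The sketch below counts only the base weights; it deliberately ignores quantization constants, adapter weights, activations, and optimizer state, so the figures are lower bounds rather than measured numbers.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the raw weights only, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 65e9                              # 65B-parameter base model
fp16_gb = weight_memory_gb(n_params, 16)     # 16-bit weights
four_bit_gb = weight_memory_gb(n_params, 4)  # 4-bit weights

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {four_bit_gb:.1f} GB")
```

Under these assumptions the 16-bit weights alone exceed a 48 GB GPU, while the 4-bit weights leave headroom for the adapters and training state.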
DoRA (Weight-Decomposed Low-Rank Adaptation), proposed by NVIDIA researchers in early 2024, decomposes each pretrained weight matrix into a magnitude vector and a direction matrix, then applies LoRA only to the direction component. The paper, accepted as an ICML 2024 Oral, reports consistent improvements over LoRA on commonsense reasoning, visual instruction tuning, and image and video text understanding, especially at very low ranks.[^14] DoRA is enabled in PEFT via use_dora=True in LoraConfig and shipped in v0.9.0 in February 2024, with broader Conv2d and bitsandbytes-quantized support added in v0.10.0.[^11][^12]
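The magnitude and direction split can be sketched in NumPy as follows. This is an illustration of the decomposition under assumed toy sizes, using a column-wise norm; it is not PEFT's implementation, which operates on torch modules.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 32, 24, 4                  # illustrative layer sizes and rank
W0 = rng.normal(size=(d, k))         # frozen pretrained weight


def column_norms(M):
    """Norm of each column, kept as a (1, k) row for broadcasting."""
    return np.linalg.norm(M, axis=0, keepdims=True)


m = column_norms(W0)                 # trainable magnitude, initialised from W0
A = rng.normal(size=(r, k)) * 0.01   # trainable LoRA factor
B = np.zeros((d, r))                 # trainable LoRA factor, zero at initialisation


def dora_weight(W0, m, A, B):
    """Apply LoRA to the direction, renormalise per column, then rescale by m."""
    V = W0 + B @ A
    return m * V / column_norms(V)


W_init = dora_weight(W0, m, A, B)    # equals W0 because B is zero

B_trained = rng.normal(size=(d, r))  # pretend training has moved B
W_adapted = dora_weight(W0, m, A, B_trained)
```

In this sketch the low-rank factors can only change the direction of each column, while the column norms of the adapted weight are controlled entirely by the magnitude vector m.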
AdaLoRA, by Qingru Zhang and colleagues, was accepted at ICLR 2023. It parameterises each update in singular value decomposition form and learns to redistribute the rank budget across layers and weight matrices according to per-triplet importance scores. The training schedule consists of an initial phase with no budgeting, a budgeting phase that prunes low-importance triplets, and a final phase that retrains with the redistributed ranks.[^15][^16]
X-LoRA (Mixture of LoRA Experts) by Eric L. Buehler and Markus J. Buehler, published in February 2024, performs token-level and layer-level gating over a set of frozen LoRA experts. The model runs a first forward pass without adapters to compute scalings, then re-runs with the dynamically weighted LoRA experts applied. The original paper applies X-LoRA to protein mechanics and molecular design tasks.[^17] X-LoRA support was added in PEFT v0.12.0 in July 2024.[^8]
VeRA (Vector-based Random Matrix Adaptation), by Dawid J. Kopiczko, Tijmen Blankevoort, and Yuki M. Asano (ICLR 2024), shares a single pair of frozen random low-rank matrices across all layers and learns only two small scaling vectors per layer. It targets parameter counts considerably smaller than LoRA while preserving accuracy on GLUE, E2E, image classification, and instruction tuning benchmarks.[^18]
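A minimal NumPy sketch of the idea, assuming every adapted layer has the same shape so that one pair of random matrices can be shared. The sizes and the initial values of the scaling vectors are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 32, 24, 8           # illustrative sizes
n_layers = 3

# One pair of random matrices, frozen and shared by all layers.
B_shared = rng.normal(size=(d_out, r))
A_shared = rng.normal(size=(r, d_in))


def vera_delta(b_vec, d_vec):
    """Weight update diag(b) @ B @ diag(d) @ A, built without explicit diag matrices."""
    return (b_vec[:, None] * B_shared) @ (d_vec[:, None] * A_shared)


# Per-layer trainable state: two small vectors (assumed init: b = 0, d = 0.1).
layers = [{"b": np.zeros(d_out), "d": np.full(r, 0.1)} for _ in range(n_layers)]

delta_at_init = vera_delta(layers[0]["b"], layers[0]["d"])   # all zeros

vera_params_per_layer = d_out + r
lora_params_per_layer = r * (d_out + d_in)
print(vera_params_per_layer, lora_params_per_layer)
```

With these toy sizes each layer trains 40 values instead of the 448 that a rank-8 LoRA would train on the same layer.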
VB-LoRA by Yang Li and colleagues introduces a global vector bank from which all adapter low-rank matrices are composed via a differentiable top-k admixture module, enabling parameter sharing across modules and layers and achieving extreme parameter efficiency.[^19]
LoHa (Low-Rank Hadamard) and LoKr (Low-Rank Kronecker) represent ∆W using a Hadamard product of low-rank matrix pairs (LoHa) or a Kronecker product structure (LoKr). LoHa is based on the FedPara construction by Nam Hyeon-Woo and colleagues (arXiv:2108.06098) originally developed for communication-efficient federated learning.[^20] Both methods are popular in the diffusion adapter community via the LyCORIS project and were added in PEFT v0.6.0.[^5][^10]
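The two constructions can be compared with a small NumPy sketch. The shapes and factor sizes are assumptions for illustration; the point is only that an element-wise product of two rank-r matrices can exceed rank r, and that a Kronecker product builds a large update from small factors.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 16, 16, 2                      # illustrative sizes

# LoHa-style update: Hadamard (element-wise) product of two low-rank products.
B1, A1 = rng.normal(size=(d, r)), rng.normal(size=(r, k))
B2, A2 = rng.normal(size=(d, r)), rng.normal(size=(r, k))
delta_loha = (B1 @ A1) * (B2 @ A2)
rank_lora_like = np.linalg.matrix_rank(B1 @ A1, tol=1e-8)   # at most r
rank_loha = np.linalg.matrix_rank(delta_loha, tol=1e-8)     # at most r * r

# LoKr-style update: Kronecker product of a small matrix and a low-rank product.
C = rng.normal(size=(4, 4))
Bk, Ak = rng.normal(size=(4, r)), rng.normal(size=(r, 4))
delta_lokr = np.kron(C, Bk @ Ak)         # shape (16, 16)

loha_params = 2 * r * (d + k)
lokr_params = C.size + Bk.size + Ak.size
print(rank_lora_like, rank_loha, delta_lokr.shape, loha_params, lokr_params)
```

Both variants spend a parameter budget comparable to or smaller than LoRA's while allowing the resulting update to have a higher rank than a single rank-r product.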
PiSSA (Principal Singular Values and Singular Vectors Adaptation) by Fanxu Meng and colleagues uses SVD of the pretrained weight matrix to initialise the LoRA A and B matrices with the principal singular components and freezes the residual. The paper reports faster convergence and better final performance than vanilla LoRA, and the method was accepted as a NeurIPS 2024 Spotlight.[^21]
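The initialisation can be sketched in NumPy. Splitting the singular values evenly between the two factors via a square root is an assumption of this sketch, as are the toy sizes; what matters is that the adapter starts from the principal components and the frozen residual holds the rest.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 32, 24, 4                      # illustrative sizes
W = rng.normal(size=(d, k))              # pretrained weight

U, S, Vt = np.linalg.svd(W, full_matrices=False)
sqrt_s = np.sqrt(S[:r])

B = U[:, :r] * sqrt_s                    # (d, r), trainable, principal components
A = sqrt_s[:, None] * Vt[:r]             # (r, k), trainable, principal components
W_res = W - B @ A                        # frozen residual

# The largest singular value left in the residual is the (r+1)-th of W.
top_residual_sv = np.linalg.svd(W_res, compute_uv=False)[0]
print(S[r], top_residual_sv)
```

At initialisation the residual plus the adapter reproduces the pretrained weight exactly, so training starts from the same function as the base model.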
OLoRA by Kerim Büyükakyüz uses QR decomposition of the base weights to initialise the LoRA adapters with orthonormal matrices and to mutate the base weights accordingly, accelerating convergence relative to default LoRA initialisation.[^22]
FourierFT treats the weight change matrix as a 2D spatial signal and learns a small set of discrete Fourier transform coefficients, recovering ∆W via the inverse DFT. The paper reports surpassing LoRA on instruction tuning of LLaMA2-7B using only 0.064 M trainable parameters versus 33.5 M for LoRA.[^23]
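A minimal NumPy sketch of recovering a dense update from a handful of spectral coefficients. The number of coefficients, the random choice of positions, and the scaling factor alpha are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, n = 32, 24, 20       # weight shape and number of trainable coefficients
alpha = 1.0                  # assumed scaling factor

# Fixed (non-trainable) positions of the n spectral entries.
flat_idx = rng.choice(d1 * d2, size=n, replace=False)
rows, cols = np.unravel_index(flat_idx, (d1, d2))

coeffs = rng.normal(size=n)  # the only trainable values in this sketch


def fourier_delta(coeffs):
    """Scatter coefficients into a sparse spectrum and invert the 2D DFT."""
    F = np.zeros((d1, d2))
    F[rows, cols] = coeffs
    return alpha * np.fft.ifft2(F).real


delta_W = fourier_delta(coeffs)
print(delta_W.shape, n, d1 * d2)
```

Here 20 trained numbers define an update over all 768 entries of the weight matrix, which is the source of the method's parameter efficiency.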
Prefix Tuning, by Xiang Lisa Li and Percy Liang (2021), prepends a sequence of continuous task-specific vectors to every layer of a frozen language model. Subsequent tokens attend to these "virtual" prefix vectors as if they were real tokens. The paper reports that learning roughly 0.1% of parameters yields performance comparable to full fine-tuning for GPT-2 table-to-text and BART summarisation tasks, and better than fine-tuning in low-data regimes.[^24]
Prompt Tuning, by Brian Lester, Rami Al-Rfou, and Noah Constant (EMNLP 2021), is a simpler variant that learns a single set of soft prompt embeddings at the input layer only. The paper's key empirical finding is that prompt tuning becomes increasingly competitive with full model tuning as model scale grows; at multi-billion parameter T5 scales, it matches model tuning while training only a tiny fraction of the parameters.[^25]
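The input-layer mechanism can be sketched with NumPy. The vocabulary size, embedding width, and number of virtual tokens are assumptions chosen for illustration; the sketch prepends the same trainable vectors to every sequence in a batch before the frozen model would see them.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, n_virtual = 100, 16, 5                 # illustrative sizes

embedding = rng.normal(size=(vocab, d_model))          # frozen embedding table
soft_prompt = rng.normal(size=(n_virtual, d_model))    # the only trainable parameters


def embed_with_soft_prompt(token_ids):
    """Look up token embeddings and prepend the soft prompt to each sequence."""
    tok = embedding[token_ids]                         # (batch, seq, d_model)
    batch = tok.shape[0]
    prompt = np.broadcast_to(soft_prompt, (batch, n_virtual, d_model))
    return np.concatenate([prompt, tok], axis=1)


token_ids = np.array([[1, 2, 3], [4, 5, 6]])
inputs = embed_with_soft_prompt(token_ids)             # (2, 8, 16)
trainable = soft_prompt.size
print(inputs.shape, trainable)
```

Prefix tuning differs in that it supplies learned vectors to every layer rather than only at the input, so this sketch covers the prompt tuning case only.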
P-Tuning, by Xiao Liu and colleagues at Tsinghua University ("GPT Understands, Too"), learns continuous prompt embeddings interleaved with discrete prompt tokens through a small bidirectional prompt encoder. It reduces sensitivity to prompt wording and improves performance on LAMA and SuperGLUE for both frozen and tuned base models.[^26]
Multitask Prompt Tuning and Llama-Adapter are additional soft-prompt-style methods supported by PEFT. Llama-Adapter, from Renrui Zhang and colleagues, prepends learnable adaption prompts to the upper layers of LLaMA and uses zero-initialised attention plus a learnable gating factor to introduce instruction-following behaviour without overwriting pretrained knowledge.[^5]
IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations), by Haokun Liu and colleagues at UNC ("Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning," 2022), introduces learned vectors that rescale activations in the keys, values, and feed-forward intermediate of each transformer block. Because only these vectors are trained, IA3 has far fewer trainable parameters than a typical LoRA configuration, and the paper argues that few-shot PEFT with its T-Few recipe outperforms in-context learning while costing far less compute.[^27] IA3 entered PEFT in v0.4.0 in July 2023.[^9]
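The rescaling can be sketched in NumPy for a single attention head and feed-forward block. The sizes, the ReLU nonlinearity, and the single-head layout are simplifying assumptions; the vectors are initialised to ones so the adapted block starts out identical to the frozen one.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, d_k, d_v, d_model, d_ff = 6, 16, 16, 16, 64    # illustrative sizes

# Trainable IA3 vectors, initialised to ones (identity rescaling).
l_k, l_v, l_ff = np.ones(d_k), np.ones(d_v), np.ones(d_ff)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention(Q, K, V, l_k, l_v):
    """Scaled dot-product attention with keys and values rescaled element-wise."""
    K = K * l_k
    V = V * l_v
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V


def feed_forward(x, W1, W2, l_ff):
    """Feed-forward block with the intermediate activation rescaled."""
    hidden = np.maximum(x @ W1, 0.0) * l_ff
    return hidden @ W2


Q, K = rng.normal(size=(seq, d_k)), rng.normal(size=(seq, d_k))
V = rng.normal(size=(seq, d_v))
W1, W2 = rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model))
x = rng.normal(size=(seq, d_model))

attn_out = attention(Q, K, V, l_k, l_v)
ff_out = feed_forward(x, W1, W2, l_ff)
ia3_params = l_k.size + l_v.size + l_ff.size
print(ia3_params)
```

In this toy block only 96 values are trainable, and none of the weight matrices is modified.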
OFT (Orthogonal Fine-Tuning) constrains the update to an orthogonal transformation of the base weights, preserving the hyperspherical energy (pairwise cosine similarity between neurons) of the pretrained model. The original OFT paper (arXiv:2306.07280) focuses on controlling text-to-image diffusion, where preserving subject identity is critical.[^28] OFT support was added in PEFT v0.7.0 in December 2023.[^8]
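One common way to keep a trainable matrix orthogonal is the Cayley transform of a skew-symmetric matrix, sketched below in NumPy. The sketch uses a single dense block and toy sizes as simplifying assumptions, and treats the columns of the weight matrix as neurons; it shows that pairwise inner products between neurons, and hence their cosine similarities, are unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 5                            # input dim and number of neurons (illustrative)
W0 = rng.normal(size=(d, n))           # frozen pretrained weight, one neuron per column

M = rng.normal(size=(d, d)) * 0.1      # unconstrained trainable parameters
S = M - M.T                            # skew-symmetric by construction
I = np.eye(d)
R = (I + S) @ np.linalg.inv(I - S)     # Cayley transform: R is orthogonal

W = R @ W0                             # adapted weight

gram_before = W0.T @ W0                # pairwise inner products between neurons
gram_after = W.T @ W
print(np.abs(gram_after - gram_before).max())
```

Because the Gram matrix is preserved, the angles between neurons stay fixed while their joint orientation is free to change during training.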
BOFT (Orthogonal Butterfly) by Weiyang Liu and colleagues (arXiv:2311.06243) generalises OFT by factoring the orthogonal transformation into a sequence of sparse butterfly matrices, inspired by the Cooley-Tukey FFT, yielding O(d log d) parameters while maintaining the orthogonality guarantee.[^29] BOFT shipped in PEFT v0.11.0 in May 2024.[^8]
HRA (Householder Reflection Adaptation), introduced in 2024, parameterises the update as a chain of r trainable Householder reflections. Because each Householder matrix is orthogonal, the chain remains orthogonal, and the construction simultaneously admits a low-rank interpretation.[^5]
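Both properties mentioned above, orthogonality and the low-rank interpretation, can be checked numerically with a short NumPy sketch. The dimension and the number of reflections are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 12, 3                          # dimension and number of reflections (illustrative)
U = rng.normal(size=(d, r))           # one trainable vector per reflection


def householder(u):
    """Reflection across the hyperplane orthogonal to u."""
    u = u / np.linalg.norm(u)
    return np.eye(len(u)) - 2.0 * np.outer(u, u)


H = np.eye(d)
for i in range(r):
    H = H @ householder(U[:, i])      # chain of r reflections

orthogonality_error = np.abs(H.T @ H - np.eye(d)).max()
rank_of_change = np.linalg.matrix_rank(H - np.eye(d), tol=1e-8)
print(orthogonality_error, rank_of_change)
```

The chain differs from the identity by a matrix of rank at most r, which is the sense in which the orthogonal update is also a low-rank one.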
LoftQ (LoRA-Fine-Tuning-aware Quantization), by Yixiao Li and colleagues (arXiv:2310.08659), jointly quantizes the base LLM and computes a low-rank initialisation for the LoRA adapters that compensates for quantization error. The construction reduces the performance gap between full fine-tuning and quantization-plus-LoRA on understanding, QA, summarisation, and generation tasks.[^30] PEFT exposes LoftQ via the replace_lora_weights_loftq helper added in v0.10.0.[^12]
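The alternating structure of the method can be sketched in NumPy. The quantizer below is a simple symmetric uniform rounding scheme used as a stand-in (it is not NF4), and the sizes and iteration count are assumptions; the sketch alternates between quantizing what the adapter does not explain and refitting the adapter to the quantization residual.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, n_iters = 32, 24, 4, 3        # illustrative sizes and iteration count
W = rng.normal(size=(d, k))            # full-precision pretrained weight


def quantize_uniform(M, bits=4):
    """Stand-in quantizer: symmetric uniform rounding. This is NOT NF4."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(M).max() / levels
    return np.round(M / scale) * scale


def truncated_svd(M, r):
    """Best rank-r factors of M in the Frobenius norm."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] * S[:r], Vt[:r]    # B is (d, r), A is (r, k)


B, A = np.zeros((d, r)), np.zeros((r, k))
for _ in range(n_iters):
    Q = quantize_uniform(W - B @ A)    # quantize what the adapter does not cover
    B, A = truncated_svd(W - Q, r)     # refit the adapter to the quantization error

err_quant_only = np.linalg.norm(W - Q)
err_with_adapter = np.linalg.norm(W - (Q + B @ A))
print(err_quant_only, err_with_adapter)
```

The final adapter is the best rank-r fit to the remaining quantization error, so adding it can only bring the quantized layer closer to the original weight.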
| Method | Core idea | Original year | First PEFT version |
|---|---|---|---|
| LoRA[^7] | ∆W = BA with rank r | 2021 | v0.1.0 |
| Prefix Tuning[^24] | Per-layer learned prefix vectors | 2021 | v0.1.0 |
| Prompt Tuning[^25] | Input-layer soft prompts | 2021 | v0.1.0 |
| P-Tuning[^26] | Continuous prompts via prompt encoder | 2021 | v0.1.0 |
| IA3[^27] | Activation rescaling vectors | 2022 | v0.4.0 |
| QLoRA[^13] | 4-bit NF4 base plus LoRA | 2023 | v0.4.0 |
| AdaLoRA[^15] | SVD-form ∆W with importance pruning | 2023 | v0.3.0 |
| LoHa[^20] | Hadamard of low-rank pairs | 2021 (FedPara) | v0.6.0 |
| LoKr[^5] | Kronecker structure | 2023 | v0.6.0 |
| OFT[^28] | Orthogonal block-diagonal update | 2023 | v0.7.0 |
| LoftQ[^30] | Quantization-aware LoRA init | 2023 | v0.10.0 |
| DoRA[^14] | Magnitude vector plus LoRA on the direction | 2024 | v0.9.0 |
| BOFT[^29] | Butterfly-factored orthogonal | 2023 | v0.11.0 |
| VeRA[^18] | Shared frozen B/A plus learned vectors | 2024 | v0.11.0 |
| PiSSA[^21] | SVD-based LoRA initialisation | 2024 | v0.11.0 |
| X-LoRA[^17] | Mixture of LoRA experts | 2024 | v0.12.0 |
| OLoRA[^22] | QR-based LoRA init | 2024 | v0.12.0 |
| FourierFT[^23] | Sparse DFT coefficients | 2024 | v0.12.0 |
| VB-LoRA[^19] | Shared global vector bank | 2024 | v0.13.0 |
The PEFT library exposes a small, regular surface that is intended to mirror the patterns of Hugging Face Transformers. The three central abstractions are PeftConfig, get_peft_model, and PeftModel.
PeftConfig is the base class for all method-specific configuration objects. Each adapter method has a corresponding subclass: LoraConfig, IA3Config, PrefixTuningConfig, PromptTuningConfig, PromptEncoderConfig (for P-tuning), AdaLoraConfig, OFTConfig, BOFTConfig, VeraConfig, LoHaConfig, LoKrConfig, and many more. These configurations carry the hyperparameters that define the adapter (such as rank r, scaling lora_alpha, dropout, target module patterns, and quantization-related flags) along with a task_type that selects the appropriate task head wrapper (causal LM, sequence-to-sequence LM, sequence classification, token classification, question answering, feature extraction).[^9]
The typical entry point is get_peft_model(model, peft_config), which takes a Hugging Face model loaded via AutoModel...from_pretrained and a config object, and returns a PeftModel wrapping the base model with the adapter layers inserted at the appropriate target modules.[^1] PeftModel is the base class encompassing the various PEFT methods; it inherits from torch.nn.Module and stores the base model, the peft configuration, the list of modules to save, and (for soft-prompt methods) a prompt_encoder and the virtual prompt tokens.[^31]
PeftModel exposes:
- save_pretrained(path) and from_pretrained(model, path) to save and reload the adapter weights only, producing the megabyte-scale checkpoints noted in the launch announcement.[^1]
- print_trainable_parameters() to report how many parameters are trainable as both an absolute count and a percentage of the base model.[^1]
- merge_and_unload() to fold the LoRA delta into the base weights and return a pure base model, removing any inference overhead from adapter computation.[^10]
- add_adapter, set_adapter, disable_adapter, delete_adapter, enable_adapters, and disable_adapters for managing multiple adapters on a single base model.[^10]
- add_weighted_adapter(adapters, weights, adapter_name, combination_type) to merge multiple LoRAs into a new adapter, with merging strategies including a plain linear combination, concatenation, SVD-based combination, and the TIES, DARE, and Magnitude Prune strategies added in v0.9.0.[^11]

PEFT v0.4.0 introduced the AutoPeftModel family (AutoPeftModelForCausalLM, AutoPeftModelForSeq2SeqLM, AutoPeftModelForSequenceClassification, AutoPeftModelForTokenClassification, AutoPeftModelForQuestionAnswering, AutoPeftModelForFeatureExtraction). Calling AutoPeftModelForCausalLM.from_pretrained(adapter_dir) reads the adapter config, identifies the base model and task type, downloads or loads the base, and applies the adapter in a single call.[^9]
PEFT integrates with multiple quantization backends rather than implementing quantization itself. The bitsandbytes 4-bit (NF4 and FP4) and 8-bit paths underpin QLoRA-style training. From v0.5.0, PEFT supports applying LoRA on top of GPTQ-quantized models. v0.9.0 added AutoAWQ (4-bit) and AQLM (down to 2-bit). v0.11.0 added HQQ and EETQ. Many of these backends do not support merge_and_unload(), because merging the LoRA delta into a quantized matrix would degrade numerical fidelity.[^8][^11]
A pivotal milestone in PEFT's adoption was the November 2023 v0.6.0 release, in which the Diffusers library switched from its own LoRA loader to PEFT as the backend for LoRA adapter management. The announcement explicitly stated: "Diffusers now uses PEFT," and listed the new user-visible capabilities unlocked by the change.[^10]
The integration brought several previously fragmented capabilities under a single API: simultaneous use of multiple LoRA adapters on the same diffusion pipeline, instant switching between adapters at inference time, scalar scaling and combination of adapters, permanent merging and unmerging into the base UNet or text encoder, and on-the-fly enable or disable controls. The implementation reads both Diffusers-native LoRA checkpoint formats and Kohya-style formats popularised by community fine-tuners for Stable Diffusion derivatives, ensuring interoperability with the very large pool of community-uploaded LoRAs on the Hub.[^10] PEFT v0.10.0 in March 2024 added mixed-adapter batches, letting different samples in the same batch use different adapters via syntax such as adapter_names=["adapter1", "adapter2", "__base__"].[^12]
The downstream impact has been visible in the diffusion community: a Hugging Face engineering blog post from July 2025 observed that the Flux family of community diffusion models alone had accumulated over 30,000 LoRA adapters on the Hub trained against a single base, all of which are loadable through the PEFT-backed Diffusers pipeline.[^32]
A recurring theme in the release notes from v0.10.0 onward is making large-scale fine-tuning approachable on commodity multi-GPU systems. v0.10.0 added support for combining QLoRA with both DeepSpeed ZeRO-3 and PyTorch FSDP, with the stated goal of fine-tuning a 70B LLaMA-2 model on two GPUs each with 24 GB of memory. The integration required coordinating updates across bitsandbytes, transformers, and FSDP to handle 4-bit parameter sharding and offloading correctly.[^12] v0.10.0 also added layer replication for LoRA, which duplicates layers with the LoRA adapters applied while keeping the base weights shared, providing a memory-efficient way to grow effective model depth.[^12]
The library uses prepare_model_for_kbit_training to set up frozen quantized models for stable PEFT training (replacing the older prepare_model_for_int8_training, which was removed in v0.10.0).[^12] The mixed-adapter-batch feature in v0.10.0 enables per-sample adapter selection, which is useful for evaluation harnesses and routing scenarios.[^12]
PEFT became the de facto adapter standard in the open-source ecosystem within months of release. Several adoption indicators are visible in public data.
- huggingface/peft has approximately 21,200 stars and 2,300 forks, with an active commit history (most recent push on 20 May 2026) and an ongoing release cadence.[^8]
- The peft package has been on PyPI continuously since 19 January 2023.[^33]

The library has also become a teaching reference: official Hugging Face course material, the Smol Course, and external curricula consistently use PEFT to demonstrate parameter-efficient fine-tuning of LLaMA-, Mistral-, and T5-class models.[^2][^3]
PEFT's significance lies less in any single algorithmic contribution and more in standardisation. Before PEFT, each adapter method (LoRA, prefix tuning, IA3, OFT, and so on) was distributed as a separate research codebase, often tightly coupled to a specific model architecture and training stack. The PEFT API makes the cost of switching between methods small: a practitioner can change peft_config from LoraConfig(r=8, ...) to IA3Config(...) or BOFTConfig(...) without touching the surrounding training loop.[^2] This has accelerated empirical comparisons across methods and lowered the engineering cost of putting parameter-efficient fine-tuning into production pipelines.
For supervised fine-tuning of large language models on consumer hardware, LoRA or QLoRA through PEFT remains the most common recipe in open-source post-training as of 2026.[^13][^9] For diffusion adapters, PEFT supplies the loading, merging, and scaling primitives that the broader image-generation community depends on.[^10] For research, PEFT functions as a reference implementation that authors of new adapter papers target when they want adoption; the time between a paper's release and the merge of an official config class for the method into main is often measured in weeks.[^5]
PEFT is not a one-size-fits-all replacement for full fine-tuning, and the library's documentation and method-specific papers identify several real limitations.
Performance gap on hard tasks. Several papers (notably DoRA and PiSSA) frame themselves explicitly as closing the accuracy gap between LoRA and full fine-tuning, an admission that vanilla LoRA can underperform on some tasks. PiSSA's authors describe LoRA's noise-and-zero adapter initialisation as a likely cause of slow convergence and lower final accuracy.[^14][^21] DoRA's authors document accuracy gaps for LoRA on commonsense reasoning and visual instruction tuning that DoRA narrows but does not fully close at all ranks.[^14]
Quantization-merge limitations. PEFT can apply LoRA on top of GPTQ, AutoAWQ, AQLM, HQQ, and EETQ quantized models, but in most cases merge_and_unload() is not supported on these quantized backends because materialising the merged delta would lose precision. This means the inference-time overhead of LoRA cannot always be eliminated when the base is quantized to very low precision.[^11]
Catastrophic forgetting can still occur. Although freezing the base model reduces the risk of forgetting, large-rank LoRA on many target modules trained for long schedules can still degrade base-model capabilities; the launch blog and method papers describe this as a tuning choice rather than a guarantee.[^1]
Method proliferation. The library's choice to ship many adapter families (LoRA, LoHa, LoKr, OFT, BOFT, OLoRA, PiSSA, VeRA, VB-LoRA, FourierFT, HRA, X-LoRA, DoRA, MiSS, RoAd, ALoRA, Arrow, WaveFT, DeLoRA, OSF, and more) presents real selection difficulty for new users. The official conceptual guide attempts to triage by recommending LoRA as the default starting point, but in practice picking the right method, rank, and target modules remains an empirical exercise.[^5]
Breaking changes during evolution. Some releases introduce required interface changes (for example, v0.9.0 mandated an update_layer method on custom adapter layer implementations and disallowed rank-zero configurations, while v0.10.0 removed prepare_model_for_int8_training). These changes are documented but can break downstream training scripts that specify loose version ranges instead of pinning an exact release.[^11][^12]
PEFT sits alongside several complementary projects in the Hugging Face stack and the broader open-source ecosystem.
- AutoModelForCausalLM.from_pretrained(...) or similar.[^2]
- PeftModel to keep memory low during alignment phases.[^2]
- microsoft/LoRA) remains a primary citation for the algorithm but is no longer the dominant production implementation; PEFT supersedes it for most workflows.[^7]