Axolotl
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,122 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,122 words
Add missing citations, update stale details, or suggest a clearer explanation.
Axolotl is an open source fine-tuning framework for large language models written in Python and configured through YAML files. Originally created by Wing Lian in 2023 under the OpenAccess-AI-Collective banner, the project is now stewarded by Axolotl AI Inc., a venture-backed company founded in 2024 that maintains the framework alongside a community of contributors[1][2]. The framework wraps Hugging Face Transformers, the TRL library, Accelerate, DeepSpeed, and PEFT into a single declarative configuration layer that supports supervised fine-tuning, preference optimization (DPO, IPO, KTO, ORPO, GRPO, GDPO), parameter-efficient methods such as LoRA and QLoRA, full multi-GPU fine-tunes via FSDP and DeepSpeed, and high-throughput training tricks including sample packing (Multipack) with FlashAttention[3][4]. Released under the Apache 2.0 license, Axolotl has been used to train many of the most widely downloaded open chat models on Hugging Face, including the OpenHermes series from Teknium and the Dolphin series from Cognitive Computations[5][6].
| Attribute | Value |
|---|---|
| Project type | Open source LLM fine-tuning framework |
| Primary language | Python |
| Configuration format | YAML |
| License | Apache 2.0[3] |
| Original author | Wing Lian (caseus)[1] |
| Original organization | OpenAccess-AI-Collective[7] |
| Current organization | axolotl-ai-cloud / Axolotl AI Inc.[3] |
| Initial public release | 2023[1] |
| Latest release (as of writing) | v0.16.1, April 2, 2026[8] |
| Notable funding | Andreessen Horowitz Open Source AI grant, December 2023[2] |
Axolotl began as a personal project by Wing Lian, a software engineer with prior experience at SoundCloud and UnitedMasters. In March 2023, a skiing injury left Lian sidelined and looking for something to occupy his recovery; he chose to learn LLM fine-tuning, which had become a fast-moving research topic after the release of Meta's LLaMA weights and Stanford's Alpaca instruction-tuning recipe[1]. While experimenting with existing tools such as Alpaca-LoRA, he ran into two recurring frustrations: prompt formats varied wildly across the datasets being shared on Hugging Face, and the dominant training scripts were configured through long command-line argument lists that made experiments hard to reproduce and share[1].
To solve both problems, Lian wrote a wrapper that consumed a single YAML configuration file describing the model, dataset, prompt template, optimizer, and distributed training strategy. The wrapper validated the configuration up front, then handed off to Hugging Face Transformers and Accelerate for the actual training loop. The project was released on GitHub under the OpenAccess-AI-Collective organization in mid-2023 and quickly accumulated contributors[7][1].
A turning point came with Tim Dettmers's QLoRA paper in May 2023. Lian integrated QLoRA support into Axolotl within roughly a week of the paper's release, giving practitioners with consumer-grade hardware an immediate way to fine-tune 7B and even 70B parameter models in 4-bit precision[1]. This pattern (rapid integration of new research) became a signature trait of the project. When Tri Dao and Albert Gu's Mamba state space model paper appeared in December 2023, Axolotl shipped support for fine-tuning Mamba checkpoints within days[1].
Through the second half of 2023, Axolotl became the de facto fine-tuning framework for the open-weights community that grew up around LLaMA, Mistral, and the larger Hugging Face ecosystem. Teknium chose Axolotl for the OpenHermes line of Mistral fine-tunes; the OpenHermes-2.5-Mistral-7B model card explicitly notes that datasets were "converted to ShareGPT" and "further transformed by Axolotl to use ChatML"[5]. Eric Hartford's Dolphin series, including dolphin-2.5-mixtral-8x7b and dolphin-2.8-mistral-7b-v0.2, displayed the "Built with Axolotl" badge on their model cards and shipped the full Axolotl YAML configuration alongside the weights so others could reproduce the training[6]. Nous Research used Axolotl for several of its Capybara, Puffin, and Hermes derivatives, and Lian himself trained models such as Manticore, Minotaur, Jackalope, and Hippogriff that lived under the openaccess-ai-collective namespace on Hugging Face[1].
On December 13, 2023, Andreessen Horowitz announced a second batch of Open Source AI Grants. Axolotl appeared on that list alongside six other projects spanning model training, hosting, evaluation, and visual AI, recognizing the framework as a piece of critical open infrastructure for the LLM ecosystem. The a16z program provides grant funding rather than equity, so the award was not a venture investment[2].
Through 2024 the project transitioned from a hobbyist tool maintained under OpenAccess-AI-Collective into a company. The GitHub organization was renamed and the canonical repository moved to axolotl-ai-cloud/axolotl, with the company adopting axolotl.ai as its domain and docs.axolotl.ai for documentation[3][4]. Wing Lian announced the company publicly at a Nous Research meetup hosted at the a16z offices in San Francisco[1]. Public company databases describe Axolotl AI as a San Francisco company founded in 2024 that builds open source tools for customizing and scaling AI language models, with Essence Venture Capital listed among its investors[9].
Lian represented the new company at the PyTorch Conference 2024 Fine-Tuning Mini-Summit on September 18, 2024, giving a talk titled "The Challenges of Building an Opinionated Open Source LLM Framework" alongside the maintainers of Unsloth, torchtune, and researchers including Tim Dettmers[10].
The framework's release cadence has been steady. Version 0.12.0 (August 8, 2024) introduced N-D parallel support, DeepSpeed Automatic Tensor Parallelism, and FP8 training. Subsequent 2025 releases added reward modeling and process reward modeling, LoRA optimizations, and a beta for multimodal vision-language fine-tuning, with January 2025 specifically delivering reward and process reward modeling and February 2025 shipping the LoRA memory and speed work that targeted both single-GPU and multi-GPU adapter training. In early 2026, version 0.15.0 (March 6) shipped a Torch 2.10 upgrade, uv-based Docker builds, ScatterMoE LoRA, SonicMoE Triton kernels, and MoE expert quantization that the maintainers reported as reducing peak reserved memory dramatically on mixture-of-experts models. Version 0.16.0 (April 2) added asynchronous GRPO training reported as up to 58% faster step-time, ScatterMoE/SonicMoE fused kernels claimed to deliver up to 15x faster MoE forward passes and roughly 40x reductions in memory, FlashAttention 4 support for NVIDIA Hopper and Blackwell GPUs, NeMo Gym integration for reinforcement learning, and Energy-Based Fine-Tuning (EBFT). Version 0.16.1 followed the same day with Gemma 4 support[8]. The roughly monthly cadence and the public release notes make it possible for downstream teams to track which research methods and model families are stable in production versus still considered beta[4][8].
The defining design decision in Axolotl is that an entire training run, from data preprocessing through final inference, is captured in one reusable YAML file. The configuration declares the base model, optional adapter strategy, dataset paths and templates, sequence length, batch and gradient accumulation parameters, optimizer and scheduler, distributed training backend, attention implementation, and downstream evaluation hooks. The file is parsed and statically validated; incompatible parameter combinations (for example, sample packing without an attention implementation that supports it) fail the lint step before any GPU time is spent[1][4].
The CLI exposes a small set of commands that all consume the same configuration: axolotl preprocess tokenizes and caches the dataset, axolotl train runs the training loop, axolotl inference provides an interactive prompt, and axolotl evaluate runs offline evaluation. Most users interact with Axolotl entirely through these commands and a single YAML[4].
Internally, Axolotl is a relatively thin coordination layer over a stack of mature libraries:
Trainer API[4].This approach keeps Axolotl close to the moving frontier of upstream libraries while concentrating the project's own code on what it actually owns: configuration schema, dataset format adapters, sample-packing logic, validation rules, and the integration glue.
Sample packing, called Multipack in the Axolotl documentation, is the framework's headline throughput optimization. The naive approach to batching variable-length sequences pads each sequence in a batch up to the longest sequence in that batch, wasting compute on padding tokens that the model is forced to process but which contribute nothing to the loss. Multipack instead concatenates multiple short sequences into a single packed sequence whose length matches the configured sequence_len, then relies on the attention implementation to prevent tokens in one packed example from attending to tokens in another[12].
With Flash Attention enabled, Multipack passes per-sequence boundary information so that FlashAttention's variable-length kernels compute attention only within each original sequence. Without FlashAttention, Axolotl can still pack sequences by constructing 4D attention masks for PyTorch's scaled dot-product or native attention paths, though at lower efficiency because the framework cannot join multiple batches into a single batch without the variable-length attention support that FlashAttention provides[12]. Lian has reported that the combination of sample packing and FlashAttention drives roughly an order-of-magnitude improvement in tokens-per-second relative to padded training, and gave an illustrative figure of reproducing an Alpaca-style fine-tune for roughly $4 to $5 on L40 GPUs versus the original Alpaca team's $100 on 8x A100s[1]. The packing scheme is effectively a descendant of StackLlama-style sequence concatenation but with attention masking that preserves the per-sample loss exactly, so models trained with Multipack are mathematically equivalent to those trained without it given the same hyperparameters and data ordering[1][12].
Beyond sample packing, Axolotl exposes a long list of optional optimizations[13]:
Axolotl supports single-GPU, multi-GPU, and multi-node training through Accelerate launchers. For sharded training the user picks between DeepSpeed ZeRO stages (commonly ZeRO-2 or ZeRO-3 with optional BF16) and Fully Sharded Data Parallel (FSDP) (both the original implementation and the newer FSDP2 rewrite)[3]. Recent releases extend this with N-D parallelism that composes tensor, context, and FSDP sharding, sequence parallelism for very long contexts, and DeepSpeed Auto Tensor Parallelism introduced in v0.12.0[8].
For preference optimization and RL methods that need rollouts, Axolotl integrates with vLLM for fast inference during trajectory generation in GRPO and GDPO, and provides async training paths that overlap rollout generation with gradient updates[11][8].
Axolotl supports a broad menu of training objectives, all selected from the same YAML[11][3]:
| Family | Methods | Notes |
|---|---|---|
| Supervised fine-tuning | Standard SFT, Instruction Tuning, continued pretraining | The default mode; ChatML, Alpaca, ShareGPT, Vicuna, and template-free formats are all supported. |
| Preference / Reinforcement Learning from Human Feedback (RLHF) | DPO, IPO, KTO, ORPO, SimPO, GDPO | DPO compares chosen vs. rejected; IPO is a DPO loss variant; KTO uses desirable/undesirable single-response signals; ORPO adds an odds-ratio term; SimPO removes the reference model; GDPO normalizes multiple reward signals. |
| RL with policy optimization | GRPO, Async GRPO | Group Relative Policy Optimization with vLLM for trajectory generation, custom reward functions, and async pipelines. |
| Reward modeling | Reward Modeling, Process Reward Modeling | Added in early 2025 for training scalar reward and step-level process reward models. |
| Parameter-efficient | LoRA, QLoRA, ReLoRA, ScatterMoE LoRA | QLoRA pairs with bitsandbytes 4-bit quantization; ScatterMoE LoRA targets MoE expert weights. |
| Quantization-aware | QAT, NVFP4 QAT, GPTQ | Train models to be robust to low-precision inference. |
| Energy-based | EBFT (Energy-Based Fine-Tuning) | Introduced in v0.16.0 as a novel RL method. |
The library also supports multimodal vision-language fine-tunes (Qwen2-VL, Qwen2.5-VL, Qwen3-VL, LLaVA, SmolVLM2, and InternVL families) and audio models such as Voxtral, with multimodal SFT moving from beta into stable status during 2025[3][4].
Axolotl tracks the upstream model zoo aggressively, typically adding configurations within days of a major open-weights release[3][4]:
The 2025 to 2026 releases broadened coverage further with multimodal vision and audio models, and Lian has emphasized in interviews and conference talks that adding a new architecture typically means writing a model wrapper plus example YAMLs rather than reimplementing the base model, since the heavy lifting lives in Transformers[4][10].
Axolotl's dataset layer is one of the parts most directly written by the project itself. The framework natively understands several chat and instruction formats and converts them into the prompt template required by the target model[4][14]:
The fact that Dolphin and OpenHermes consistently distributed their training data in ShareGPT/ChatML and pointed users at Axolotl as the canonical trainer is a significant reason both formats became standard in the open-weights community[5][6].
Axolotl's footprint on the Hugging Face Hub is broad and visible. The "Built with Axolotl" badge and accompanying YAML appear on the model cards of many of the highest-download open chat models from 2023 onward[5][6]. Notable examples include:
| Model series | Maintainer | Base model | Role of Axolotl |
|---|---|---|---|
| OpenHermes 2 / 2.5 | Teknium | Mistral 7B | SFT with ChatML conversion via Axolotl; the OpenHermes-2.5-Mistral-7B card explicitly documents this[5]. |
| Dolphin 2.5 | Cognitive Computations | Mixtral 8x7B | qLoRA fine-tune with Axolotl on Mixtral[15]. |
| Dolphin 2.6 | Cognitive Computations | Mistral 7B | qLoRA fine-tune, reported as 2 days on 4x A100s[1]. |
| Dolphin 2.8 v0.2 | Cognitive Computations | Mistral 7B v0.2 | Full SFT with sample packing, DeepSpeed ZeRO-3, Flash Attention, 16k sequence length on 10x L40S over 3 days[6]. |
| Capybara, Puffin, Hermes derivatives | Nous Research | Llama 2, Mistral | SFT and DPO via Axolotl[1]. |
| Mistral-OpenOrca | OpenOrca/OpenChat | Mistral 7B | Axolotl-based SFT on Mistral-7B[1]. |
| Manticore, Minotaur, Jackalope, Hippogriff | OpenAccess-AI-Collective (Wing Lian) | Various | Axolotl reference models maintained alongside the framework[1][7]. |
| Mythalion, DiscoLM | Pygmalion, DiscoResearch | Llama 2 derivatives | Axolotl-based community releases[1]. |
Cloud platforms tailored their environments to Axolotl in response: RunPod and Vast.ai both offer Axolotl Docker images, and Modal (platform) and Replicate publish example notebooks and templates for running Axolotl jobs[4][3]. Recent releases also publish documentation tuned for AI coding assistants such as Claude Code, Cursor, and Copilot, reflecting how heavily contemporary users mix LLM-generated code with hand-edited configuration[3].
The official site lists 170+ contributors and 500+ active Discord members; the GitHub repository displays around 11.9k stars and 1.3k forks as of writing[3][16].
Axolotl is the most prominent member of a small set of open source LLM fine-tuning frameworks that emerged in 2023 to 2024. The three most often compared are Axolotl itself, Unsloth, and LLaMA-Factory[17][18][19].
| Dimension | Axolotl | Unsloth | LLaMA-Factory |
|---|---|---|---|
| Configuration model | YAML files, declarative[4] | Python-first API with notebooks[17] | YAML + a polished web UI[17] |
| Primary differentiator | Extensive feature surface, distributed training (FSDP, DeepSpeed, N-D parallel), MoE kernels[3][17] | Hand-written Triton kernels delivering 2 to 5x speedups and large memory reductions on single GPU[17] | Breadth of model support and a low-friction web UI for non-engineers[17] |
| Multi-GPU / multi-node | Strong; FSDP1/2, DeepSpeed, sequence parallel, ND parallel[3] | Historically single-GPU focused; multi-GPU support has expanded[17] | Supported, often by delegating to DeepSpeed[17] |
| Sample packing | Multipack with FlashAttention is a core feature[12] | Supported | Supported, can use Unsloth as an acceleration backend[17] |
| Preference / RL methods | SFT, DPO, IPO, KTO, ORPO, SimPO, GDPO, GRPO, reward modeling, EBFT[11] | DPO, GRPO, and others[17] | DPO, GRPO, ORPO, and others[17] |
| Typical user | Engineering teams running production training and research-style ablations[17] | Individual developers and resource-constrained setups[17] | Cross-functional teams that want a web UI[17] |
| License | Apache 2.0[3] | Apache 2.0[17] | Apache 2.0[17] |
By 2026, all three frameworks support the same core menu of objectives (LoRA, QLoRA, full fine-tuning, DPO, GRPO, multimodal) and the practical differences are mostly in workflow ergonomics and distributed training depth rather than capability[19][20]. Axolotl is generally positioned as the framework for ML engineering teams that need reproducible YAML configs, multi-node training, and the latest research methods landed quickly[17][18]; Unsloth as the choice for individual practitioners who need maximum throughput from a single GPU through custom kernels[17]; and LLaMA-Factory as the one with the most approachable UI for people who do not want to write Python[17].
Axolotl is also commonly contrasted with PyTorch's own torchtune library, which is more conservative in feature scope but tightly integrated with the PyTorch core[10][20].
Axolotl's significance comes less from a single algorithmic innovation than from being the connective tissue that made it practical for hobbyists, researchers, and small teams to fine-tune open-weights LLMs at modern scale. Concretely, it enables:
In commercial terms, the framework's existence has been a meaningful contributor to the viability of the open-weights ecosystem: if fine-tuning required bespoke engineering, far fewer of the Mistral and Llama derivatives that populate Hugging Face would exist.
Axolotl's design choices come with trade-offs that practitioners frequently surface in community discussion[17][18][19]:
Axolotl sits at the intersection of several ecosystems that are worth navigating in their own right: