LLM Compiler (Meta)
Last reviewed
Jun 3, 2026
Sources
4 citations
Review status
Source-backed
Revision
v1 · 1,414 words
The Meta Large Language Model Compiler, usually shortened to LLM Compiler, is a family of pre-trained large language models built by Meta AI for code and compiler optimization tasks. It was introduced in the paper "Meta Large Language Model Compiler: Foundation Models of Compiler Optimization" by Chris Cummins, Volker Seeker, Dejan Grubisic, Baptiste Roziere, Jonas Gehring, Gabriel Synnaeve, and Hugh Leather, posted to arXiv on 27 June 2024, with model weights released the same day on Hugging Face. [1][2]
LLM Compiler is built on top of Code Llama, Meta's code-specialized derivative of LLaMA, and is then further trained on a large corpus of compiler intermediate representations (notably LLVM-IR) and assembly code so that the model learns the semantics of compiler internals rather than only high-level source languages. The stated motivation was that publicly available coding models had seen little training aimed specifically at optimizing code, and that the GPU hours and data collection needed to train such models from scratch are often prohibitive for academic and industry researchers. By releasing pre-trained foundation models, the authors aimed to give the field a starting point that can be fine-tuned for downstream compiler tasks with comparatively little data. [1][3]
Meta released the models in two parameter sizes, 7 billion and 13 billion, and in two flavors. The base LLM Compiler models are the foundation models trained to understand compiler IRs and to emulate the compiler. The LLM Compiler FTD models are versions fine-tuned for two specific downstream tasks: optimization flag tuning and disassembly. All four checkpoints were published on Hugging Face under the facebook/ namespace. Each model is an autoregressive transformer with a context window of 16,000 tokens, initialized from the Code Llama weights of the corresponding size. [2][4]
| Model | Sizes | Type | Purpose |
|---|---|---|---|
| LLM Compiler | 7B, 13B | Foundation | Understanding compiler IR and assembly; emulating optimization passes |
| LLM Compiler FTD | 7B, 13B | Fine-tuned | Predicting code-size optimization flags; disassembling x86-64 / ARM assembly to LLVM-IR |
The "FTD" suffix denotes the fine-tuned downstream variants. Meta's model card metadata has expanded the abbreviation in two ways, as "fine-tuned for downstream tasks" and as "fine-tuned for disassembly"; the paper itself consistently describes the FTD models as the variants additionally trained for the flag-tuning and disassembly tasks. [2][4]
The base models were initialized from the Code Llama weights of the corresponding size and then trained on an additional 546 billion tokens of compiler-centric data. Counting the further FTD fine-tuning, the full pipeline uses 710 billion training tokens. Meta described the process as four stages, with 15% of data from the previous tasks retained at each stage to preserve earlier capabilities: [1][2]
| Stage | Output | Additional tokens |
|---|---|---|
| 1. Assembly and compiler IR training | (toward base model) | 401 billion |
| 2. Compiler emulation fine-tuning | LLM Compiler (7B, 13B) | 145 billion |
| 3. Flag tuning fine-tuning | (toward FTD model) | 84 billion |
| 4. Disassembly fine-tuning | LLM Compiler FTD (7B, 13B) | 80 billion |
Stages 1 and 2 sum to the 546 billion tokens behind the base models; stages 3 and 4 add a further 164 billion tokens for the FTD models, for the 710 billion total. The initial IR and assembly corpus was generated with LLVM version 17.0.6 and split roughly evenly between LLVM-IR (about 185 billion tokens) and assembly (about 216 billion tokens). It targets predominantly x86-64 (around 340 billion tokens), with a smaller share of 64-bit ARM (around 61 billion tokens) and a small amount of CUDA. [1]
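The token arithmetic above can be checked in a few lines. This is only a bookkeeping sketch: the figures are the rounded values quoted in this article (in billions of tokens), and the dictionary keys are illustrative labels rather than names used by Meta.

```python
# Token counts in billions, copied from the stage table above.
# The key names are illustrative labels, not identifiers from the paper.
STAGE_TOKENS_B = {
    "assembly_and_ir": 401,
    "compiler_emulation": 145,
    "flag_tuning": 84,
    "disassembly": 80,
}

# Stages 1 and 2 produce the base models; stages 3 and 4 produce the FTD models.
base_tokens = STAGE_TOKENS_B["assembly_and_ir"] + STAGE_TOKENS_B["compiler_emulation"]
ftd_extra_tokens = STAGE_TOKENS_B["flag_tuning"] + STAGE_TOKENS_B["disassembly"]
total_tokens = base_tokens + ftd_extra_tokens

# Approximate breakdowns of the stage 1 corpus (the small CUDA share is omitted).
by_format = {"llvm_ir": 185, "assembly": 216}
by_target = {"x86_64": 340, "aarch64": 61}

print(base_tokens, ftd_extra_tokens, total_tokens)        # 546 164 710
print(sum(by_format.values()), sum(by_target.values()))   # 401 401
```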
For context on the underlying cost the release was meant to spare, the paper notes that training Code Llama's models consumed roughly 1.4 million A100 GPU hours. [1]
LLM Compiler is designed around three compiler-oriented tasks, all expressed as text in and text out so that the model can read and write code and IR directly.
Compiler emulation. The base models are trained to predict the output of the LLVM optimizer (opt) given an input program and a list of optimization passes, in effect emulating what the compiler would do. The emulation dataset was built by applying randomly chosen lists of between 1 and 50 passes (sampled from a set of 167 passes) to unoptimized programs; pass lists that crashed the compiler or timed out after 120 seconds were excluded. [1]
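As a rough illustration of how such an emulation dataset might be generated, the sketch below samples a pass list of length 1 to 50 from a set of 167 passes and builds an `opt` invocation for it. Several details are assumptions rather than facts from the paper: the pass names are placeholders, sampling is done with replacement, the command uses `opt`'s `-passes=` option, and `run_pass_list` is a hypothetical helper that treats a non-zero exit code or a 120-second timeout as grounds for exclusion.

```python
import random
import subprocess

NUM_PASSES = 167            # size of the pass set reported in the paper
MIN_LEN, MAX_LEN = 1, 50    # bounds on pass-list length reported in the paper

# Placeholder names (assumption); the real set consists of LLVM pass names.
PASS_SET = [f"pass_{i}" for i in range(NUM_PASSES)]


def sample_pass_list(rng):
    """Draw a random pass list. Sampling with replacement is an assumption."""
    length = rng.randint(MIN_LEN, MAX_LEN)  # randint is inclusive on both ends
    return [rng.choice(PASS_SET) for _ in range(length)]


def opt_command(pass_list, input_path, output_path):
    """Build an opt command line that applies the passes and emits textual IR."""
    return ["opt", "-S", f"-passes={','.join(pass_list)}", input_path, "-o", output_path]


def run_pass_list(pass_list, input_path, output_path, timeout_s=120):
    """Return True if opt finished cleanly, False if it crashed or timed out."""
    try:
        result = subprocess.run(
            opt_command(pass_list, input_path, output_path),
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


rng = random.Random(0)
print(opt_command(sample_pass_list(rng), "input.ll", "output.ll"))
```

Only pass lists for which `run_pass_list` returns True would be kept as training examples under this scheme, pairing the input IR and pass list with the IR that `opt` produced.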
Optimization flag tuning. The FTD models are given an unoptimized LLVM-IR module (as emitted by the clang frontend) and asked to produce a list of opt flags that minimizes the resulting object-code size, along with the predicted binary sizes. This is evaluated zero-shot on unseen programs. To generate training labels, Meta ran a large autotuning search using random search over pass lists, validated for correctness with a tool called PassListEval against 164 self-testing C++ programs from HumanEval-X. That autotuning search achieved a geometric mean 7.1% reduction in binary size over -Oz, but at a cost of more than 21,000 CPU days of additional compilation; the goal of the model is to recover a useful fraction of that benefit without running the compiler thousands of times per program. [1]
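The label-generating search can be pictured as a plain random search that keeps the smallest result seen so far. The sketch below is a minimal version under stated assumptions: `measure_size` is a hypothetical callable that compiles a program with a given pass list and returns the object size in bytes (or `None` on failure), the `-Oz` size is used as the starting baseline, and the toy cost function exists only to make the example runnable. It does not reproduce Meta's autotuner or the PassListEval correctness check.

```python
import random


def random_search(candidate_passes, measure_size, baseline_size, trials, max_len, rng):
    """Random search over pass lists, minimizing object size.

    Returns (best_pass_list, best_size). best_pass_list is None when no
    sampled list beats the baseline, meaning the baseline should be kept.
    """
    best_list, best_size = None, baseline_size
    for _ in range(trials):
        length = rng.randint(1, max_len)
        pass_list = [rng.choice(candidate_passes) for _ in range(length)]
        size = measure_size(pass_list)  # hypothetical: compile and measure bytes
        if size is not None and size < best_size:
            best_list, best_size = pass_list, size
    return best_list, best_size


def toy_measure(pass_list):
    """Stand-in cost model: each distinct pass saves 5 bytes (illustration only)."""
    return 100 - 5 * len(set(pass_list))


best_list, best_size = random_search(
    candidate_passes=["a", "b", "c"],
    measure_size=toy_measure,
    baseline_size=100,
    trials=50,
    max_len=4,
    rng=random.Random(0),
)
print(best_list, best_size)
```

Each call to `measure_size` stands for a full compilation, which is why a search of this kind becomes expensive at scale and why a model that predicts a good pass list directly is attractive.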
Disassembly. The FTD models are trained to generate LLVM-IR from a piece of x86-64 or ARM assembly code, a form of lifting that can support recompilation and analysis of code where source is unavailable. Round-trip quality is measured by re-assembling the produced IR and comparing it against the original. [1][2]
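The round-trip check can be sketched as follows. This is an illustrative outline, not the paper's evaluation harness: `lower_to_asm` is a hypothetical callable that compiles predicted LLVM-IR back to assembly (returning `None` or raising on failure), and whitespace-insensitive textual equality is an assumed comparison rule.

```python
def normalize(asm):
    """Collapse whitespace and drop blank lines so formatting does not matter."""
    lines = []
    for line in asm.splitlines():
        collapsed = " ".join(line.split())
        if collapsed:
            lines.append(collapsed)
    return lines


def round_trip_ok(original_asm, predicted_ir, lower_to_asm):
    """True if lowering the predicted IR reproduces the original assembly."""
    try:
        regenerated = lower_to_asm(predicted_ir)  # hypothetical compile step
    except Exception:
        return False
    if regenerated is None:
        return False
    return normalize(regenerated) == normalize(original_asm)


def round_trip_rate(examples, lower_to_asm):
    """Fraction of (original_asm, predicted_ir) pairs that round-trip."""
    if not examples:
        return 0.0
    hits = sum(round_trip_ok(asm, ir, lower_to_asm) for asm, ir in examples)
    return hits / len(examples)


# Toy stand-in for a compiler: maps known IR strings to assembly text.
fake_compiler = {"ir_a": "mov  eax, 1\nret\n", "ir_b": "xor eax, eax\nret\n"}.get

examples = [("mov eax, 1\n\nret", "ir_a"), ("mov eax, 2\nret", "ir_b")]
print(round_trip_rate(examples, fake_compiler))  # 0.5
```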
On the flag-tuning task, the paper reports that LLM Compiler FTD reaches 77% of the optimizing potential of the autotuning search it was trained against, without the need for additional compilations at inference time. The 13B FTD model produced smaller object files than -Oz in 61% of cases. On disassembly, LLM Compiler FTD achieves a 45% round-trip success rate with a 14% exact match. Meta reports that on both downstream tasks the FTD models significantly outperform the general-purpose GPT-4 Turbo and Code Llama Instruct. [1][2]
Selected figures from the Hugging Face model card illustrate the gap on compiler emulation and on the two FTD tasks:
| Model | Size | Compiler emulation accuracy | Code-size improvement vs -Oz | Disassembly round-trip BLEU |
|---|---|---|---|---|
| GPT-4 Turbo | N/A | N/A | -0.01% | 0.43 |
| Code Llama Instruct | 7B | 1.2% | -0.49% | 0.48 |
| Code Llama Instruct | 13B | 0.8% | -0.42% | 0.62 |
| LLM Compiler | 7B | 16% | N/A | N/A |
| LLM Compiler | 13B | 20% | N/A | N/A |
| LLM Compiler FTD | 7B | N/A | 4.77% | 0.95 |
| LLM Compiler FTD | 13B | N/A | 4.88% | 0.96 |
In the table a negative code-size improvement means the model's chosen flags produced a larger binary than the default -Oz baseline. [2]
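The sign convention can be made concrete with a small helper. The formula below is an assumed per-program definition (relative size reduction against the `-Oz` binary); the model card's exact aggregation across programs is not restated here, and `size_improvement_pct` is a hypothetical name.

```python
def size_improvement_pct(oz_size, model_size):
    """Percentage reduction in size relative to the -Oz binary (assumed formula).

    Positive: the model's flags gave a smaller binary than -Oz.
    Negative: the model's flags gave a larger binary than -Oz.
    """
    return 100.0 * (oz_size - model_size) / oz_size


print(size_improvement_pct(1000, 950))   # 5.0  (smaller than -Oz)
print(size_improvement_pct(1000, 1010))  # -1.0 (larger than -Oz)
```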
The models were released on 27 June 2024 under a bespoke license titled the "Meta Large Language Model Compiler License Agreement." Meta described it as a permissive community license intended to allow wide reuse, free for both research and commercial use, granting a non-exclusive, worldwide, non-transferable, royalty-free limited license to use, reproduce, and modify the models. As with Meta's Llama family, the agreement attaches an acceptable-use policy and requires a separate license from Meta for entities whose products exceed 700 million monthly active users; it also asks downstream users to attribute the work (for example, "Built with LLM Compiler") and to prefix the names of derived models with "LLM Compiler." [2][3]
The authors framed the release as a foundation for further research, intended to lower the barrier for academic and industry practitioners working on machine-learning-guided compiler optimization. The work builds on Meta's earlier research in this area, including the CompilerGym reinforcement-learning environment and graph-based program representations cited in the paper. [1][3]