LLM Compiler (Meta)

Updated Jul 16, 2026

The Meta Large Language Model Compiler, usually shortened to LLM Compiler, is a family of pre-trained large language models built by Meta AI for code and compiler optimization tasks. It was introduced in the paper "Meta Large Language Model Compiler: Foundation Models of Compiler Optimization" by Chris Cummins, Volker Seeker, Dejan Grubisic, Baptiste Roziere, Jonas Gehring, Gabriel Synnaeve, and Hugh Leather, posted to arXiv on 27 June 2024, with model weights released the same day on Hugging Face. ^[1]^[2]

LLM Compiler is built on top of Code Llama, Meta's code-specialized derivative of LLaMA, and is then further trained on a large corpus of compiler intermediate representations (notably LLVM-IR) and assembly code so that the model learns the semantics of compiler internals rather than only high-level source languages. The stated motivation was that publicly available coding models had seen little training aimed specifically at optimizing code, and that the GPU hours and data collection needed to train such models from scratch are often prohibitive for academic and industry researchers. By releasing pre-trained foundation models, the authors aimed to give the field a starting point that can be fine-tuned for downstream compiler tasks with comparatively little data. ^[1]^[3]

Models and sizes

Meta released the models in two parameter sizes, 7 billion and 13 billion, and in two flavors. The base LLM Compiler models are the foundation models trained to understand compiler IRs and to emulate the compiler. The LLM Compiler FTD models are versions fine-tuned for two specific downstream tasks: optimization flag tuning and disassembly. All four checkpoints were published on Hugging Face under the facebook/ namespace. Each model is an autoregressive transformer with a context window of 16,000 tokens, initialized from the Code Llama weights of the corresponding size. ^[2]^[4]

Model	Sizes	Type	Purpose
LLM Compiler	7B, 13B	Foundation	Understanding compiler IR and assembly; emulating optimization passes
LLM Compiler FTD	7B, 13B	Fine-tuned	Predicting code-size optimization flags; disassembling x86-64 / ARM assembly to LLVM-IR

The "FTD" suffix denotes the fine-tuned downstream variants. Meta's model card metadata at times expanded the abbreviation as both "fine-tuned for downstream tasks" and "fine-tuned for disassembly"; in the paper itself, the FTD models are consistently described as the variants additionally trained for the flag-tuning and disassembly tasks. ^[2]^[4]

Training data and pipeline

The base models were initialized from Code Llama (using the same underlying data as Code Llama, with different weights) and then trained on an additional 546 billion tokens of compiler-centric data. Counting the further FTD fine-tuning, the full pipeline uses 710 billion training tokens. Meta described the process as four stages, with 15% of data from the previous tasks retained at each stage to preserve earlier capabilities: ^[1]^[2]

Stage	Output	Additional tokens
1. Assembly and compiler IR training	(toward base model)	401 billion
2. Compiler emulation fine-tuning	LLM Compiler (7B, 13B)	145 billion
3. Flag tuning fine-tuning	(toward FTD model)	84 billion
4. Disassembly fine-tuning	LLM Compiler FTD (7B, 13B)	80 billion

Stages 1 and 2 sum to the 546 billion tokens behind the base models; stages 3 and 4 add a further 164 billion tokens for the FTD models, for the 710 billion total. The initial IR and assembly corpus was generated with LLVM version 17.0.6 and split between LLVM-IR (about 185 billion tokens, roughly 46%) and assembly (about 216 billion tokens, roughly 54%). It targets predominantly x86-64 (around 340 billion tokens), with a smaller share of 64-bit ARM (around 61 billion tokens) and a small amount of CUDA. ^[1]
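The token accounting above can be checked with a few lines of Python. This is only an arithmetic sketch over the figures quoted in this article; the dictionary keys and variable names are illustrative labels, not names used by Meta.

```python
# Additional training tokens per stage, in billions, as quoted in the stage table.
STAGE_TOKENS_B = {
    "assembly_and_ir": 401,
    "compiler_emulation": 145,
    "flag_tuning": 84,
    "disassembly": 80,
}

# Stages 1 and 2 produce the base models; stages 3 and 4 produce the FTD models.
base_tokens = STAGE_TOKENS_B["assembly_and_ir"] + STAGE_TOKENS_B["compiler_emulation"]
ftd_extra_tokens = STAGE_TOKENS_B["flag_tuning"] + STAGE_TOKENS_B["disassembly"]
total_tokens = base_tokens + ftd_extra_tokens

# Approximate composition of the stage 1 corpus, in billions of tokens.
ir_tokens, asm_tokens = 185, 216
ir_share = ir_tokens / (ir_tokens + asm_tokens)

print(base_tokens, ftd_extra_tokens, total_tokens)  # 546 164 710
print(f"LLVM-IR share of stage 1: {ir_share:.0%}")  # LLVM-IR share of stage 1: 46%
```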

For context on the underlying cost the release was meant to spare, the paper notes that training Code Llama's models consumed roughly 1.4 million A100 GPU hours. ^[1]

Tasks

LLM Compiler is designed around three compiler-oriented tasks, all expressed as text in and text out so that the model can read and write code and IR directly.

Compiler emulation. The base models are trained to predict the output of the LLVM optimizer (opt) given an input program and a list of optimization passes, in effect emulating what the compiler would do. The emulation dataset was built by applying randomly chosen lists of between 1 and 50 passes (sampled from a set of 167 passes) to unoptimized programs; pass lists that crashed the compiler or timed out after 120 seconds were excluded. ^[1]
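The sampling step described above might look roughly like the following. This is a minimal sketch, not the authors' pipeline: the ten pass names are a small illustrative subset standing in for the 167 passes, sampling with repetition is an assumption, and the helper names and file paths are hypothetical. The command uses the `-passes=` syntax of LLVM's new pass manager.

```python
import random

# Illustrative subset of LLVM function passes; the paper samples from 167 passes.
CANDIDATE_PASSES = [
    "instcombine", "simplifycfg", "gvn", "sroa", "mem2reg",
    "dce", "adce", "early-cse", "sccp", "reassociate",
]

MIN_PASSES, MAX_PASSES = 1, 50   # pass-list length bounds quoted above
TIMEOUT_SECONDS = 120            # per-invocation time limit quoted above


def sample_pass_list(rng):
    """Draw a random pass list of length 1..50 (repetition allowed; an assumption)."""
    length = rng.randint(MIN_PASSES, MAX_PASSES)
    return [rng.choice(CANDIDATE_PASSES) for _ in range(length)]


def opt_command(pass_list, input_path, output_path):
    """Build an `opt` command line that applies the passes in order."""
    return ["opt", "-S", f"-passes={','.join(pass_list)}", input_path, "-o", output_path]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
passes = sample_pass_list(rng)
cmd = opt_command(passes, "input.ll", "output.ll")
print(len(passes), cmd[2])

# A real pipeline would run the command with
# subprocess.run(cmd, timeout=TIMEOUT_SECONDS) and discard the sample on a
# non-zero exit code or on subprocess.TimeoutExpired.
```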

Optimization flag tuning. The FTD models are given an unoptimized LLVM-IR module (as emitted by the clang frontend) and asked to produce a list of opt flags that minimizes the resulting object-code size, along with the predicted binary sizes. The task is evaluated zero-shot on unseen programs. To generate training labels, Meta ran a large-scale autotuner based on random search over pass lists, and validated the resulting pass lists for correctness with a tool called PassListEval against 164 self-testing C++ programs from HumanEval-X. The autotuner achieved a geometric mean 7.1% reduction in binary size over -Oz, but at a cost of more than 21,000 CPU days of additional compilation; the goal of the model is to recover a useful fraction of that benefit without running the compiler thousands of times per program. ^[1]
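As a rough illustration of how a geometric mean size reduction over -Oz can be computed, the sketch below aggregates per-program size ratios. The function name and the byte counts are hypothetical, and the exact aggregation used in the paper is assumed rather than confirmed.

```python
import math


def geomean_size_reduction(baseline_sizes, tuned_sizes):
    """Geometric-mean size reduction of tuned binaries relative to a baseline.

    Returns a fraction: 0.05 means the tuned binaries are 5% smaller in the
    geometric mean; a negative value means they are larger than the baseline.
    """
    if not baseline_sizes or len(baseline_sizes) != len(tuned_sizes):
        raise ValueError("size lists must be non-empty and of equal length")
    log_ratios = [math.log(t / b) for b, t in zip(baseline_sizes, tuned_sizes)]
    geomean_ratio = math.exp(sum(log_ratios) / len(log_ratios))
    return 1.0 - geomean_ratio


# Hypothetical object-code sizes in bytes for three programs.
oz_sizes = [1000, 2000, 4000]      # built with -Oz
tuned_sizes = [900, 2000, 3600]    # built with a tuned pass list
reduction = geomean_size_reduction(oz_sizes, tuned_sizes)
print(f"{reduction:.1%}")  # 6.8%
```

A pass list that makes every binary larger yields a negative value, which is how a result worse than the -Oz baseline shows up in this kind of metric.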

Disassembly. The FTD models are trained to generate LLVM-IR from a piece of x86-64 or ARM assembly code, a form of lifting that can support recompilation and analysis of code where source is unavailable. Round-trip quality is measured by lowering the generated IR back to assembly and comparing the result against the original input. ^[1]^[2]
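The round-trip check described above can be sketched as a small classifier. Everything here is an assumption for illustration: `lift` stands in for the model, `lower` stands in for a compiler backend that returns None when the IR does not compile, and the whitespace-normalized string comparison is a simplification rather than the paper's exact matching rule.

```python
from enum import Enum
from typing import Callable, Optional


class RoundTrip(Enum):
    FAILED = "failed"        # generated IR could not be lowered to assembly
    COMPILED = "compiled"    # lowered successfully, but assembly differs
    EXACT_MATCH = "exact"    # lowered assembly matches the original


def normalize(asm: str) -> str:
    """Strip surrounding whitespace and blank lines before comparing."""
    lines = (line.strip() for line in asm.splitlines())
    return "\n".join(line for line in lines if line)


def round_trip(
    original_asm: str,
    lift: Callable[[str], str],
    lower: Callable[[str], Optional[str]],
) -> RoundTrip:
    """Lift assembly to IR, lower it back, and classify the outcome."""
    ir = lift(original_asm)
    lowered = lower(ir)
    if lowered is None:
        return RoundTrip.FAILED
    if normalize(lowered) == normalize(original_asm):
        return RoundTrip.EXACT_MATCH
    return RoundTrip.COMPILED


# Stub callables so the sketch runs without a model or a compiler.
asm = "  mov eax, 1\n  ret\n"
print(round_trip(asm, lift=lambda a: "<ir>", lower=lambda ir: "mov eax, 1\nret"))
```

With real tools, `lower` would wrap a compiler invocation and `lift` a model call; the stubs only demonstrate the three possible outcomes.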

Reported results

On the flag-tuning task, the paper reports that LLM Compiler FTD reaches 77% of the optimizing potential of the autotuning search it was trained against, without the need for additional compilations at inference time. The 13B FTD model produced smaller object files than -Oz in 61% of cases. On disassembly, LLM Compiler FTD achieves a 45% round-trip success rate with a 14% exact match. Meta reports that on both downstream tasks the FTD models significantly outperform the general-purpose GPT-4 Turbo and Code Llama Instruct. ^[1]^[2]

Selected figures from the Hugging Face model card illustrate the gap on compiler emulation and on the two FTD tasks:

Model	Size	Compiler emulation accuracy	Code-size improvement vs -Oz	Disassembly round-trip BLEU
GPT-4 Turbo	N/A	N/A	-0.01%	0.43
Code Llama Instruct	7B	1.2%	-0.49%	0.48
Code Llama Instruct	13B	0.8%	-0.42%	0.62
LLM Compiler	7B	16%	N/A	N/A
LLM Compiler	13B	20%	N/A	N/A
LLM Compiler FTD	7B	N/A	4.77%	0.95
LLM Compiler FTD	13B	N/A	4.88%	0.96

In the table a negative code-size improvement means the model's chosen flags produced a larger binary than the default -Oz baseline. ^[2]

Release and license

The models were released on 27 June 2024 under a bespoke license titled the "Meta Large Language Model Compiler License Agreement." Meta described it as a permissive community license intended to allow wide reuse, free for both research and commercial use, granting a non-exclusive, worldwide, non-transferable, royalty-free limited license to use, reproduce, and modify the models. As with Meta's Llama family, the agreement attaches an acceptable-use policy and requires a separate license from Meta for entities whose products exceed 700 million monthly active users; it also asks downstream users to attribute the work (for example, "Built with LLM Compiler") and to prefix the names of derived models with "LLM Compiler." ^[2]^[3]

The authors framed the release as a foundation for further research, intended to lower the barrier for academic and industry practitioners working on machine-learning-guided compiler optimization. The work builds on Meta's earlier research in this area, including the CompilerGym reinforcement-learning environment and graph-based program representations cited in the paper. ^[1]^[3]

References

1. Cummins, Chris; Seeker, Volker; Grubisic, Dejan; Roziere, Baptiste; Gehring, Jonas; Synnaeve, Gabriel; Leather, Hugh. "Meta Large Language Model Compiler: Foundation Models of Compiler Optimization." arXiv:2407.02524, 27 June 2024. https://arxiv.org/abs/2407.02524
2. "facebook/llm-compiler-13b." Hugging Face model card. https://huggingface.co/facebook/llm-compiler-13b
3. "Meta Large Language Model Compiler: Foundation Models of Compiler Optimization." AI at Meta, Research publications. https://ai.meta.com/research/publications/meta-large-language-model-compiler-foundation-models-of-compiler-optimization/
4. "facebook/llm-compiler-7b." Hugging Face model card. https://huggingface.co/facebook/llm-compiler-7b
