Lean (Theorem Prover)
Last reviewed
May 18, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,533 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 18, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,533 words
Add missing citations, update stale details, or suggest a clearer explanation.
Lean is an open source interactive theorem prover and dependently typed functional programming language created by Leonardo de Moura. First launched at Microsoft Research in 2013, Lean is designed to express mathematical statements and proofs in a regimented formal language that the system can verify line by line. Its current major version, Lean 4, was first released as a stable system on 8 September 2023, and is the implementation used by essentially all contemporary AI mathematics research that targets formal proof.[1][2]
Lean has become the dominant proof assistant in machine learning research on mathematical reasoning. Systems such as Google DeepMind's alphaproof, DeepSeek's deepseek-Prover series, Princeton's goedel prover, and Numina and Kimi's numinamath corpus all output proofs in Lean 4 against the community library mathlib. Standard benchmarks for neural theorem proving (miniF2F, PutnamBench, ProofNet, LeanDojo Benchmark) are published primarily in Lean. As a result, Lean now occupies a position in formal mathematics analogous to the one that PyTorch occupies in deep learning: it is the shared interface through which AI math systems are trained, evaluated, and compared.[3][4][5]
This article focuses on Lean as it is used by AI research. It covers Lean's history, the design of the language, the Mathlib library, the major AI systems and benchmarks built on top of Lean, the Lean Focused Research Organization, and the broader community ecosystem.
Lean was launched by Leonardo de Moura at Microsoft Research Redmond in 2013. De Moura, a Brazilian computer scientist, had previously worked on automated reasoning at SRI International and was the principal architect of the Z3 SMT solver, one of the most widely deployed automated theorem provers in industry. Lean was conceived as a vehicle for combining the interactive proof style of established proof assistants with the speed and automation of an SMT-style backend.[1][6]
The earliest releases, retrospectively labeled Lean 1 and Lean 2, were experimental. They included support for homotopy type theory based foundations, which were later dropped in favor of a foundation built on the Calculus of Inductive Constructions. The first versions of Lean were not intended for use by working mathematicians; they were research vehicles for the type theoretic core and elaboration algorithms that de Moura and his collaborators were developing.[1]
Lean 3 was first released on 20 January 2017 and was the first version of Lean to gain significant adoption. It was implemented primarily in C++, with portions of the elaborator and tactic framework written in Lean itself. Lean 3 introduced the tactic mode and metaprogramming style for which Lean is now known, exposing the kernel's term language as an embedded domain specific language inside Lean's tactic framework.[1]
Lean 3 was the version that the Mathlib community formed around. The mathlib3 library, started in 2017, grew to cover large portions of undergraduate and early graduate mathematics. Lean 3 was also the version used for the Liquid Tensor Experiment, a project initiated in December 2020 when Fields Medalist Peter Scholze challenged the formal methods community to verify a key theorem from his work with Dustin Clausen on condensed mathematics. Johan Commelin, Adam Topaz, and a team of collaborators completed the formalization in mathlib on 14 July 2022, a result widely seen as demonstrating that modern research mathematics is within reach of formal verification.[7]
After Lean 3.4.2, Lean 3 reached end of life as an officially supported version. Community members continued releasing unofficial updates through version 3.51.1, but development focus shifted to Lean 4.[1]
Work on Lean 4 began in 2018, motivated by the realization that to scale formal mathematics and verification it would be necessary to rewrite the system from the ground up. Lean 4 was first usable around 2021 and reached its first official stable release as v4.0.0 on 8 September 2023. With Lean 4, roughly 90 percent of the implementation is itself written in Lean: the language is self hosted, and the elaborator, tactic framework, and macro system are all Lean 4 programs that compile through Lean's own backend to native code via C.[2][8]
Lean 4 is not backward compatible with Lean 3. The migration was painful: tactic implementations could not be transferred mechanically because metaprogramming in Lean 4 is fundamentally different from Lean 3, and the surface syntax was reworked in several ways. The community developed a porting tool called mathport, primarily written by Mario Carneiro, Daniel Selsam, and Gabriel Ebner, which translated mathlib3 source into a starting point for mathlib4. The full port of mathlib to Lean 4 was declared complete on 16 July 2023, just under two months before the first stable Lean 4 release. Since then, all new development happens in mathlib4 and Lean 3 is effectively archived.[1][9]
In 2025 the ACM SIGPLAN Programming Languages Software Award was given to Gabriel Ebner, Soonho Kong, Leonardo de Moura, and Sebastian Ullrich for Lean, with the citation explicitly mentioning Lean's impact on mathematics, hardware and software verification, and AI.[1]
Lean is simultaneously a logic, a proof assistant, and a general purpose dependently typed functional programming language. Its foundation is a variant of the Calculus of Inductive Constructions, with a countable hierarchy of non cumulative universes, dependent function types, and inductive types. This is the same family of foundations used by other dependent type theory based proof assistants, but Lean's elaborator, metaprogramming framework, and compiler are distinctive.[10]
Lean's type system allows types to depend on values. A function type α → β can be generalized to a dependent function type written (x : α) → β x, where the type of the result depends on the value of the argument. Because propositions in Lean are themselves types (an instance of the propositions as types correspondence), this means that a single language uniformly handles ordinary programming, mathematical statements, and proofs. A proof of p is a term of type p; a verified function with input α and output satisfying property P is a term of type (x : α) → P x. Verification of programs and proofs of theorems are the same activity under this view.[10]
Every concrete type in Lean other than the universes and dependent function types is an instance of an inductive type. This includes the natural numbers, lists, the equality type, the various logical connectives, and any user defined data type. The kernel's notion of computation is the reduction relation on these inductive types together with beta and delta reduction.[10]
Proofs in Lean can be written in two styles that intermix freely. In term mode the user writes a proof term directly, treating the proof as an expression of the appropriate type. In tactic mode the user opens a by block and issues a sequence of tactic commands that incrementally build the proof term behind the scenes. Common tactics include intro to assume hypotheses, apply to apply a known lemma to the goal, exact to provide a complete proof of the current goal, rfl for reflexivity, simp for simplification using marked rewrite rules, rw for explicit rewriting, ring for proving identities in commutative rings, linarith for linear arithmetic, and omega for linear integer arithmetic. Mathlib also defines many domain specific tactics such as polyrith, positivity, and decide.[11]
A typical Lean 4 proof might look like:
theorem add_comm_proof (a b : Nat) : a + b = b + a := by
induction a with
| zero => simp
| succ n ih => simp [Nat.succ_add, ih]
This style of writing, in which the user can see a goal state evolve interactively as tactics are applied, is what most AI proof systems learn to imitate.
A distinguishing feature of Lean 4 is that its metaprogramming framework is itself written in Lean 4. The elaborator, tactic framework, and macro system are exposed as ordinary Lean libraries that user code can extend. Macros are declared with the macro attribute and act as Syntax → MacroM Syntax functions; macro_rules provides a syntax to syntax rewriting form; and elab_rules allows users to write fully custom elaborators that have access to the metavariable context. Tactics are registered with the @[tactic] attribute and are themselves Lean programs, which means a user or library author can introduce new tactics, custom domain specific languages, and reflective procedures without forking the compiler.[12]
This is important for AI applications: because tactics and even goal manipulation are Lean programs, tools like LeanDojo can drive Lean programmatically and observe the proof state at every step.[5]
The Lean 4 compiler emits C code that is then compiled by an ordinary C compiler. This makes Lean usable as a general purpose functional programming language: Lean 4 has been used to implement parts of the Lean elaborator itself, as well as the Lake build system, the doc-gen4 documentation generator, and a growing collection of libraries. Lean 4 supports reference counted memory management with destructive updates when reference counts are unique, an optimization called functional but in place that yields performance close to mutable code while preserving purely functional semantics.[2]
Lean 4 is the version of Lean that has been the default since the 4.0.0 release in September 2023. Its design priorities, as articulated by de Moura and Ullrich in published descriptions of the system, were to be an extensible system in which user code can implement the same kind of tools that the core team writes, to scale to large libraries like Mathlib, and to be fast enough that the editor experience remains responsive even on very long files.[2][8]
Lean 4 ships with the Lake build system, which manages packages, dependencies, and incremental compilation. A Lean 4 project is typically declared in a lakefile.lean file, and dependencies on external packages including mathlib are written in the same Lean syntax. The combination of Lake plus the lean-toolchain file pinning a specific Lean version means that AI systems training on Lean output can pin a known good toolchain and run identical builds, which is necessary for any reproducible benchmark.[13]
Editor integration is provided by the Lean Server, which speaks the Language Server Protocol and is supported in Visual Studio Code (the recommended editor for most users), Emacs, and Neovim. The Lean Server provides interactive goal display, error reporting, and tactic completion, and is also the mechanism by which programmatic interaction tools like LeanDojo drive Lean.[5]
mathlib (sometimes written Mathlib) is the community maintained mathematical library for Lean. It is by a substantial margin the largest and most general formalized mathematics library in existence and is the de facto standard against which AI proof systems are evaluated.[14][15]
Mathlib was started in 2017 alongside Lean 3 and grew to several hundred thousand lines of Lean 3 code by 2022. The port to Lean 4, called mathlib4, was completed in July 2023, and ongoing work happens entirely in mathlib4. Recent measurements put mathlib at roughly 1.9 million lines of Lean code, covering abstract algebra, real and complex analysis, measure theory, probability, linear algebra, topology, category theory, number theory, combinatorics, dynamics, computability, and parts of algebraic geometry.[14][15]
Mathlib is maintained by a large and decentralized community. The project lists more than twenty active maintainers with expertise across different mathematical areas and accepts contributions from hundreds of mathematicians and students worldwide. Most discussion happens on the Mathlib stream of the Lean Zulip chat, and pull requests are reviewed against community style guides for naming, structure, and proof simp normal forms.[14]
For AI systems, Mathlib serves multiple roles: it is the standard library that benchmarks like miniF2F and ProofNet depend on, it is a large source of training data for supervised fine tuning, and it is the substrate against which generated proofs are typechecked. A 2025 effort called LeanNavigator, for example, generated approximately 4.7 million theorems totaling around 1 billion tokens by exploring proof state transitions in mathlib4, illustrating the scale at which automated proof generation can be driven from the existing library.[16]
Mathlib also publishes the Mathematics in Lean tutorial by Jeremy Avigad and Patrick Massot, which is the canonical entry point for mathematicians learning to write Mathlib style proofs. The tutorial is structured around a sequence of small example proofs in algebra, analysis, topology, and other areas, and is the source of many of the natural language statements that are paired with formal Lean statements in AI training corpora.[17]
The reason Lean has become the focal point of AI research on mathematical reasoning is that Lean provides what informal natural language reasoning cannot: a verifier. When an AI system writes a Lean proof and Lean accepts it, the proof is correct in a sense that admits no ambiguity. This contrasts with chain of thought style reasoning, where a model can produce plausible looking but mistaken arguments that no automated checker can catch. As a result, Lean serves as a reinforcement learning environment with a reliable reward signal, an evaluation harness for benchmarking, and a source of high quality formal training data.[3][4]
The dominant pattern in current AI math research is to combine a large language model with the Lean kernel. The language model proposes either complete proofs or individual tactic steps, and Lean either accepts the result or returns an error. Failures can drive further proof search, supervised training on successful trajectories, or reinforcement learning. This loop has been instantiated in several major systems since 2023.
A second important use of Lean in AI is autoformalization: translating mathematical statements from natural language LaTeX into formal Lean statements. Autoformalization is necessary because the supply of human written formal mathematics is small compared to the supply of informal mathematics, and modern AI training pipelines need large corpora. AlphaProof, DeepSeek-Prover, and the NuminaMath-LEAN project all depend on autoformalization to generate training data at scale.[3][4][18]
alphaproof is Google DeepMind's formal theorem proving system, announced in July 2024. AlphaProof was the system primarily responsible for DeepMind's silver medal level performance at the 2024 International Mathematical Olympiad, where together with alphageometry 2 it solved four of the six problems for a score of 28 out of 42, the threshold for a silver medal. AlphaProof solved problems P1, P2, and P6, including the contest's hardest problem, which was solved by only five human contestants. AlphaGeometry 2 solved P4, a geometry problem.[3]
AlphaProof is built on Lean 4. The system pairs a Gemini based language model that is fine tuned to autoformalize natural language problems into Lean statements with an AlphaZero style reinforcement learning agent that searches for proofs. The reinforcement learning loop is grounded in Lean: candidate proof steps are accepted only when Lean's elaborator and kernel verify them, and successful proofs reinforce the policy. Training used millions of autoformalized problems spanning a wide range of difficulties and topics. For the IMO problems specifically, AlphaProof used test time reinforcement learning, generating large numbers of related problem variants at inference time and training on those.[3]
The Nature paper describing AlphaProof, published in 2025, lays out the architecture in detail and discusses the system's reliance on Mathlib as the background library against which formal statements and proofs are written.[3]
The DeepSeek-Prover series is an open source family of large language models for formal theorem proving in Lean 4, released by the DeepSeek group. The first version, described in a 2024 paper, used large scale synthetic data generation in Lean to bootstrap a model capable of competitive performance on miniF2F.[4]
DeepSeek-Prover-V1.5, published in August 2024, combined supervised fine tuning, proof assistant feedback for reinforcement learning, and Monte Carlo tree search over the proof state. DeepSeek-Prover-V2, released in April and May 2025 with both a smaller variant and a 671 billion parameter model, introduced a recursive proof search pipeline. In V2, the larger DeepSeek-V3 model produces a high level proof sketch as a sequence of Lean have statements ending in sorry placeholders, and a specialized subgoal prover then fills in each of the sketched lemmas. This mirrors the human style of incrementally reducing complex theorems to manageable sublemmas.[4]
DeepSeek-Prover-V2-671B achieved an 88.9 percent pass rate on miniF2F-test and solved 49 of 658 problems on PutnamBench. The release also introduced ProverBench, a benchmark of 325 formalized problems for evaluating mathematical reasoning.[4]
Because DeepSeek-Prover is open source and emits Lean 4 directly, it has become a common baseline for subsequent open source theorem proving research.
goedel prover is a series of open source theorem proving models developed at Princeton, with Goedel-Prover-V2 released in 2025. Goedel-Prover targets Lean 4 and uses expert iteration, scaffolded data synthesis, and reinforcement learning. The training pipeline introduces three innovations: scaffolded data synthesis, in which synthetic tasks of progressively increasing difficulty are generated; verifier guided self correction, in which the model revises its own proofs using feedback from the Lean compiler; and model averaging, in which checkpoints are merged to mitigate the loss of output diversity that often appears in late stage RL.[19]
At release, Goedel-Prover-V2 reported the strongest overall performance among open source theorem provers, achieving a 57.6 percent success rate on miniF2F at Pass@32 with models substantially smaller than competing systems, and solving 7 problems on PutnamBench at Pass@512. The release included formal proofs for 29.7 thousand problems from the Lean Workbook, nearly doubling the 15.7 thousand previously solved by other open source provers.[19]
miniF2F (the name stands for "minimal Formal to Formal") is the standard benchmark for AI theorem proving on competition mathematics. It was originally created in the OpenAI research project on Lean theorem proving and is hosted at github.com/openai/miniF2F.[20]
miniF2F contains 488 formal mathematical problems split evenly into 244 validation and 244 test problems. Sources include 80 problems from olympiad competitions (20 IMO, 15 AIME, and 45 AMC), 130 problems formalized from the MATH dataset across algebra and number theory at difficulty levels 1 through 5, and 34 custom problems covering algebra, number theory, and induction. The benchmark is cross system: the same 488 problems are formalized in Lean, Metamath, Isabelle, and HOL Light, allowing direct comparison across theorem provers. Lean and Metamath have complete coverage in their respective versions, Isabelle is full, and HOL Light has partial coverage of 165 problems.[20]
miniF2F is the standard headline number for AI theorem proving systems. Successful work since 2022 has driven Lean pass rates from single digits to the 80 to 90 percent range, with DeepSeek-Prover-V2 reporting 88.9 percent on miniF2F-test. Because the benchmark is widely reported, it has been criticized as approaching saturation, and analyses such as "miniF2F-Lean Revisited" (2025) have argued that newer benchmarks like PutnamBench better discriminate between systems at the frontier.[4][21]
Beyond miniF2F, several other Lean focused benchmarks have become standard.
ProofNet, introduced in 2023, is a benchmark for autoformalization and formal proof of undergraduate mathematics. It contains 371 examples, each consisting of a natural language theorem statement, a natural language proof, and a formal Lean statement. The problems are drawn from popular undergraduate textbooks in real and complex analysis, linear algebra, abstract algebra, and topology, and depend on Mathlib. The original version was Lean 3, but Lean 4 ports are now standard, including the version distributed with DeepSeek-Prover-V1.5.[22]
LeanDojo, published as an oral at NeurIPS 2023 in the Datasets and Benchmarks track, is a toolkit, benchmark, and dataset. LeanDojo provides a Python interface that drives a real Lean process and exposes the proof state at every step, which makes it suitable for both reinforcement learning environments and supervised data extraction. The LeanDojo Benchmark contains 98,734 theorems and 217,776 tactics extracted from mathlib3; the Lean 4 version, LeanDojo Benchmark 4, contains 122,517 theorems and 259,580 tactics from mathlib4. A key design choice is the data split: test theorems depend on premises never seen during training, requiring genuine generalization. The accompanying ReProver model was the first retrieval augmented LLM theorem prover and was trained with one GPU week.[5]
PutnamBench, published at NeurIPS 2024, is a benchmark of 1,724 formalizations of problems from the William Lowell Putnam Mathematical Competition. It is multilingual: 672 problems are formalized in Lean 4, 640 in Isabelle, and 412 in Coq. PutnamBench is substantially harder than miniF2F. At its introduction, state of the art systems could solve only a handful of its problems, making it the benchmark of choice for evaluating systems that approach saturation on miniF2F.[23]
NuminaMath-LEAN, released in 2025 by Numina and the Kimi team at deepseek's pace of open source AI for math work, is a large scale dataset of approximately 100,000 mathematical competition problems formalized in Lean 4, derived from a curated subset of the NuminaMath 1.5 dataset. All formalizations are verified against mathlib v4.15.0. The dataset focuses on prestigious competitions including the IMO and USAMO, and provides hand crafted Lean proofs together with metadata and atomized lemmas suitable for fine tuning and benchmarking.[18]
Together these benchmarks define a research stack: ProofNet tests undergraduate level autoformalization and proving, miniF2F is the headline competition benchmark, PutnamBench is the harder frontier benchmark, LeanDojo provides the programmatic interface, and NuminaMath-LEAN provides large scale training data.
The Lean Focused Research Organization (Lean FRO) is the non profit organization established to advance Lean. It was founded in July 2023 as a non profit under Convergent Research, the same organizational model used for the Astera Institute and several other Focused Research Organizations on long horizon scientific infrastructure problems.[24]
The Lean FRO operates on a five year mission to make Lean a sustainable open source platform, with priorities including scalability, usability, documentation, and proof automation. Leonardo de Moura is Chief Architect and co founder. He is also a senior principal applied scientist in the Automated Reasoning Group at Amazon Web Services, which he joined in 2023 after seventeen years at Microsoft Research. Sebastian Ullrich is Head of Engineering and co founder. The organization employs a global team of researchers and engineers.[24]
Funding for the Lean FRO comes from Convergent Research, the Alfred P. Sloan Foundation, the Richard Merkin Foundation, the Simons Foundation International, and Alex Gerko, the founder of the trading firm XTX Markets. In July 2025 Gerko announced an additional 10 million dollar contribution, of which 5 million was directed to a Mathlib Initiative supporting the math library directly. This makes the Lean FRO one of the most well resourced open source proof assistant projects in history.[24]
The Lean community is organized around the leanprover.zulipchat.com Zulip server, which hosts streams for the core language, mathlib, Lean for the Curious Mathematician events, machine learning for theorem proving, and many smaller topics. The Lean website is at lean-lang.org and the FRO site is at lean-fro.org.[25]
The official editor is Visual Studio Code with the lean4 extension, although community supported integrations exist for Emacs and Neovim. The Lake build system handles dependencies, and the Reservoir registry at reservoir.lean-lang.org lists Lean packages. Notable packages beyond Mathlib include the std library, the Lean Metaprogramming Book, doc-gen4 for documentation generation, and a growing collection of domain libraries.[13]
In education, the Xena Project at Imperial College London, run by Kevin Buzzard, has been an important driver of formalization at the undergraduate level. Buzzard, a number theorist working in the Langlands program, started Xena in 2017 to teach Lean to undergraduate students and to formalize the Imperial College pure mathematics curriculum. Although the project began in Lean 3, its work and contributors largely flowed into Mathlib, and Buzzard has been one of the most visible advocates of the position that working mathematicians should engage with formal proof tools.[26]
In industry, applications of Lean include hardware verification (notably at Amazon Web Services, where de Moura's group uses Lean for verification of cloud algorithms), software verification through tools like Aeneas that translate Rust into Lean, and protocol verification. The 2025 ACM SIGPLAN award citation noted explicitly that Lean has had significant impact on hardware and software verification as well as on mathematics and AI.[1]
Lean is one of several dependent type theory based proof assistants, alongside Coq (now called Rocq), Agda, and Isabelle/HOL. Coq, developed at Inria since 1989, is the most established system in the family and was the proof assistant used for major formalizations including the four color theorem and the Feit-Thompson odd order theorem. Agda, developed at Chalmers, emphasizes dependent pattern matching and is closer to a programming language in style. Isabelle/HOL, while not based on dependent type theory, is the long standing flagship higher order logic system.
Lean differs from these systems in several respects that have become important for AI research: its elaborator and tactic framework are written in Lean itself, making the system more extensible than its predecessors; its development since 2023 is concentrated at a well funded non profit with a clear long term roadmap; and its mathematical library, Mathlib, is uniquely large, actively developed, and explicitly designed for use as a unified background theory rather than a collection of independent contributions. These factors, combined with the early adoption of Lean by DeepMind's AlphaProof team, have led to the current concentration of AI theorem proving work in Lean specifically.