Epoch AI


Updated Jun 25, 2026


Epoch AI is a nonprofit research organization, founded in 2022 and directed by Jaime Sevilla, that studies the trajectory of artificial intelligence through quantitative analysis of compute, data, algorithms, and economics. It describes itself as "a data-first research nonprofit investigating the future of artificial intelligence," and is widely cited as a neutral authority on how fast AI capabilities are advancing and what is driving that progress. ^[1]^[3] Its database of notable machine learning models is the canonical public source for figures such as the "training compute doubling time," and its charts of long-run compute trends are reproduced throughout the field. Epoch AI also created the FrontierMath benchmark of research-level mathematics problems, runs an AI Benchmarking Hub, and publishes the Epoch Capabilities Index (ECI), a composite measure of frontier model ability. ^[1]^[2]^[3]

This article concerns the research organization Epoch AI and should not be confused with the epoch concept in machine learning, which refers to a single pass through a training dataset.

What is Epoch AI?

Epoch AI is a data-first research nonprofit whose stated goal is to help people understand what is happening in AI "from a neutral perspective and grounded in the best possible evidence." ^[3] It structures its work around three areas: the drivers of AI progress (training compute, data, hardware, and scaling feasibility), the measurement of capability progress (benchmarks and indices), and the downstream impacts of AI (economics, adoption, and automation). ^[1]^[3]

Much of Epoch AI's influence comes from making its underlying datasets and methods public. It maintains open data explorers covering notable AI models, machine learning hardware, GPU clusters and data centers, and leading AI companies, alongside reports, a newsletter, and benchmark results. Its data and analysis have been used by Stanford's AI Index, and the organization has had past and ongoing partnerships with bodies including the UK government's science and AI departments. ^[3]^[9]

When was Epoch AI founded?

Epoch grew out of a volunteer effort begun around 2021, when Jaime Sevilla put out a call for collaborators to study historical trends in machine learning. The group's early analysis of training compute drew a strongly positive response, and Sevilla sought philanthropic funding to formalize the project. Epoch was founded in 2022, with Rethink Priorities acting as its initial fiscal sponsor. The founding team included Sevilla, Tamay Besiroglu, Lennart Heim, Pablo Villalobos, Eduardo Infante-Roldan, Marius Hobbhahn, and Anson Ho. The work was carried out in close collaboration with organizations such as Rethink Priorities and the grantmaker Open Philanthropy, which became a major funder. ^[4]^[5]^[6]

Over the following years Epoch rebranded as Epoch AI and, in early 2025, spun out from its fiscal sponsor to operate as an independent 501(c)(3) nonprofit. According to its 2025 impact report, the organization had 21 full-time staff, published more than 100 outputs, raised about $10.3 million in 2025 (a roughly 40 percent increase over the prior year), and spent about $5 million. Its board of directors included Tom Davidson, Ajeya Cotra, Jaime Sevilla, and Maria de la Lama. ^[7]

What does Epoch AI research?

Epoch AI's best-known work is its empirical study of AI compute and scaling laws. The organization built a database of notable machine learning models, where a model is counted as "notable" if it set a state-of-the-art result on a recognized benchmark, was highly cited (on the order of 1,000 or more citations), had clear historical relevance, or saw significant real-world use. The expanded database includes 333 compute estimates for notable models released between 2010 and May 2024, up from 98 models in Epoch's original 2022 analysis, and it has continued to grow since. ^[2]^[8]^[10]

The dataset was originally assembled for the 2022 report "Compute Trends Across Three Eras of Machine Learning," which identified distinct historical regimes in how the compute used to train AI systems has grown. Epoch's later analysis found that the training compute of notable models grew about 4.1x per year (90 percent confidence interval: 3.7x to 4.6x) between 2010 and May 2024, while for frontier models, defined as the running top 10 by compute, the rate was about 5.3x per year (90 percent confidence interval: 4.9x to 5.7x). Epoch summarizes the headline trend as training compute growing "by 4-5x per year." These figures, and the accompanying charts, became the standard reference points for discussions of AI scaling. ^[2]^[8]^[11]
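The relationship between an annual growth factor and a "doubling time," and the "running top 10" definition of frontier models, can be made concrete with a short sketch. This is an illustration, not Epoch's code: the function names are hypothetical, the toy model list is invented, and the rule that a model counts as frontier if it ranks in the top k by training compute among all models released up to that date is an assumed reading of "running top 10."

```python
import heapq
import math


def doubling_time_months(growth_per_year: float) -> float:
    """Convert an annual growth factor (e.g. 4.1 for "4.1x per year")
    into the time, in months, that it takes the quantity to double."""
    return 12.0 * math.log(2.0) / math.log(growth_per_year)


def running_top_k(models, k=10):
    """Return the names of models that ranked in the top k by compute
    among everything released up to and including their own release.

    `models` is a list of (release_date, name, training_flop) tuples.
    Assumed interpretation of "running top 10"; not Epoch's definition.
    """
    frontier = []
    largest = []  # min-heap holding the k largest compute values so far
    for _date, name, flop in sorted(models):
        if len(largest) < k:
            heapq.heappush(largest, flop)
            frontier.append(name)
        elif flop > largest[0]:
            heapq.heapreplace(largest, flop)
            frontier.append(name)
    return frontier


if __name__ == "__main__":
    for label, rate in [("notable models", 4.1), ("frontier models", 5.3)]:
        print(f"{label}: {rate}x/year -> "
              f"doubling every {doubling_time_months(rate):.1f} months")

    # Hypothetical toy data, not taken from Epoch's database.
    toy = [
        ("2010-01", "a", 1.0e18),
        ("2011-01", "b", 5.0e18),
        ("2012-01", "c", 2.0e18),
        ("2013-01", "d", 0.5e18),
    ]
    print(running_top_k(toy, k=2))
```

Under these formulas, 4.1x per year corresponds to a doubling time of roughly 5.9 months, and 5.3x per year to roughly 5.0 months.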

Beyond raw compute, Epoch AI has produced a series of influential reports on the inputs to AI progress, summarized below.

Topic	Representative work	Key finding
Training compute trends	"Compute Trends Across Three Eras of Machine Learning" (2022); "Training compute of frontier AI models grows by 4-5x per year"	Notable-model training compute grew about 4.1x per year, and frontier compute about 5.3x per year, from 2010 to May 2024. ^[2]^[8]^[11]
Data limits	"Will We Run Out of Data?" (first released 2022, updated 2024)	The stock of public, human-generated text could be fully used for training at some point between roughly 2026 and 2032. ^[12]^[13]
Algorithmic progress	Studies of algorithmic efficiency in language models and image recognition	A large share of measured progress comes from improvements in algorithms and data efficiency, not compute alone. ^[14]
Hardware and clusters	Trends in machine learning hardware and AI supercomputers	Tracks GPU performance, cluster sizes, and the growth of large training systems. ^[9]
Economics of scaling	GATE integrated assessment model (2025)	Models how compute investment and automation could drive rapid, even explosive, economic growth. ^[15]

The 2024 update to "Will We Run Out of Data?" attracted broad attention for its projection that frontier models could exhaust the supply of high-quality, human-generated public text at some point between roughly 2026 and 2032. The finding was widely covered in the press and tied to the rising interest in synthetic data. The report's authors stressed that the projection carried high uncertainty and that data efficiency gains and synthetic data could push the limit back. ^[12]^[13]

In 2025 Epoch AI released GATE (Growth and AI Transition Endogenous model), an integrated assessment model developed by Besiroglu, Heim, and Sevilla that links a compute-based model of AI development, an automation framework, and a semi-endogenous economic growth model. GATE is used to explore scenarios in which heavy reinvestment of output into AI hardware and research could produce very rapid growth, while quantifying the uncertainty around how much of the economy can be automated. ^[15]

What is FrontierMath?

FrontierMath is a mathematics benchmark that Epoch AI announced on November 8, 2024. Its original release consisted of 300 problems (Tiers 1 to 3), and the project later expanded with additional research-level material. The problems are original and unpublished, and are intended to be far harder than earlier math benchmarks such as GSM8K and MATH, on which leading models had already reached near-perfect scores. They were created in collaboration with more than 60 mathematicians from universities across more than a dozen countries, and were designed to have automatically verifiable answers while still requiring deep, expert reasoning to produce. ^[16]^[17]^[18]

Problems span a wide range of difficulty. In the benchmark's main set, tiers run from advanced undergraduate material up to research-level mathematics in the hardest tier (Tier 4); Epoch has separately maintained a collection of genuinely open research problems. At launch, frontier models performed extremely poorly: the FrontierMath paper reported that state-of-the-art systems, including OpenAI's o1-preview, GPT-4o, Anthropic's Claude 3.5 Sonnet, Google's Gemini 1.5 Pro, and xAI's Grok 2, each solved under 2 percent of the problems. The benchmark drew commentary from leading mathematicians, including Fields Medalists Terence Tao, Timothy Gowers, and Richard Borcherds, several of whom remarked on the difficulty of the questions. ^[16]^[17]^[18]

What was the FrontierMath OpenAI funding controversy?

FrontierMath became the subject of controversy over how its funding was disclosed. On December 20, 2024, around the time OpenAI announced its o3 model and cited a strong FrontierMath score, Epoch AI revealed that OpenAI had funded the creation of the benchmark. Critics objected that this relationship had not been disclosed earlier, especially given that Epoch was otherwise known as an independent organization funded largely by Open Philanthropy. ^[19]^[20]

The dispute intensified after a contractor who had worked on the benchmark, posting on the forum LessWrong under the name "Meemi," wrote that many contributors had not been told of OpenAI's involvement, stating that "the communication about this has been non-transparent" and that "Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities." A Stanford mathematics PhD student, Carina Hong, separately reported that several contributing mathematicians said they had been unaware that OpenAI would have access to the problems and would not have contributed had they known. ^[19]^[20]

Epoch AI's associate director Tamay Besiroglu acknowledged that the organization had made a mistake on transparency. He wrote: "We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible." He stated that OpenAI had access to the FrontierMath problems but had a verbal agreement not to train on them, and that Epoch maintained a separate, unseen holdout set to allow independent verification of results. Epoch's lead mathematician, Elliot Glazer, said his personal view was that OpenAI's reported score was legitimate, that is, that the company had not trained on the dataset, but that Epoch could not fully vouch for the figure until its own independent evaluation was complete. Epoch AI's 2025 impact report later described a research-level Tier 4 set of 50 problems as having been commissioned by OpenAI. ^[7]^[19]^[20]

How does Epoch AI rank AI models? (ECI and the Benchmarking Hub)

Early in 2025 Epoch AI relaunched an AI Benchmarking Hub, which collects evaluation results reported by model developers and third parties alongside benchmarks that Epoch runs itself. The Hub became one of the organization's most visited pages, and it underpins Epoch's flagship capability metric. ^[3]^[7]^[21]

That metric is the Epoch Capabilities Index (ECI), a composite score that combines results from dozens of distinct benchmarks into a single "general capability" scale, so that models can be compared even across periods long enough for any one benchmark to saturate. Epoch likens the ECI to an IQ-style measure: rather than tracking performance on a single skill, it aims to capture a broad underlying capability. The index is built on item response theory, the statistical framework used in standardized testing. It functions as a relative measure similar to an Elo rating: from the pattern of results when models are tested on overlapping benchmarks, it jointly estimates how capable each model is and how difficult each benchmark is. By late 2025 the ECI drew on more than a thousand evaluations covering roughly 147 models and 39 underlying benchmarks. Epoch describes the ECI as an independent product over which it has full rights, while noting that the work built on methodology from a Google DeepMind paper, "A Rosetta Stone for AI Benchmarks." ^[21]^[22]^[23]
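The idea of jointly estimating model capability and benchmark difficulty from overlapping results can be illustrated with a toy model. The sketch below is not Epoch's implementation and does not reproduce the ECI methodology: it assumes a one-parameter logistic (Rasch-style) item response model in which the expected score of a model on a benchmark is sigmoid(ability - difficulty), and it fits that model by plain gradient descent. The function name `fit_capability_scale`, the model and benchmark labels, and every score are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_capability_scale(scores, steps=5000, lr=0.1):
    """Fit ability (per model) and difficulty (per benchmark) so that
    sigmoid(ability - difficulty) matches the observed scores.

    `scores` maps (model, benchmark) -> accuracy in [0, 1]. Pairs that
    were never evaluated are simply absent from the dict.
    """
    models = sorted({m for m, _ in scores})
    benchmarks = sorted({b for _, b in scores})
    ability = {m: 0.0 for m in models}
    difficulty = {b: 0.0 for b in benchmarks}

    for _ in range(steps):
        grad_a = {m: 0.0 for m in models}
        grad_d = {b: 0.0 for b in benchmarks}
        for (m, b), observed in scores.items():
            # Gradient of the cross-entropy loss with respect to the logit.
            err = sigmoid(ability[m] - difficulty[b]) - observed
            grad_a[m] += err
            grad_d[b] -= err
        for m in models:
            ability[m] -= lr * grad_a[m]
        for b in benchmarks:
            difficulty[b] -= lr * grad_d[b]
        # The scale is only defined up to a shift, so pin the mean
        # benchmark difficulty at zero. Only differences are meaningful.
        shift = sum(difficulty.values()) / len(difficulty)
        for b in benchmarks:
            difficulty[b] -= shift
        for m in models:
            ability[m] -= shift
    return ability, difficulty


# Hypothetical scores: "old" and "new" never share a benchmark, but both
# overlap with "mid", which is what links them onto one scale.
TOY_SCORES = {
    ("old", "easy"): 0.6,
    ("mid", "easy"): 0.9,
    ("mid", "hard"): 0.2,
    ("new", "hard"): 0.6,
}

if __name__ == "__main__":
    ability, difficulty = fit_capability_scale(TOY_SCORES)
    for name, value in sorted(ability.items(), key=lambda kv: kv[1]):
        print(f"model {name}: ability {value:+.2f}")
    for name, value in sorted(difficulty.items(), key=lambda kv: kv[1]):
        print(f"benchmark {name}: difficulty {value:+.2f}")
```

In this toy data, "old" and "new" both score 0.6, but on benchmarks of different difficulty. Because "mid" was tested on both benchmarks, the fit can place "new" well above "old", which is the sense in which such a scale remains comparable after an individual benchmark saturates.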

Who runs Epoch AI, and how is it regarded?

Jaime Sevilla, a Spanish researcher with a background in mathematics and computer science, founded Epoch and serves as its director. He has become a prominent voice on AI forecasting and the trajectory of transformative AI, and was profiled by TIME in 2024 in connection with Epoch's trend analysis. Tamay Besiroglu was a co-founder and the organization's associate director, leading much of its work on compute and economics. ^[4]^[24]

In April 2025, Besiroglu left Epoch AI to co-found Mechanize, a startup whose stated aim is "the full automation of the economy," beginning with white-collar work. Two other Epoch-affiliated researchers, Ege Erdil and Matthew Barnett, joined him, and Mechanize reported raising a seed round of about $7.3 million. The move generated controversy and public criticism, because Mechanize's goal of automating human labor was seen as standing in tension with the safety-oriented and cautious framing associated with Epoch and parts of the AI safety community. Lennart Heim, another co-founder, later led the compute team at the RAND Corporation's center on AI and emerging technology. ^[25]^[26]

Epoch AI is generally regarded across the field as an authoritative and relatively neutral source on quantitative AI trends, and its datasets are widely reused by researchers, journalists, and policymakers. Its analyses are frequently cited in coverage of AI scaling, compute, and the data supply. At the same time, the FrontierMath funding episode prompted broader debate about the independence of benchmark organizations and the importance of disclosing industry funding and data access. Epoch's roots in the effective altruism and AI-safety funding ecosystem, and the later departure of several staff to build automation technology, have also featured in commentary about the organization's positioning. ^[1]^[19]^[25]

References

1. "Epoch AI." Epoch AI. Accessed June 9, 2026. https://epoch.ai/
2. "Compute Trends Across Three Eras of Machine Learning." Epoch AI. https://epoch.ai/blog/compute-trends
3. "About." Epoch AI. Accessed June 9, 2026. https://epoch.ai/about
4. Naughton, John, and TIME staff. "The Epoch AI Researcher Trying to Glimpse the Future of AI." TIME, 2024. https://time.com/6985850/jaime-sevilla-epoch-ai/
5. "Announcing Epoch: A research organization investigating the road to Transformative AI." EA Forum, 2022. https://forum.effectivealtruism.org/posts/zqRDNChFburJMmpqK/announcing-epoch-a-research-organization-investigating-the
6. "Announcing Epoch AI: A research initiative investigating the road to transformative AI." Epoch AI. https://epoch.ai/latest/announcing-epoch
7. "Epoch AI 2025 impact report." Epoch AI. https://epoch.ai/blog/epoch-impact-report-2025
8. "Training compute of frontier AI models grows by 4-5x per year." Epoch AI. https://epoch.ai/blog/training-compute-of-frontier-ai-models-grows-by-4-5x-per-year
9. "Trends in AI supercomputers." Epoch AI. https://epoch.ai/blog/trends-in-ai-supercomputers
10. "Data on Notable AI Models." Epoch AI. https://epoch.ai/data/notable-ai-models
11. Sevilla, Jaime, et al. "Compute Trends Across Three Eras of Machine Learning." arXiv:2202.05924, 2022. https://arxiv.org/abs/2202.05924
12. "Will we run out of data? Limits of LLM scaling based on human-generated data." Epoch AI. https://epoch.ai/publications/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data
13. Villalobos, Pablo, et al. "Will we run out of data? Limits of LLM scaling based on human-generated data." arXiv:2211.04325. https://arxiv.org/abs/2211.04325
14. "The least understood driver of AI progress." Epoch AI / Anson Ho. https://epoch.ai/gradient-updates/the-least-understood-driver-of-ai-progress
15. "GATE: Modeling the trajectory of AI and automation." Epoch AI. https://epoch.ai/blog/announcing-gate
16. "FrontierMath: Evaluating advanced mathematical reasoning in AI." Epoch AI. https://epoch.ai/frontiermath
17. "FrontierMath." Epoch AI (benchmark overview). https://epoch.ai/frontiermath/tiers-1-4/about
18. Glazer, Elliot, et al. "FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI." arXiv:2411.04872, November 2024. https://arxiv.org/abs/2411.04872
19. Wiggers, Kyle. "AI benchmarking organization criticized for waiting to disclose funding from OpenAI." TechCrunch, January 19, 2025. https://techcrunch.com/2025/01/19/ai-benchmarking-organization-criticized-for-waiting-to-disclose-funding-from-openai/
20. "OpenAI's FrontierMath Fiasco." Fortune, January 21, 2025. https://fortune.com/2025/01/21/eye-on-ai-openai-o3-math-benchmark-frontiermath-epoch-altman-trump-biden/
21. "Epoch Capabilities Index." Epoch AI. https://epoch.ai/benchmarks/eci
22. "Epoch's Capabilities Index stitches together benchmarks across a wide range of difficulties." Epoch AI. https://epoch.ai/data-insights/interpreting-eci
23. "ECI Documentation - Overview." Epoch AI. https://epoch.ai/data/eci-documentation
24. "Jaime Sevilla." Epoch AI team page. https://epoch.ai/about/team/jaime-sevilla
25. "Epoch AI alumni launch Mechanize to 'automate the whole economy.'" EA Forum, April 2025. https://forum.effectivealtruism.org/posts/HqKnreqC3EFF9YcEs/epoch-ai-alumni-launch-mechanize-to-automate-the-whole
26. "A tech founder is getting skewered online after announcing his startup aims to replace all human workers with AI." Fortune, April 2025. https://fortune.com/article/tech-founder-online-epoch-ai-mechanize-tamay-besiroglu-automated-employees-workforce/
