AI gold medals at the 2025 IMO

Updated Jun 9, 2026

Overview

In July 2025, two leading artificial intelligence laboratories, OpenAI and Google DeepMind, independently reported that their large language models had achieved gold-medal-level performance at the 2025 International Mathematical Olympiad (IMO), the world's foremost mathematics competition for pre-university students. Each system solved 5 of the 6 problems for a score of 35 out of 42 points, exactly the gold-medal threshold at the 2025 event. Both produced full proofs in natural language under competition-like time limits, without the formal proof assistants or geometry-specific engines that had powered earlier systems. ^[1]^[2]^[3]

The achievement was widely described as a landmark in machine reasoning, because it was reached by general-purpose reasoning models rather than the specialized theorem-proving systems that had reached silver-medal level a year earlier. It was also accompanied by a public dispute over how and when the results were announced, and over the differing degrees of formal verification behind the two claims. Google DeepMind's result was officially graded and certified by the IMO's own coordinators under pre-agreed conditions; OpenAI announced its result independently, with grading by former IMO medalists rather than by the official jury. ^[2]^[3]^[4]

Background: the IMO and the 2024 silver

The International Mathematical Olympiad, first held in 1959, is the premier secondary-school mathematics competition. Contestants face six problems over two exam days, with 4.5 hours per day and three problems per day. Each problem is scored from 0 to 7 points, for a maximum of 42. The 66th IMO was hosted in the Sunshine Coast region of Queensland, Australia, with the overall event running from 10 to 20 July 2025; the two competition exams took place in mid-July. The 2025 edition drew 630 contestants from 110 countries. ^[5]^[6]

The bar for gold in 2025 was unusually high: the gold-medal cutoff was set at 35 points, reported to be one of the highest gold thresholds in the competition's history. Of the 630 contestants, 72 received gold medals, 104 silver, and 145 bronze, and only five contestants achieved a perfect 42. Problem 6 was the hardest of the set and was left unsolved by a large majority of human competitors. ^[5]^[6]
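The scoring arithmetic above can be made concrete with a short sketch. It uses only the rules stated in this section (six problems, 0 to 7 points each, a 2025 gold cutoff of 35). The function names and the per-problem breakdown are illustrative assumptions, not official IMO data; the breakdown assumes full marks on five problems and zero on the sixth, which is one way to reach 35.

```python
from typing import Sequence

MAX_POINTS_PER_PROBLEM = 7  # each problem is scored from 0 to 7
NUM_PROBLEMS = 6            # three problems on each of two exam days
GOLD_CUTOFF_2025 = 35       # gold threshold reported for the 2025 IMO


def total_score(points: Sequence[int]) -> int:
    """Sum per-problem points after checking they fit the IMO format."""
    if len(points) != NUM_PROBLEMS:
        raise ValueError(f"expected {NUM_PROBLEMS} problem scores")
    for p in points:
        if not 0 <= p <= MAX_POINTS_PER_PROBLEM:
            raise ValueError(f"score {p} is outside 0..{MAX_POINTS_PER_PROBLEM}")
    return sum(points)


def meets_gold(score: int, cutoff: int = GOLD_CUTOFF_2025) -> bool:
    """Return True if the score reaches the gold cutoff."""
    return score >= cutoff


# Hypothetical breakdown: five problems with full marks, Problem 6 unsolved.
five_solved = [7, 7, 7, 7, 7, 0]
# Same breakdown with a single point deducted on one problem.
one_point_lost = [7, 7, 7, 7, 6, 0]

if __name__ == "__main__":
    for label, pts in (("five solved", five_solved), ("one point lost", one_point_lost)):
        score = total_score(pts)
        print(label, score, "gold" if meets_gold(score) else "below gold")
```

Under these assumptions a score of 35 sits exactly on the cutoff, so there is no margin: losing one point on any problem drops the total below gold.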

Reaching gold-medal performance on full, previously unseen olympiad problems under competition-like conditions had been a long-standing goal in AI. Olympiad problems require multi-step creative reasoning and rigorous proof, so they cannot be solved by recalling memorized answers. In July 2024, Google DeepMind reported the first major milestone: a combined system of AlphaProof and AlphaGeometry 2 solved 4 of the 6 problems from the 2024 IMO for 28 points, a silver-medal-level score one point short of that year's gold cutoff of 29. That result relied on specialized methods: problems were first manually translated into the formal language Lean by human experts, AlphaProof searched for formal proofs using reinforcement learning, and the system took up to three days on some problems rather than the 4.5-hour human limit. ^[1]^[2]

The 2025 results

In July 2025 the picture changed in two important ways. First, the systems that reached gold were general-purpose reasoning models rather than dedicated provers. Second, they worked end to end in natural language, reading the official problem statements and writing human-readable proofs, without expert translation into a formal language. ^[2]^[3]

Attribute	Google DeepMind	OpenAI
System	Advanced version of Gemini with "Deep Think"	Experimental reasoning model (unreleased)
Problems solved	5 of 6 (Problem 6 unsolved)	5 of 6 (Problem 6 unsolved)
Score	35 / 42 (gold threshold)	35 / 42 (gold threshold)
Output	Natural-language proofs	Natural-language proofs
Time limit	Within the 4.5-hour-per-paper contest window	Same time limits as human contestants, no tools or internet
Grading	Officially graded and certified by IMO coordinators	Graded by former IMO medalists, not officially certified
Announced	21 July 2025	Around 19 July 2025

Google DeepMind

Google DeepMind used an advanced version of its Gemini model running in a "Deep Think" mode that explores multiple candidate solution paths in parallel before committing to an answer. The company attributed the improvement over 2024 to a combination of reinforcement learning on multi-step reasoning and proof data, a curated corpus of high-quality mathematical solutions, and general guidance on olympiad problem-solving approaches. Crucially, the model received the problems in ordinary language and produced complete proofs in ordinary language within the competition time limit, a sharp contrast with the Lean-based AlphaProof pipeline of 2024. ^[2]

DeepMind's result was scored by the IMO's official coordinators, the same graders who assess human contestants, against the competition's confidential marking scheme. IMO President Gregor Dolinar confirmed the outcome, and the graders described the model's solutions as clear and precise. The result, 35 of 42 points, met the gold-medal standard. DeepMind announced the achievement on 21 July 2025. ^[2]

OpenAI

OpenAI reported the same headline score, 35 of 42, also solving five problems and failing only Problem 6. The work was carried out by a small OpenAI research team; researcher Alexander Wei announced the result in a thread on X around 19 July 2025, crediting colleagues including Sheryl Hsu and Noam Brown. OpenAI emphasized that the system was an experimental, general-purpose reasoning model rather than a math-specific tool, and that the result was driven by scaling test-time compute and reinforcement learning on hard-to-verify tasks. The model worked under the same time limits as human contestants, without tools, internet access, or external resources, and wrote proofs in natural language. OpenAI stated that the model was a research prototype and that it did not plan to release anything at that level of mathematical capability for several months. ^[3]^[4]

The principal difference was in verification: OpenAI did not participate in the IMO's official coordination process, and its solutions were graded by three former IMO medalists rather than by the competition's official jury using the confidential marking scheme. ^[3]^[4]

The announcement controversy

The two announcements, made within days of each other, triggered a public dispute on two fronts: the timing of OpenAI's announcement, and the legitimacy of an uncertified gold claim. Accounts of what was agreed differ between the parties. ^[3]^[4]

On timing, the IMO had requested that AI laboratories refrain from publicizing their results until after the competition's closing ceremony, so that the human medalists would receive their recognition first; some accounts describe a requested waiting period extending roughly a week beyond the ceremony. Google DeepMind said it had coordinated formally with the IMO and held its announcement until after the ceremony out of respect for that request, and DeepMind chief executive Demis Hassabis publicly criticized announcing before the agreed window. OpenAI's Noam Brown responded that OpenAI had not been party to any formal coordination agreement, that it had been asked only verbally by an IMO board member to wait until after the closing ceremony, and that it had honored that specific request. The factual core that is not disputed is that OpenAI published first and Google DeepMind published a few days later. ^[3]^[4]

On legitimacy, DeepMind's head of reasoning, Thang Luong, argued that an official medal claim requires evaluation against the IMO's confidential marking guidelines, and that without such evaluation no medal claim could be made; he noted that a single additional point deducted would change the outcome from gold to silver. OpenAI maintained that its grading by former IMO medalists was rigorous and that its score of 35 met the gold cutoff. Separately, the IMO did not establish formal rules for AI participation in 2025, and President Gregor Dolinar noted that the IMO could not validate the methods used, including the amount of compute, whether any human was involved, or whether the results could be reproduced. ^[3]^[4]

Mathematician Terence Tao cautioned more broadly that, without testing conditions standardized and disclosed in advance, AI and human results were difficult to compare. He observed that choices such as allowing extra time, providing curated training data, running many attempts in parallel, or submitting only the best of several runs could substantially change reported success rates, so headline comparisons to human contestants should be read with care. ^[3]^[4]
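The effect of reporting only the best of several runs can be illustrated with a simple probability sketch. It assumes that each attempt succeeds independently with the same probability p, which is a simplification and not a description of either laboratory's actual setup; the function name and the value of p are hypothetical. Under that assumption, the chance that at least one of k attempts succeeds is 1 - (1 - p)^k.

```python
def best_of_k_success(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds.

    Assumes every attempt has the same success probability p and that
    attempts are independent (an idealization for illustration only).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    single_run_rate = 0.2  # hypothetical per-attempt success rate
    for k in (1, 4, 16, 64):
        rate = best_of_k_success(single_run_rate, k)
        print(f"best of {k:>2} attempts: {rate:.3f}")
```

Even a modest single-attempt success rate grows quickly as k increases, which is why the number of attempts and the selection rule need to be disclosed before a reported score can be compared with a human contestant's single sitting.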

Significance

The 2025 results were significant for several reasons. The systems that reached gold were general-purpose reasoning models, not bespoke theorem provers, and they produced natural-language proofs directly from natural-language problems, eliminating the human-assisted formalization step that AlphaProof required in 2024. This suggested that frontier reasoning models had developed enough reliable, long-horizon proof ability to handle genuine olympiad mathematics, a domain long considered a stress test for machine reasoning. The achievement fit a broader 2024 to 2025 trajectory in which reasoning models, trained with reinforcement learning and given more inference-time compute, made rapid gains on hard problems. For Google DeepMind specifically, the result also represented a jump within a single year from a silver-level specialized system to a gold-level general one. ^[2]^[3]

Caveats and aftermath

Several caveats temper the milestone. Both systems failed Problem 6, the hardest problem, so neither solved the full paper. The improvement in raw score over 2024 was modest, from 28 to 35 points, even though the qualitative shift from formal, specialized methods to general natural-language reasoning was large. The two claims also rested on different evidential footings: DeepMind's was officially certified, while OpenAI's was independently graded but not certified by the IMO. And because both models were proprietary, neither result could be independently reproduced by outside parties, and the IMO explicitly declined to validate the underlying methods or compute. ^[3]^[4]

The 2025 IMO was not the only such demonstration that year. Other groups, including ByteDance and the startup Harmonic, reported strong results using formal, Lean-based approaches in the days following the competition. In September 2025, the trajectory continued at the International Collegiate Programming Contest (ICPC) World Finals, where an advanced version of Gemini reached gold-medal-level performance, solving 10 of the contest's 12 problems, and OpenAI reported a perfect score of 12 of 12 using a combination of GPT-5 and an experimental reasoning model. Together with the IMO results, these events marked 2025 as the year in which AI systems first matched or exceeded elite human performance at the most prestigious mathematics and programming competitions. ^[2]^[7]

References

[1] Google DeepMind, "AI achieves silver-medal standard solving International Mathematical Olympiad problems," deepmind.google, July 2024.
[2] Google DeepMind, "Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad," deepmind.google, 21 July 2025. https://deepmind.google/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/
[3] VentureBeat, "Google DeepMind makes AI history with gold medal win at world's toughest math competition," venturebeat.com, July 2025. https://venturebeat.com/ai/google-deepmind-makes-ai-history-with-gold-medal-win-at-worlds-toughest-math-competition
[4] "Google and OpenAI Get 2025 IMO Gold," LessWrong, July 2025. https://www.lesswrong.com/posts/ZkgaPopsBkQgeA2k8/google-and-openai-get-2025-imo-gold
[5] International Mathematical Olympiad, "66th IMO 2025," imo-official.org, 2025. https://www.imo-official.org/year_info.aspx?year=2025
[6] "2025 International Mathematical Olympiad results," math-soc.com, 22 July 2025. https://math-soc.com/2025/07/22/2025-international-mathematical-olympiad-results-are-in/
[7] Google DeepMind, "Gemini achieves gold-medal level at the International Collegiate Programming Contest World Finals," deepmind.google, September 2025. https://deepmind.google/blog/gemini-achieves-gold-medal-level-at-the-international-collegiate-programming-contest-world-finals/
