AI at the 2025 ICPC World Finals
Last reviewed
Jun 8, 2026
Sources
9 citations
Review status
Source-backed
Revision
v1 · 1,883 words
In September 2025, at the 49th International Collegiate Programming Contest (ICPC) World Finals in Baku, Azerbaijan, two leading artificial-intelligence laboratories reported that their general-purpose reasoning models reached or surpassed the level of the world's best university programming teams. OpenAI said an ensemble of its reasoning models solved all 12 problems on the contest set, a perfect score that no human team achieved and that would have ranked first in the standings. Google DeepMind reported that an advanced version of Gemini 2.5 Deep Think achieved gold-medal-level performance by solving 10 of 12 problems, a result that would have placed second among the competing teams, and that it solved one problem, Problem C, that no human team solved during the live contest. Both laboratories announced their results on September 17, 2025. [1][2][3]
The ICPC World Finals is widely regarded as the most prestigious team-based competitive programming contest at the university level, sometimes described as the "Olympics of programming." The 2025 results came about two months after both OpenAI and Google DeepMind reported gold-medal-level performance at the 2025 International Mathematical Olympiad (IMO). Together the two milestones were taken as evidence that frontier reasoning systems could solve novel, hard problems under contest-like conditions, not only on static benchmarks. [2][4]
The ICPC is an annual, multi-tier algorithmic programming competition for university students. Teams of three students share a single computer and have five hours to solve a set of roughly 10 to 12 algorithmic problems, submitting code that must pass hidden test cases to be accepted. The 2025 World Finals, the 49th edition, was held in Baku, Azerbaijan, during the week of August 31 to September 5, 2025, with the contest itself run on September 4. According to ICPC figures, the finalist field was drawn from tens of thousands of students at thousands of universities worldwide, and only the top teams earn gold medals. The 2025 world champion was St. Petersburg State University, whose team solved 11 of the 12 problems. [3][5]
The ICPC milestone is closely tied to the trajectory of "reasoning" models, systems trained to spend additional inference-time computation working through multi-step problems before answering. Earlier in 2025, at the International Mathematical Olympiad held in July, both OpenAI and Google DeepMind reported gold-medal-level results from general-purpose models, marking the first time AI systems were said to reach that bar on the IMO. The ICPC attempt extended the same approach from mathematical proof writing to live algorithmic programming, where solutions must not only be correct in principle but also compile, run within strict time and memory limits, and pass automated judging. [2][4]
Both laboratories ran their models on the same 12-problem set that the human finalists faced, under the contest's five-hour time limit.
OpenAI reported a perfect score of 12 out of 12. The company said it used an ensemble of general-purpose reasoning models rather than a system trained specifically for the ICPC. According to OpenAI researcher Mostafa Rohaninejad, the setup combined GPT-5 with an internal experimental reasoning model: both generated candidate solutions, and the experimental model selected which solutions to submit. GPT-5 produced accepted solutions for 11 of the 12 problems on the first attempt. The remaining problem, the hardest on the set, was solved by the experimental reasoning model, which reached an accepted solution after nine submissions. OpenAI stated that the 12-for-12 result would have been enough for a first-place ranking; the best human team solved 11. [1][6][7]
Google DeepMind reported that an advanced version of Gemini 2.5 Deep Think solved 10 of the 12 problems and reached gold-medal-level performance. DeepMind said the model began 10 minutes after the human contestants and worked within the same five-hour window, solving eight problems in the first 45 minutes and two more over the following roughly three hours, for a combined "penalty" time of 677 minutes across its solved problems. By that measure, the company said, Gemini would have ranked second overall among the 139 competing teams, behind only the eventual champion. DeepMind highlighted that Gemini solved Problem C, an optimization task involving the distribution of liquid through a network of interconnected ducts, which no university team solved during the contest; the company described the approach as combining dynamic programming with a minimax argument. [2][8]
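The "penalty" figure quoted above follows the standard ICPC scoring convention, which the sources cited here do not spell out: each solved problem contributes the minutes elapsed at its accepted submission plus 20 minutes for every earlier rejected submission on that problem, and unsolved problems contribute nothing. The following is a minimal Python sketch of that arithmetic. The `Attempt` record and the example values are hypothetical and are not the actual submission log of any entrant.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Standard ICPC rule (assumed here; not stated in the article's sources).
REJECTED_SUBMISSION_PENALTY = 20  # minutes per rejected submission


@dataclass
class Attempt:
    """Hypothetical per-problem record; not a real ICPC data format."""
    problem: str
    accepted_minute: Optional[int]  # None if the problem was never solved
    rejected_before_accept: int


def penalty_time(attempts: List[Attempt]) -> Tuple[int, int]:
    """Return (problems_solved, total_penalty_minutes)."""
    solved = 0
    total = 0
    for a in attempts:
        if a.accepted_minute is None:
            continue  # unsolved problems carry no penalty
        solved += 1
        total += a.accepted_minute
        total += REJECTED_SUBMISSION_PENALTY * a.rejected_before_accept
    return solved, total


# Illustrative values only.
example = [
    Attempt("A", accepted_minute=12, rejected_before_accept=0),
    Attempt("B", accepted_minute=95, rejected_before_accept=2),
    Attempt("C", accepted_minute=None, rejected_before_accept=3),
]
print(penalty_time(example))  # (2, 147): 12 + (95 + 2 * 20)
```

Under this convention a lower total is better, which is why a system that solves many problems early can accumulate far less penalty time than a team that solves more problems late in the contest.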
The table below summarizes the reported headline results.
| Entrant | Model(s) | Problems solved (of 12) | Ranking | Notable detail |
|---|---|---|---|---|
| OpenAI | GPT-5 plus an experimental reasoning model (ensemble) | 12 | Would rank 1st (unofficial) | 11 solved first-try by GPT-5; hardest solved after 9 submissions |
| Google DeepMind | Gemini 2.5 Deep Think (advanced version) | 10 | Would rank 2nd (unofficial) | Solved Problem C, which no human team solved; 677 minutes total penalty time |
| St. Petersburg State University | None (human team) | 11 | 1st (official) | World champion; 1,478 minutes total penalty time |
A point widely noted afterward was that only OpenAI's ensemble was reported to have solved the full set: the champion human team solved 11 problems and Gemini solved 10. DeepMind observed that combining the AI and human solutions would have covered all 12 problems. [2][9]
The two efforts differed in their relationship to the official contest and in their reported tooling, which is important for interpreting the results.
DeepMind described its attempt as conducted with the involvement of the competition organizers. Gemini competed in a remote online environment following ICPC rules and timing, and DeepMind said the ICPC confirmed that the submitted solutions were complete and accepted. DeepMind also noted, however, that the ICPC's review covered only the submissions themselves and did "not extend to validating our system, processes, or underlying model," so the verification applies to the accepted code, not to how it was produced. [2]
OpenAI characterized its run as an internal evaluation rather than an official entry. The models received the problems in the standard PDF format used by the contest and submitted within the five-hour limit, with submissions judged by an official ICPC judge. OpenAI emphasized that it used general-purpose reasoning models with no ICPC-specific training and, according to contemporaneous accounts, no bespoke test-time harness; the models had access to a code-execution sandbox without internet access. Reporting at the time noted uncertainty about the exact tools available to Gemini during its run, while observing that DeepMind's July IMO result had been achieved without external tools. [1][6][9]
Both laboratories stressed that the systems were general-purpose: the same families of reasoning models used for other tasks, not narrow programming engines fine-tuned for ICPC-style problems. DeepMind further reported that internal studies of a similar Gemini configuration on the 2023 and 2024 ICPC finals showed gold-medal-level performance, comparable to roughly the top 20 human competitive coders. [1][2]
The ICPC results were read as a marker that frontier reasoning models could match or exceed elite human performance on novel algorithmic problems under realistic contest constraints. Unlike fixed coding benchmarks, where models can be susceptible to memorization or contamination from training data, the ICPC World Finals presents previously unseen problems written for that year's contest, solved against hidden tests within a hard time limit. Solving Problem C, which the entire human finalist field failed to crack, was singled out as evidence that the systems were doing genuine algorithmic reasoning rather than pattern matching on familiar tasks. [2][8]
The milestone fit a broader 2025 trajectory in which reasoning models posted rapid gains on hard, structured problem solving. Within roughly two months, OpenAI and Google DeepMind reported moving from gold-medal-level mathematics at the IMO to top-of-field competitive programming at the ICPC. Observers connected this to progress on software-engineering benchmarks such as SWE-bench, which measure whether models can resolve real GitHub issues, arguing that competition results and engineering benchmarks together pointed to increasingly capable coding agents. Both companies framed the achievement in terms of practical implications for software development, scientific computing, and other domains that depend on translating complex specifications into correct, efficient code. [1][2][4]
At the same time, the ICPC tasks differ from everyday software work: they reward tightly scoped, self-contained algorithmic puzzles solvable in hours, rather than the long-horizon, ambiguous, large-codebase tasks that dominate professional engineering. Several commentators cautioned against reading the result as evidence that AI had broadly surpassed human programmers, as opposed to excelling at a specific, well-defined competitive format. [9]
Several caveats temper the headline numbers:
Self-reported and partially verified. The results were announced by the laboratories themselves. DeepMind's submissions were accepted by an official ICPC judge and the ICPC confirmed the solutions were complete, but the ICPC explicitly did not validate the underlying system or process. OpenAI's run was an internal evaluation rather than an official contest entry, and its claims rest primarily on the company's own statements and those of its researchers. [1][2][9]
Not a like-for-like contest with humans. The AI systems did not compete inside the official standings. Comparisons such as "would have ranked first" or "would have ranked second" are reconstructions based on problems solved and penalty time, not official placements. The human teams worked under additional constraints, including sharing a single machine among three people. [2][9]
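A reconstruction of this kind can be sketched as inserting an unofficial entry into the standings and re-sorting. The Python sketch below assumes the usual ICPC ordering (more problems solved first, then lower penalty time) and omits further tie-breakers. The `Entry` type and `rank` function are illustrative names, and the second human team is made up purely to show the tie-break; only the St. Petersburg and Gemini figures come from the reported results.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    name: str
    solved: int
    penalty_minutes: int


def rank(entries: List[Entry]) -> List[Entry]:
    """Order by problems solved (descending), then penalty time (ascending)."""
    return sorted(entries, key=lambda e: (-e.solved, e.penalty_minutes))


official = [
    Entry("St. Petersburg State University", 11, 1478),  # reported figures
    Entry("Hypothetical human team", 10, 1100),          # made-up placeholder
]
gemini = Entry("Gemini 2.5 Deep Think (unofficial)", 10, 677)

reconstructed = rank(official + [gemini])
position = reconstructed.index(gemini) + 1
print(position)  # 2: behind the 11-problem champion, ahead on penalty time
```

The resulting position is a hypothetical placement, not an official one, because the inserted entry never appeared on the contest scoreboard.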
Tooling and compute. OpenAI's models used a code-execution sandbox, and the precise tools and compute budgets used by each system were not fully disclosed. Reasoning models of this generation can consume substantial inference-time computation, and the cost or amount of compute behind the runs was not detailed, making efficiency comparisons with human contestants difficult. [9]
Format-specific. ICPC problems are short, self-contained, and automatically judged. Strong performance here does not directly establish capability on the long-horizon, loosely specified tasks typical of real-world software engineering. [9]
With those qualifications, the 2025 ICPC results were nonetheless treated as a significant data point: for the first time, AI systems were reported to reach the top tier of the premier collegiate programming contest, on novel problems, under contest-like timing, using general-purpose reasoning models. [1][2]