AlphaDev
Last reviewed
Jun 3, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,510 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 3, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,510 words
Add missing citations, update stale details, or suggest a clearer explanation.
AlphaDev is an artificial intelligence system built by Google DeepMind that used deep reinforcement learning to discover faster algorithms for common computing tasks, most notably small-scale sorting and hashing routines. It was introduced in a paper titled "Faster sorting algorithms discovered using deep reinforcement learning," published in Nature on 7 June 2023 [1][2]. Rather than improve algorithms at the level of source code, AlphaDev searched directly in CPU assembly instructions, treating the construction of a correct, low-latency routine as a single-player game. Several of the sorting routines it found were reverse engineered into C++ and merged into the LLVM libc++ standard library, where they replaced code that had stood for more than a decade [3][4].
AlphaDev grew out of DeepMind's line of game-playing reinforcement learning agents. Its direct ancestor is AlphaZero, the system that learned to play Go, chess, and shogi from scratch through self-play, and which itself followed AlphaGo and was a sibling of the model-based agent MuZero [1][5]. The same research group had also shown, with AlphaTensor, that the AlphaZero approach could be turned on a problem in pure computation rather than a board game, in that case finding faster ways to multiply matrices. AlphaDev extended this idea further down the stack, to the individual machine instructions a processor executes.
The motivation was practical. Sorting and hashing sit underneath enormous amounts of everyday software, from database queries to web page rendering, and even small efficiency gains compound across the trillions of times these operations run each day [3]. Sorting routines for very short inputs are especially important because larger sorting algorithms repeatedly fall back to them as base cases, so a saved instruction in a three- or five-element sort can ripple through the performance of sorting at any scale [3].
AlphaDev frames algorithm discovery as a single-player game its authors called AssemblyGame [1]. At each move the agent appends one low-level assembly instruction to a growing program. The state combines the current sequence of instructions with the contents of the processor's registers and memory, encoded for the network using a Transformer-based representation of the assembly together with a representation of the machine state [1][6]. From a given state the agent could choose among a large but bounded set of legal instructions; the researchers reported that learning slowed considerably beyond about 297 instructions in the action set and programs longer than roughly 130 instructions [6].
The reward has two parts. Correctness is measured by running the candidate program on test inputs and checking that the outputs are actually sorted, and latency is measured either by counting instructions or by timing execution, so the agent is pushed toward programs that are both right and fast [1]. Because a single wrong instruction can break the whole routine, the search space is sparse and unforgiving, which is part of what made the problem a good match for a Monte Carlo tree search agent in the AlphaZero mold [1][7]. Working at the assembly level let AlphaDev consider instruction sequences that a human writing in C++ would not naturally express, and it occasionally found counterintuitive moves reminiscent of the surprising play that earlier DeepMind agents produced in board games [5].
For the fixed-length sorting routines, AlphaDev matched or beat the human benchmarks measured in number of instructions [6]:
| Routine | Human benchmark | AlphaDev | Instructions saved |
|---|---|---|---|
| Sort 3 | 18 | 17 | 1 |
| Sort 4 | 28 | 28 | 0 |
| Sort 5 | 46 | 42 | 4 |
The gains were larger for the variable sorts, which handle inputs whose length is not known ahead of time [6]:
| Routine | Human benchmark | AlphaDev | Instructions shorter |
|---|---|---|---|
| VarSort3 | 33 | 21 | 12 |
| VarSort4 | 66 | 37 | 29 |
| VarSort5 | 115 | 63 | 52 |
Along the way the agent discovered two reusable patterns the authors named the AlphaDev swap move and the AlphaDev copy move. Each exploits an ordering already guaranteed by earlier comparisons to drop a redundant instruction; the swap move, for example, lets a routine compute a two-way minimum where a three-way minimum had previously been used, saving one instruction every time the pattern appears [3][6].
DeepMind reported the sorting improvements as up to about 70% faster for short sequences and roughly 1.7% faster for sequences longer than 250,000 elements relative to the routines then shipping in the C++ library [3][4]. The company also applied AlphaDev to hashing, a step used to index data structures, and found a routine for inputs in the 9 to 16 byte range that was about 30% faster than the existing implementation [3][8].
The sorting work moved into production code before the paper appeared. The branchless sort3, sort4, and sort5 routines were submitted as a patch to LLVM's libc++ by DeepMind engineer Marco Gelmi, authored on 24 January 2022 and committed on 8 April 2022 [9]. The patch description states plainly that the functions "have been generated using Reinforcement Learning," and it applies them conservatively, only for contiguous iterators over arithmetic types that fit in a machine word and with simple comparators such as std::less, to avoid slowing down more complex cases [9]. Because AlphaDev produced assembly, the routines were reverse engineered back into C++ so they could be reviewed and maintained like ordinary library code [3].
DeepMind described this as the first change to that part of the libc++ sorting library in over a decade, and the first time a sorting routine in the standard library was generated through reinforcement learning [3][4]. The hashing result followed a similar path. The new 9 to 16 byte hashing function was released into Abseil, Google's open-source collection of C++ libraries, in early 2023 [3][8]. Through these two libraries, which are used by very large numbers of developers and systems, DeepMind estimated the resulting code runs trillions of times a day [3][2].
Coverage of AlphaDev framed it as an example of AI contributing to the foundations of computing rather than to a flashy end-user product. MIT Technology Review placed it in the lineage of DeepMind's game-playing systems and quoted lead researcher Daniel Mankowitz saying the team "honestly didn't expect to achieve anything better" than the heavily optimized routines already in use [5]. Outlets including VentureBeat and Tech Xplore emphasized that because the routines were already merged into a standard library, the improvements were live in production code rather than a laboratory demonstration [3][4].
Independent technical analysis was more measured about exactly what had been improved. Justine Tunney's widely read write-up examined the libc++ changes and argued that the practical speedups came as much from making the small sorts branchless, using conditional-move instructions, as from any single instruction the agent removed, and noted that branchless small sorts were an idea with prior history [10]. The discussion underlined a recurring theme: AlphaDev's contribution was a genuine, shippable optimization, but one whose real-world benefit is entangled with low-level details of how modern processors execute code.
AlphaDev is significant less for the size of any one speedup than for where it intervened. By searching at the level of assembly instructions and validating its programs against both correctness and measured latency, it showed that a reinforcement learning agent could find and ship improvements to code that sits at the very bottom of the software stack, code that millions of programs depend on without ever inspecting. It extended the AlphaZero recipe from games and matrix multiplication to the discovery of practical algorithms, and it produced reusable instruction-level patterns that humans could understand and reapply [3][6].
At the same time the project illustrated the limits of the approach. Learning degraded as the action set and program length grew, the agent did better on the shortest routines than on longer ones, and turning its assembly output into maintainable library code required human reverse engineering and conservative deployment [6][9]. As a proof that machine-discovered algorithms can quietly enter the infrastructure most of computing runs on, however, AlphaDev remains a notable milestone for DeepMind and for AI applied to systems software.