Pluribus (poker AI)
Last reviewed
Jun 3, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,696 words
Pluribus is an artificial intelligence program that defeated elite human professionals at six-player no-limit Texas hold'em, the most popular form of poker in the world. Built by Noam Brown of Facebook AI Research and Tuomas Sandholm of Carnegie Mellon University, it was the first bot to reach superhuman performance in a widely recognized benchmark game with more than two players. The result was published in the journal Science on 11 July 2019 [1][2].
Multiplayer poker had stood as an open problem for decades. Most prior milestones in game-playing AI, including chess, Go, and two-player poker, involve either two players or two teams competing in a zero-sum setting. In such games a Nash equilibrium strategy cannot lose in expectation, which gives a solver a clear target to approximate. With three or more independent players the guarantee breaks down. Computing a Nash equilibrium is generally intractable. Even if an equilibrium could be computed, playing it would provide no assurance of winning, because the other players may coordinate or simply play in ways that equilibrium does not anticipate. Pluribus sidesteps the search for an exact equilibrium and instead aims for a strategy that empirically beats strong humans [1][3].
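The two-player guarantee can be checked directly on a small zero-sum game. The sketch below uses rock-paper-scissors as an illustrative stand-in; poker is vastly larger, and the game, payoff table, and opponent strategies here are assumptions chosen for clarity. It shows that the uniform equilibrium strategy earns exactly zero in expectation against every opponent strategy, so it cannot lose on average.

```python
from fractions import Fraction

# Row player's payoff in rock-paper-scissors; the column player receives the negative (zero-sum).
# Action order: rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],   # rock vs (rock, paper, scissors)
    [1, 0, -1],   # paper vs (rock, paper, scissors)
    [-1, 1, 0],   # scissors vs (rock, paper, scissors)
]


def expected_payoff(row_strategy, col_strategy):
    """Expected payoff to the row player when both sides play mixed strategies."""
    return sum(
        row_strategy[i] * col_strategy[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


# The unique Nash equilibrium of rock-paper-scissors: play each action with probability 1/3.
equilibrium = [Fraction(1, 3)] * 3

# Hypothetical opponents, including badly unbalanced ones.
opponents = {
    "always rock": [1, 0, 0],
    "mostly paper": [Fraction(1, 10), Fraction(8, 10), Fraction(1, 10)],
    "never scissors": [Fraction(1, 2), Fraction(1, 2), 0],
}

for name, strategy in opponents.items():
    print(f"{name}: {expected_payoff(equilibrium, strategy)}")  # prints 0 for every opponent
```

The same check also shows the limitation of the guarantee. Against an opponent who always plays rock, the equilibrium strategy still earns only zero: it protects against losing, but it does not exploit the opponent's mistakes.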
Brown and Sandholm had already built Libratus, a bot that decisively beat four top human specialists at heads-up (two-player) no-limit Texas hold'em in the January 2017 "Brains vs. Artificial Intelligence" challenge in Pittsburgh. That match ran 120,000 hands over 20 days against pros Jason Les, Dong Kim, Daniel McAulay, and Jimmy Chou. Libratus won by 147 milli big blinds per game (mbb/game, thousandths of a big blind per hand) with a p-value of 0.0002 and finished roughly $1.77 million ahead in chips [4][5]. Libratus relied on very large computing resources, including a supercomputer at the Pittsburgh Supercomputing Center.
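The mbb/game unit can be connected to the chip total with a short calculation. This sketch assumes a 100-chip big blind (blinds of 50/100) and counts one chip as one dollar. The function name is hypothetical.

```python
def total_chips(mbb_per_game, hands, big_blind_chips):
    """Convert a win rate in milli big blinds per game (per hand) into total chips won."""
    big_blinds_per_hand = mbb_per_game / 1000  # 1 mbb = one thousandth of a big blind
    return big_blinds_per_hand * big_blind_chips * hands


# Assumption: a 100-chip big blind (blinds of 50/100), one chip counted as one dollar.
libratus_total = total_chips(147, hands=120_000, big_blind_chips=100)
print(round(libratus_total))  # 1764000
```

The result, about $1.76 million, is in line with the reported figure of roughly $1.77 million. The small gap is consistent with 147 mbb/game being a rounded win rate.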
Pluribus extended this line of work to the harder multiplayer setting while drastically cutting the cost. Brown completed the project as a Ph.D. student at Carnegie Mellon while working as a research scientist at Facebook AI [1][2].
Pluribus combines two ideas: an offline self-play phase that produces a coarse "blueprint" strategy, and a real-time search phase that refines that strategy during play.
In the offline phase, Pluribus learns by self-play. It plays trillions of hands against copies of itself without using any human or prior-bot game data as input [3][6]. The learning algorithm is a variant of Monte Carlo counterfactual regret minimization (MCCFR). This iterative method samples paths through the game tree rather than traversing the whole tree on every iteration, and it gradually reduces the "regret" of not having chosen better actions [3][6]. To make the enormous game tractable, similar decision points and similar bet sizes are grouped together through abstraction (action abstraction and information abstraction), so the program reasons over a compressed version of the game [6].
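The core update inside CFR-style methods is regret matching. Each action is played in proportion to its accumulated positive regret, and the average strategy over all iterations approaches an equilibrium. The sketch below is a minimal, hypothetical illustration on rock-paper-scissors, not Pluribus's implementation. In the spirit of Monte Carlo CFR, each iteration samples the opponent's action instead of enumerating every outcome. Pluribus applied a refined MCCFR variant to an abstracted poker tree with many sequential decision points, and this one-shot example models none of that.

```python
import random

ACTIONS = ("rock", "paper", "scissors")
# PAYOFF[a][b]: payoff to a player choosing action a against an opponent choosing action b.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # no positive regret: play uniformly


def train(iterations, seed=0):
    """Self-play with sampled opponent actions; returns each player's average strategy."""
    rng = random.Random(seed)
    n = len(ACTIONS)
    regrets = [[0.0] * n for _ in range(2)]
    strategy_sums = [[0.0] * n for _ in range(2)]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        # Monte Carlo step: sample one action per player rather than enumerating all pairs.
        sampled = [rng.choices(range(n), weights=s)[0] for s in strategies]
        for player in range(2):
            opponent_action = sampled[1 - player]
            # Utility of each of this player's actions against the sampled opponent action.
            utilities = [PAYOFF[a][opponent_action] for a in range(n)]
            current_value = sum(p * u for p, u in zip(strategies[player], utilities))
            for a in range(n):
                regrets[player][a] += utilities[a] - current_value
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy, not the final one, is what converges toward equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


average = train(100_000)
for player, strategy in enumerate(average):
    print(player, [round(p, 3) for p in strategy])
```

Both players' average strategies should end up close to the uniform equilibrium of about 1/3 per action. The current strategies from individual iterations keep cycling, which is why CFR-style methods report the average.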
The blueprint alone is intentionally coarse, especially after the first betting round. During live play Pluribus runs a depth-limited search to compute a finer-grained strategy for the situation actually in front of it. Rather than searching to the end of the game, which is infeasible, the search stops at a limited depth and assumes that, beyond that point, each player could continue with one of several different strategies. Considering these multiple continuation strategies, rather than a single fixed one, keeps Pluribus from becoming predictable and is a key reason the search works in an imperfect-information game [3][6]. Pluribus also computes how it would act with every possible hand it could hold, which balances its play so opponents cannot read its actual cards from its bets [6].
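The continuation strategies can be illustrated with a minimal sketch. Brown and Sandholm describe four options at the depth limit: the blueprint itself and versions of it biased toward folding, calling, and raising. The sketch builds such a set for a single decision point. It multiplies the probability of the favored action type by a bias factor and then renormalizes. The factor of 5, the action labels, and the helper names are hypothetical. The real system biases abstracted strategies for the whole remainder of the game, not one action distribution.

```python
from typing import Dict, List

Strategy = Dict[str, float]  # action label -> probability at one decision point

# Hypothetical bias factor; the value is an illustrative assumption.
BIAS_FACTOR = 5.0


def biased(blueprint: Strategy, kind: str, factor: float = BIAS_FACTOR) -> Strategy:
    """Up-weight every action whose label starts with `kind`, then renormalize."""
    weights = {
        action: prob * (factor if action.startswith(kind) else 1.0)
        for action, prob in blueprint.items()
    }
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}


def continuation_strategies(blueprint: Strategy) -> List[Strategy]:
    """The unmodified blueprint plus fold-, call-, and raise-biased versions (k = 4)."""
    return [blueprint] + [biased(blueprint, kind) for kind in ("fold", "call", "raise")]


# Hypothetical blueprint distribution at one decision point.
blueprint = {"fold": 0.2, "call": 0.5, "raise_half_pot": 0.2, "raise_pot": 0.1}
for strategy in continuation_strategies(blueprint):
    print({action: round(p, 3) for action, p in strategy.items()})
```

At the depth limit, each opponent can switch to whichever continuation serves it best. A searcher strategy that leans too heavily on one line of play is therefore punished, which pushes the search toward balanced play.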
A central point of the work is how little it cost. The blueprint was trained on a 64-core server in about 8 days, using a total of about 12,400 CPU core hours and less than 512 GB of RAM, with no GPUs involved. At then-current cloud rates the researchers estimated the training would cost roughly $144 to reproduce [3][6]. During live play Pluribus ran on a far more modest machine with two Intel Haswell E5-2695 v3 CPUs and less than 128 GB of memory, taking between 1 and 33 seconds per decision and about 20 seconds per hand, which is roughly twice as fast as professional humans tend to play [6][7].
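A quick back-of-the-envelope check shows that the reported training figures are consistent with one another. The per-core-hour price below is derived from the article's numbers. It is not a rate quoted by the researchers.

```python
# Figures from the text: 64 cores for about 8 days, ~12,400 core hours, ~$144, ~20 s per hand.
CORES = 64
DAYS = 8
core_hours = CORES * DAYS * 24
print(core_hours)  # 12288, consistent with "about 12,400 CPU core hours"

# Price per core hour implied by the ~$144 estimate (derived, not a quoted rate).
implied_rate = 144 / 12_400
print(f"${implied_rate:.4f} per core hour")  # $0.0116 per core hour

# Live-play pace: about 20 seconds per hand.
hands_per_hour = 3600 / 20
print(hands_per_hour)  # 180.0
```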
| Phase | Hardware | Time | Memory | Estimated cost |
|---|---|---|---|---|
| Blueprint training | 64-core server, no GPU | About 8 days (around 12,400 CPU core hours) | Less than 512 GB | About $144 of cloud compute |
| Live play | Two Intel E5-2695 v3 CPUs | 1 to 33 seconds per decision (about 20 seconds per hand) | Less than 128 GB | N/A |
Brown framed the economy of the approach as a deliberate counterpoint to the trend toward ever-larger compute budgets: "Some experts in the field have worried that future AI research will be dominated by large teams with access to millions of dollars in computing resources. We believe Pluribus is powerful evidence that novel approaches that require only modest resources can drive cutting-edge AI research." [7]
Pluribus was tested against professionals in two formats. Luck dominates poker results over any practical number of hands, so the researchers used a variance-reduction technique called AIVAT to reduce its effect on the measured win rate [1][3].
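AIVAT estimates and subtracts the part of each hand's result that is due to luck rather than skill. The sketch below illustrates only the underlying control-variate principle, not AIVAT itself. It makes a strong simplifying assumption: the luck component of each hand is known exactly and has mean zero. The toy model, its parameters, and the function names are hypothetical.

```python
import random
import statistics


def simulate(hands, edge, seed=0):
    """Hypothetical toy model: each hand's result is edge + luck + noise, in big blinds.

    Returns the raw results and the results with the luck term subtracted.
    """
    rng = random.Random(seed)
    raw, adjusted = [], []
    for _ in range(hands):
        luck = rng.gauss(0.0, 10.0)   # large zero-mean swing from the cards (assumed observable)
        noise = rng.gauss(0.0, 1.0)   # variation the correction cannot explain
        result = edge + luck + noise
        raw.append(result)
        adjusted.append(result - luck)  # control variate: subtract the known zero-mean luck term
    return raw, adjusted


def standard_error(values):
    """Standard error of the mean."""
    return statistics.stdev(values) / len(values) ** 0.5


# An edge of 0.05 big blinds per hand is 50 mbb/game.
raw, adjusted = simulate(hands=10_000, edge=0.05)
print("raw:      mean", round(statistics.mean(raw), 3), "std. error", round(standard_error(raw), 3))
print("adjusted: mean", round(statistics.mean(adjusted), 3), "std. error", round(standard_error(adjusted), 3))
```

With the raw results, the standard error (about 0.1 big blinds) is larger than the edge being measured. After the adjustment it is roughly ten times smaller, so 10,000 hands are enough to detect the edge.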
In the first format, one copy of Pluribus played against five human professionals at a time. For each session, five players were drawn from a pool of professionals, each of whom had won more than $1 million playing poker. Over 10,000 hands, Pluribus won an average of 48 mbb/game with a standard error of 25 mbb/game (p-value 0.028) [3][6]. If each chip were worth a dollar, that win rate would correspond to roughly $5 per hand, or close to $1,000 per hour at the pace of play, although no real money was wagered between Pluribus and the humans [6].
In the second format, the roles were reversed: a single elite human played against five copies of Pluribus. Two of the strongest players in the pool each played 5,000 hands in this setting. They were Darren Elias, who holds the record for most World Poker Tour titles, and Chris "Jesus" Ferguson, who has won six World Series of Poker events. Pluribus won by 32 mbb/game with a standard error of 15 mbb/game (p-value 0.014) [3][6].
| Experiment | Configuration | Hands | Win rate (after AIVAT) | Statistical significance |
|---|---|---|---|---|
| 5 humans, 1 Pluribus | One bot against five pros at a time | 10,000 | 48 mbb/game (std. error 25) | p = 0.028 |
| 1 human, 5 Pluribus | One pro against five bots | 5,000 each | 32 mbb/game (std. error 15) | p = 0.014 |
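The win rates in the table can be converted into chips per hand and a rough significance figure. This sketch assumes blinds of 50/100 chips. It uses a one-sided normal approximation, dividing the win rate by its standard error. That is not necessarily the exact test the researchers ran, so its p-values only approximately match the reported ones.

```python
import math

BIG_BLIND_CHIPS = 100  # assumption: blinds of 50/100 chips


def chips_per_hand(mbb_per_game, big_blind_chips=BIG_BLIND_CHIPS):
    """Milli big blinds per game -> chips won per hand."""
    return mbb_per_game / 1000 * big_blind_chips


def one_sided_p(win_rate, std_error):
    """Normal-approximation p-value for the null hypothesis that the true win rate is <= 0."""
    z = win_rate / std_error
    return 0.5 * math.erfc(z / math.sqrt(2))


experiments = [
    ("5 humans, 1 Pluribus", 48, 25),
    ("1 human, 5 Pluribus", 32, 15),
]
for name, rate, se in experiments:
    print(f"{name}: {chips_per_hand(rate):.1f} chips/hand, "
          f"z = {rate / se:.2f}, p = {one_sided_p(rate, se):.3f}")
```

At 48 mbb/game the first format works out to 4.8 chips per hand, which matches the "roughly $5 per hand" figure at a dollar per chip. The approximate p-values come out near 0.027 and 0.016, close to the reported 0.028 and 0.014. The differences presumably reflect rounding of the published figures and the exact test used.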
The full set of professionals included Jimmy Chou, Seth Davies, Darren Elias, Chris Ferguson, Michael Gagliano, Anthony Gregg, Dong Kim, Jason Les, Linus Loeliger, Daniel McAulay, Greg Merson, Nick Petrangelo, Sean Ruane, Trevor Savage, and Jacob Toole [6]. The humans were paid for participation, with bonuses tied to performance against the bot, to keep their incentives aligned with playing their best.
Players singled out the bot's unconventional bet sizing and aggression. Michael Gagliano observed, "There were several plays that humans simply are not making at all, especially relating to its bet sizing." Jason Les called it "an absolute monster bluffer," adding "it's a much more efficient bluffer than most humans." Chris Ferguson noted, "It's really hard to pin him down on any kind of hand," praising its thin value bets on the river [6]. Commentators highlighted Pluribus's willingness to use very large overbets for both bluffs and value, and its use of "donk betting" (leading into the previous round's aggressor), a play many human pros avoid [6][8].
Sandholm summarized the achievement: "Pluribus achieved superhuman performance at multiplayer poker, which is a recognized milestone in artificial intelligence and in game theory that has been open for decades." [2]
Pluribus showed that strong play in a large, multiplayer, imperfect-information game does not require equilibrium guarantees or massive compute. Its core technique, a cheaply trained blueprint refined by depth-limited search with multiple continuation strategies, generalized beyond poker. Noam Brown and collaborators applied closely related ideas about search and self-play in later work at Facebook AI (later Meta AI). That work included the ReBeL algorithm and, most prominently, CICERO, the first AI to reach human-level performance in the strategy game Diplomacy; CICERO combined a strategic-reasoning planning module with a natural-language model [9]. Diplomacy is a seven-player game that mixes cooperation, negotiation, and competition, extending the multiplayer, non-zero-sum challenges that Pluribus first confronted in poker. Brown has cited Pluribus when arguing for the value of search and planning at inference time, themes he continued to pursue after moving to OpenAI to work on reasoning systems [9].
Years later, Brown also offered Pluribus as a cautionary example, noting that its very low training cost partly reflected how heavily the method was tuned to the specific benchmark of poker, a lesson he related to the risk of over-optimizing AI systems for narrow benchmarks [10].