# Pluribus (poker AI)

> Source: https://aiwiki.ai/wiki/pluribus
> Updated: 2026-06-03
> Categories: AI in Gaming, Meta AI, Reinforcement Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Pluribus** is an [artificial intelligence](/wiki/artificial_intelligence) program that defeated elite human professionals at six-player no-limit Texas hold'em, the most widely played format of poker. Built by Noam Brown of Facebook AI Research and Tuomas Sandholm of [Carnegie Mellon University](/wiki/carnegie_mellon_university), it was the first bot to reach superhuman performance in a widely recognized benchmark [game](/wiki/game_ai) with more than two players. The result was published in the journal *Science* on 11 July 2019 [1][2].

Multiplayer poker had stood as an open problem for decades. Most prior milestones in game-playing AI, including [chess](/wiki/deep_blue), [Go](/wiki/alphago), and two-player poker, involved either two players or two teams competing in a zero-sum setting. In such games a [Nash equilibrium](/wiki/nash_equilibrium) strategy cannot lose in expectation, which gives a clear target for a solver to approximate. With three or more independent players that guarantee breaks down: computing a Nash equilibrium is generally intractable, and even if one could be computed, playing it offers no assurance of winning. Different players may each play a different equilibrium, and the resulting combination of strategies need not be an equilibrium at all; opponents may also coordinate or simply play in ways the equilibrium does not anticipate. Pluribus therefore sets aside the search for an exact equilibrium and instead aims for a strategy that empirically beats strong humans [1][3].

## Background: from Libratus to Pluribus

Brown and Sandholm had already built [Libratus](/wiki/libratus), a bot that decisively beat four top human specialists at heads-up (two-player) no-limit Texas hold'em in the January 2017 "Brains vs. Artificial Intelligence" challenge in Pittsburgh. That match ran 120,000 hands over 20 days against pros Jason Les, Dong Kim, Daniel McAulay, and Jimmy Chou, and Libratus won by 147 milli big blinds per game (mbb/game; one mbb is a thousandth of a big blind, so this equals 0.147 big blinds per hand) with a p-value of 0.0002, finishing roughly $1.77 million ahead in chips [4][5]. Libratus relied on very large computing resources, including a supercomputer at the Pittsburgh Supercomputing Center.

Pluribus extended this line of work to the harder multiplayer setting while drastically cutting the cost. Brown completed the project as a Ph.D. student at Carnegie Mellon while working as a research scientist at Facebook AI [1][2].

## Method

Pluribus combines two ideas: an offline self-play phase that produces a coarse "blueprint" strategy, and a real-time search phase that refines that strategy during play.

In the offline phase, Pluribus learns by [self-play](/wiki/self_play), playing trillions of hands against copies of itself with no human game data and no prior-bot data used as input [3][6]. The learning algorithm is a variant of Monte Carlo counterfactual regret minimization (MCCFR), an iterative method that samples paths through the game tree rather than traversing the whole tree on every iteration, gradually reducing the "regret" of not having chosen better actions [3][6]. To make the enormous game tractable, similar decision points and similar bet sizes are grouped together through abstraction (action abstraction and information abstraction), so the program reasons over a compressed version of the game [6].
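Pluribus's own solver is far too large to reproduce here, but the core regret-minimization loop can be illustrated on Kuhn poker, a three-card, two-player toy game. The minimal sketch below uses chance-sampled CFR, one of the simplest Monte Carlo CFR variants: each iteration samples a single deal, walks the game tree for that deal, and updates cumulative regrets using regret matching. It is an illustration under stated assumptions, not Pluribus's algorithm: it leaves out abstraction, multiplayer play, and the optimizations of the real system, and all names (`InfoSet`, `payoff`, `cfr`, `train`) are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = "pb"  # p = pass (check or fold), b = bet (bet or call)


class InfoSet:
    """Cumulative regrets and average-strategy weights for one information set."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def current_strategy(self, reach):
        # Regret matching: play each action in proportion to its positive regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        strategy = [p / total for p in positive] if total > 0 else [0.5, 0.5]
        for a in range(2):
            self.strategy_sum[a] += reach * strategy[a]
        return strategy

    def average_strategy(self):
        # The average strategy, not the current one, converges to equilibrium.
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


nodes = defaultdict(InfoSet)


def payoff(cards, history):
    """Payoff to the player about to act at a terminal history, else None."""
    player = len(history) % 2
    wins = cards[player] > cards[1 - player]
    if history == "pp":             # check-check: showdown for 1 chip
        return 1 if wins else -1
    if history in ("bp", "pbp"):    # the opponent folded to a bet
        return 1
    if history in ("bb", "pbb"):    # bet-call: showdown for 2 chips
        return 2 if wins else -2
    return None


def cfr(cards, history, reach0, reach1):
    """Expected value of `history` for the player to act, updating regrets."""
    terminal = payoff(cards, history)
    if terminal is not None:
        return terminal
    player = len(history) % 2
    node = nodes[str(cards[player]) + history]
    strategy = node.current_strategy(reach0 if player == 0 else reach1)
    action_values = [0.0, 0.0]
    for a, action in enumerate(ACTIONS):
        if player == 0:
            child = cfr(cards, history + action, reach0 * strategy[a], reach1)
        else:
            child = cfr(cards, history + action, reach0, reach1 * strategy[a])
        action_values[a] = -child  # zero-sum: the child's value belongs to the opponent
    node_value = sum(s * v for s, v in zip(strategy, action_values))
    # Regret of not having chosen each action, weighted by the opponent's reach.
    opponent_reach = reach1 if player == 0 else reach0
    for a in range(2):
        node.regret_sum[a] += opponent_reach * (action_values[a] - node_value)
    return node_value


def train(iterations, seed=0):
    """Runs chance-sampled CFR and returns the first player's average value."""
    rng = random.Random(seed)
    cards = [0, 1, 2]  # 0 = Jack, 1 = Queen, 2 = King
    total = 0.0
    for _ in range(iterations):
        rng.shuffle(cards)  # sample one chance outcome: the deal
        total += cfr(cards, "", 1.0, 1.0)
    return total / iterations

# Example: value = train(50_000); nodes["1b"].average_strategy() gives the
# second player's [fold, call] frequencies with a Queen facing a bet.
```

Kuhn poker makes a convenient check because its first player's equilibrium value is -1/18 of a chip per hand and the second player's equilibrium strategy is unique (for example, calling a bet with a Queen one third of the time). Pluribus applied the same style of update to an abstracted six-player game with vastly more information sets.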

The blueprint alone is intentionally coarse, especially after the first betting round. During live play Pluribus runs a depth-limited search to compute a finer-grained strategy for the situation actually in front of it. Rather than searching to the end of the game, which is infeasible, the search stops at a limited depth and assumes that, beyond that point, each player could continue with one of several different strategies. Considering these multiple continuation strategies, rather than a single fixed one, keeps Pluribus from becoming predictable and is a key reason the search works in an imperfect-information game [3][6]. Pluribus also computes how it would act with every possible hand it could hold, which balances its play so opponents cannot read its actual cards from its bets [6].
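Why several continuation strategies matter can be seen in a toy zero-sum matrix game. In the minimal sketch below, the searcher picks an action (rows) and the opponent, at the depth limit, picks which continuation strategy to follow (columns). The column labels assume the four options described in the Pluribus paper (the blueprint plus versions biased toward folding, calling, and raising), but every payoff is invented, and solving a single matrix with regret matching is a simplification: Pluribus solves the whole depth-limited subgame, with the continuation choice built in as extra decision points. The names `PAYOFF`, `regret_matching`, and `solve` are hypothetical.

```python
# Hypothetical leaf game: rows are the searcher's actions, columns are the
# continuation strategy the opponent adopts beyond the depth limit.
# Entries are invented chip values for the searcher (zero-sum game).
ROWS = ["bet", "check"]
COLS = ["blueprint", "fold-biased", "call-biased", "raise-biased"]
PAYOFF = [
    [1.0, 2.0, -1.0, 0.0],  # bet
    [0.5, 0.0, 1.0, 0.5],   # check
]


def regret_matching(regrets):
    """Mixes over actions in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def solve(payoff, iterations=100_000):
    """Simultaneous regret matching; average strategies approach a Nash equilibrium."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_avg, col_avg = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x = regret_matching(row_regret)
        y = regret_matching(col_regret)
        # Value of each pure action against the other side's current mix.
        row_vals = [sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        col_vals = [-sum(payoff[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        row_now = sum(p * v for p, v in zip(x, row_vals))
        col_now = sum(q * v for q, v in zip(y, col_vals))
        for i in range(n_rows):
            row_regret[i] += row_vals[i] - row_now
            row_avg[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_vals[j] - col_now
            col_avg[j] += y[j]
    x_bar = [v / iterations for v in row_avg]
    y_bar = [v / iterations for v in col_avg]
    value = sum(x_bar[i] * payoff[i][j] * y_bar[j]
                for i in range(n_rows) for j in range(n_cols))
    return x_bar, y_bar, value


strategy, opponent_mix, value = solve(PAYOFF)
print(dict(zip(ROWS, strategy)))
print(dict(zip(COLS, opponent_mix)))
```

If the searcher assumed the opponent would keep playing the blueprint, it would always bet (1.0 beats 0.5), yet a call-biased continuation would punish that choice down to -1.0. Solving against all four continuations instead yields a mixed strategy that bets about 20% of the time and guarantees about 0.4 chips whatever the opponent does later, which is the kind of robustness the paragraph above describes.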

## Cost and hardware

A central point of the work is how little it cost. The blueprint was trained on a 64-core server in about 8 days, using a total of about 12,400 CPU core hours and less than 512 GB of RAM, with no [GPUs](/wiki/gpu) involved. At then-current cloud rates the researchers estimated the training would cost roughly $144 to reproduce [3][6]. During live play Pluribus ran on a far more modest machine with two Intel Haswell E5-2695 v3 CPUs and less than 128 GB of memory, taking between 1 and 33 seconds per decision and about 20 seconds per hand, which is roughly twice as fast as professional humans tend to play [6][7].

| Phase | Hardware | Time | Memory | Estimated cost |
|---|---|---|---|---|
| Blueprint training | 64-core server, no GPU | About 8 days (around 12,400 CPU core hours) | Less than 512 GB | About $144 of cloud compute |
| Live play | Two Intel E5-2695 v3 CPUs | 1 to 33 seconds per decision (about 20 seconds per hand) | Less than 128 GB | N/A |
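The training figures are mutually consistent, as a quick calculation shows: 64 cores running around the clock for 8 days comes to about 12,300 core hours, close to the quoted 12,400, and the $144 estimate implies a price of a little over one cent per core-hour. The per-core-hour rate is derived here for illustration and is not a figure stated in the sources.

```python
# Back-of-the-envelope check of the reported blueprint training figures.
CORES = 64
DAYS = 8
REPORTED_CORE_HOURS = 12_400  # "about 12,400 CPU core hours"
REPORTED_COST_USD = 144       # estimated cloud cost to reproduce the training

core_hours = CORES * DAYS * 24  # cores x days x hours per day
implied_usd_per_core_hour = REPORTED_COST_USD / REPORTED_CORE_HOURS

print(core_hours)
print(round(implied_usd_per_core_hour, 4))
```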

Brown framed the economy of the approach as a deliberate counterpoint to the trend toward ever-larger compute budgets: "Some experts in the field have worried that future AI research will be dominated by large teams with access to millions of dollars in computing resources. We believe Pluribus is powerful evidence that novel approaches that require only modest resources can drive cutting-edge AI research." [7]

## Evaluation and results

Pluribus was tested against professionals in two formats, each designed to isolate its performance from the natural swings of poker. The researchers used a variance-reduction technique called AIVAT to lower the role of luck in the measured win rate [1][3].
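AIVAT itself is considerably more involved than a single correction term, since it builds its adjustments from value estimates and the bot's known strategy. The minimal sketch below shows only the underlying control-variate idea on made-up numbers: subtract from each hand's result an estimate of the luck component whose expected value is zero, which keeps the average unbiased while shrinking its variance. The split into "skill" and "luck" terms, the noise levels, and the function name `simulate` are all hypothetical.

```python
import random
import statistics


def simulate(num_hands=10_000, edge=0.05, seed=1):
    """Toy per-hand results in big blinds, raw and luck-corrected (hypothetical model)."""
    rng = random.Random(seed)
    raw, corrected = [], []
    for _ in range(num_hands):
        luck = rng.gauss(0.0, 10.0)   # swing from the cards dealt; mean zero
        skill = rng.gauss(edge, 1.0)  # component driven by decision quality
        result = skill + luck
        # Imperfect estimate of the luck term (assumed to come from a value
        # function); its error also has mean zero, so the correction is unbiased.
        luck_estimate = luck + rng.gauss(0.0, 2.0)
        raw.append(result)
        corrected.append(result - luck_estimate)
    return raw, corrected


raw, corrected = simulate()
print(statistics.mean(raw), statistics.stdev(raw))
print(statistics.mean(corrected), statistics.stdev(corrected))
```

With a per-hand standard deviation of about 10 big blinds, 10,000 raw hands leave a standard error of about 0.1 big blinds (100 mbb), larger than the toy edge of 0.05; the corrected series cuts that standard error by a factor of roughly four to five, which is why a luck-adjusted estimator matters for matches of only a few thousand hands.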

In the first format, one copy of Pluribus played against five human professionals at a time. Five players were drawn for each session from a pool of professionals, each of whom had won more than $1 million playing poker. Over 10,000 hands, Pluribus won an average of 48 mbb/game with a standard error of 25 mbb/game (p-value 0.028) [3][6]. If each chip had been worth a dollar, that win rate would correspond to roughly $5 per hand, or close to $1,000 per hour at the pace played, although no real money was wagered between Pluribus and the humans [6].

In the second format, the roles were reversed: a single elite human played against five copies of Pluribus. Two of the strongest players in the pool, Darren Elias (holder of the record for most World Poker Tour titles) and Chris "Jesus" Ferguson (winner of six World Series of Poker events), each played 5,000 hands in this setting. Pluribus won by 32 mbb/game with a standard error of 15 mbb/game (p-value 0.014) [3][6].

| Experiment | Configuration | Hands | Win rate (after AIVAT) | Statistical significance |
|---|---|---|---|---|
| 5 humans, 1 Pluribus | One bot against five pros at a time | 10,000 | 48 mbb/game (std. error 25) | p = 0.028 |
| 1 human, 5 Pluribus | One pro against five bots | 5,000 each | 32 mbb/game (std. error 15) | p = 0.014 |
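Because mbb/game is measured in thousandths of a big blind, converting a win rate into other units takes a single multiplication. The minimal sketch below assumes the 100-chip big blind used in the Pluribus matches and, for dollar figures, one dollar per chip; under those assumptions 48 mbb/game is 4.8 big blinds per 100 hands, or 4.8 chips per hand, in line with the roughly $5 per hand quoted above. The function names are hypothetical.

```python
def mbb_to_big_blinds_per_100(mbb_per_game):
    # 1 mbb = 1/1000 big blind, so 100 hands at X mbb/game win X/10 big blinds.
    return mbb_per_game / 10


def chips_per_hand(mbb_per_game, big_blind_chips=100):
    # Assumes a 100-chip big blind; pass a different size if needed.
    return mbb_per_game / 1000 * big_blind_chips


for label, rate in [("5 humans, 1 Pluribus", 48), ("1 human, 5 Pluribus", 32)]:
    print(label, mbb_to_big_blinds_per_100(rate), chips_per_hand(rate))
```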

The full set of professionals included Jimmy Chou, Seth Davies, Darren Elias, Chris Ferguson, Michael Gagliano, Anthony Gregg, Dong Kim, Jason Les, Linus Loeliger, Daniel McAulay, Greg Merson, Nick Petrangelo, Sean Ruane, Trevor Savage, and Jacob Toole [6]. The humans were paid for participation, with bonuses tied to performance against the bot, to keep their incentives aligned with playing their best.

Players singled out the bot's unconventional bet sizing and aggression. Michael Gagliano observed, "There were several plays that humans simply are not making at all, especially relating to its bet sizing." Jason Les called it "an absolute monster bluffer," adding "it's a much more efficient bluffer than most humans." Chris Ferguson noted, "It's really hard to pin him down on any kind of hand," praising its thin value bets on the river [6]. Commentators highlighted Pluribus's willingness to use very large overbets for both bluffs and value, and its use of "donk betting" (leading into the previous round's aggressor), a play many human pros avoid [6][8].

Sandholm summarized the achievement: "Pluribus achieved superhuman performance at multiplayer poker, which is a recognized milestone in artificial intelligence and in game theory that has been open for decades." [2]

## Significance and relation to later work

Pluribus showed that strong play in a large, multiplayer, imperfect-information game does not require equilibrium guarantees or massive compute. Its core technique, a cheaply trained blueprint refined by depth-limited search with multiple continuation strategies, generalized beyond poker. Noam Brown and collaborators applied closely related ideas about search and self-play in later work at Facebook AI, including [ReBeL](/wiki/rebel), a general framework that combines reinforcement learning with search in imperfect-information games, and, most prominently, [CICERO](/wiki/cicero), the first AI to reach human-level performance in the strategy game [Diplomacy](/wiki/diplomacy_ai), which combines a strategic-reasoning planning module with a natural-language model [9]. Diplomacy is a seven-player game that mixes cooperation, negotiation, and competition, extending the multiplayer, non-zero-sum challenges that Pluribus first confronted in poker. Brown has cited Pluribus when arguing for the value of search and planning at inference time, themes he continued to pursue after moving to [OpenAI](/wiki/openai) to work on reasoning systems [9].

Five years after the result, Brown revisited Pluribus's low training cost (about $150) as a cautionary tale, a lesson he related to the risk of over-optimizing AI research for narrow benchmarks [10].

## References

1. Brown, Noam; Sandholm, Tuomas. "Superhuman AI for multiplayer poker." *Science*, vol. 365, no. 6456, 2019, pp. 885-890. https://www.science.org/doi/10.1126/science.aay2400
2. "Carnegie Mellon and Facebook AI Beats Professionals in Six-Player Poker." Carnegie Mellon University, School of Computer Science, 11 July 2019. https://www.cs.cmu.edu/news/2019/carnegie-mellon-and-facebook-ai-beats-professionals-six-player-poker
3. "Pluribus (poker bot)." Wikipedia. https://en.wikipedia.org/wiki/Pluribus_(poker_bot)
4. "Carnegie Mellon Artificial Intelligence Beats Top Poker Pros." Carnegie Mellon University, 31 January 2017. https://www.cmu.edu/news/stories/archives/2017/january/AI-beats-poker-pros.html
5. Brown, Noam; Sandholm, Tuomas. "Superhuman AI for heads-up no-limit poker: Libratus beats top professionals." *Science*, vol. 359, no. 6374, 2018, pp. 418-424. https://www.science.org/doi/10.1126/science.aao1733
6. "Let's Read: Superhuman AI for multiplayer poker." LessWrong, 2019. https://www.lesswrong.com/posts/6qtq6KDvj86DXqfp6/let-s-read-superhuman-ai-for-multiplayer-poker
7. "How an ace-hole AI bot built by Facebook, CMU boffins whipped a table of human poker pros." The Register, 12 July 2019. https://www.theregister.com/2019/07/12/pluribus_ai_poker_human_pros/
8. "Poker Bot Pluribus First AI to Beat Humans in Multiplayer No-Limit Hold'em." PokerNews, 11 July 2019. https://www.pokernews.com/news/2019/07/pluribus-first-ai-to-beat-humans-in-multiplayer-no-limit-34910.htm
9. "CICERO: An AI agent that negotiates, persuades, and cooperates with people." Meta AI, 22 November 2022. https://ai.meta.com/blog/cicero-ai-negotiates-persuades-and-cooperates-with-people/
10. Brown, Noam (@polynoamial). "5 years ago we revealed Pluribus, the first superhuman multiplayer poker AI. It cost only $150 to train." X (formerly Twitter), 24 July 2024. https://x.com/polynoamial/status/1816347598623834365

