CICERO (AI)

Updated Jul 16, 2026

CICERO is an AI agent built by Meta AI's Fundamental AI Research (FAIR) division that reached human-level performance in the board game Diplomacy. Meta announced it on 22 November 2022, alongside a paper in the journal Science titled "Human-level play in the game of Diplomacy by combining language models with strategic reasoning." ^[1]^[2] The system is notable because Diplomacy demands both free-form natural-language negotiation and long-horizon strategic planning, a combination that earlier game AI milestones in chess, Go, and poker did not have to handle together. In anonymous online play on the webDiplomacy.net platform, CICERO finished with more than double the average human score and placed in roughly the top 10% of players who had played more than one game. ^[1]^[3]

Why Diplomacy is hard for AI

Diplomacy is a seven-player game set in pre-World War I Europe. Players control the armies and fleets of the major powers (England, France, Germany, Italy, Austria-Hungary, Russia, and Turkey) and compete to control a majority of supply centers on the map. The game has no dice and no hidden board information: every player can see the full position. What makes it distinctive is that all units move simultaneously each turn, and the turn is preceded by a negotiation phase in which players exchange private messages to form alliances, coordinate attacks, and bargain over territory. ^[1]^[4]

This structure poses problems that purely competitive, perfect-information games like Go do not. Success requires cooperation as well as competition: a player almost always needs allies to make progress, yet alliances are temporary and any partner may betray you. Most of the coordination happens through unrestricted natural-language conversation rather than through formal moves, so an effective agent has to understand and produce persuasive, context-aware dialogue while also reasoning about what other players will do. Meta's FAIR team described the central difficulty as blending natural language processing with strategic reasoning under mixed motives. ^[1]^[2] These features rule out the self-play-only recipe that worked for two-player zero-sum games, because an agent trained only against copies of itself would not learn to talk to, or model, the idiosyncratic humans it must cooperate with.

Architecture

CICERO couples two components: a strategic-reasoning (planning) module that decides what to do, and a controllable dialogue model that decides what to say. The link between them is a representation the authors call an "intent," a set of intended moves for CICERO and for the player it is talking to. The planner produces intents from the board state and the conversation so far; the dialogue model is then conditioned on those intents so that its messages stay grounded in, and consistent with, the actual plan. ^[1]^[3]

| Module | Role | Approach |
| --- | --- | --- |
| Strategic reasoning / planning | Predict other players' likely moves and choose CICERO's own actions | A planning and search algorithm called piKL that refines policy predictions toward higher expected value while staying close to human-like behavior, building on Meta's earlier Diplomacy and game-theoretic reinforcement learning work ^[1]^[3] |
| Controllable dialogue | Generate negotiation messages grounded in the current plan | A large language model conditioned on the game state and the planned intents, with filters to suppress nonsensical or off-strategy messages ^[1]^[2] |
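
The coupling between the two modules can be pictured as a small data flow. The Python sketch below is illustrative only: `Intent`, `plan_intent`, and `render_message` are hypothetical names, and both the planner and the dialogue model are replaced by trivial stand-ins. It is not taken from the released CICERO code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Hypothetical container: planned orders for the speaker and the recipient."""
    speaker: str
    recipient: str
    speaker_orders: tuple
    recipient_orders: tuple


def plan_intent(speaker: str, recipient: str) -> Intent:
    # Stand-in for the planner. A real planner would derive these orders
    # from the board state and the conversation so far.
    return Intent(speaker, recipient, ("A PAR - BUR",), ("F LON - ENG",))


def render_message(intent: Intent) -> str:
    # Stand-in for the dialogue model. The text is a function of the intent,
    # so the message cannot drift away from the plan.
    mine = ", ".join(intent.speaker_orders)
    yours = ", ".join(intent.recipient_orders)
    return f"{intent.recipient}: I plan {mine}. Could you order {yours}?"


intent = plan_intent("FRANCE", "ENGLAND")
message = render_message(intent)
print(message)
```

The point of the design is the direction of the dependency: the message is produced from the intent, not the other way around, which is what keeps dialogue grounded in the plan.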

Strategic reasoning

The planning module predicts a policy (a distribution over possible move sets) for every player on the current turn, drawing on both the board position and the dialogue CICERO has exchanged. It also predicts what other players believe CICERO's own policy to be. The piKL algorithm then iteratively improves these predictions: it searches for policies with higher expected value given the other players' predicted policies, while penalizing departures from the original, human-trained predictions. Keeping the policy close to human behavior matters because CICERO must play with people who do not behave like a self-play optimum, so a policy that better models actual human play also produces better cooperation. ^[1]^[3] The approach builds on Meta's earlier no-press (dialogue-free) Diplomacy research, including the paper "Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning," presented at ICLR 2023. ^[5]
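
The trade-off between expected value and closeness to human play can be illustrated with the standard closed form for a KL-regularized objective: the policy that maximizes expected value minus λ times the KL divergence from an anchor policy is proportional to `anchor(a) * exp(value(a) / λ)`. The sketch below is a one-shot, single-player simplification under that assumption. The function name, the action labels, and the numbers are invented for illustration, and the full piKL procedure is iterative and reasons over all players.

```python
import math


def kl_regularized_policy(anchor, values, lam):
    """Return pi(a) proportional to anchor[a] * exp(values[a] / lam).

    anchor: dict mapping action -> probability under the human-like policy
    values: dict mapping action -> estimated expected value
    lam:    regularization strength; larger values stay closer to the anchor
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Work in log space for numerical stability. Actions with zero anchor
    # probability are dropped, since they get zero probability anyway.
    logits = {a: math.log(p) + values[a] / lam for a, p in anchor.items() if p > 0}
    top = max(logits.values())
    weights = {a: math.exp(x - top) for a, x in logits.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Hypothetical numbers: the human-like policy favors supporting an ally,
# while the value estimate slightly favors betraying the ally.
anchor = {"support_ally": 0.6, "hold": 0.3, "stab_ally": 0.1}
values = {"support_ally": 0.5, "hold": 0.4, "stab_ally": 0.7}

human_like = kl_regularized_policy(anchor, values, lam=100.0)
greedy = kl_regularized_policy(anchor, values, lam=0.01)
print(human_like)
print(greedy)
```

With a large λ the result stays near the anchor distribution; with a small λ nearly all probability moves to the highest-value action.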

Controllable dialogue

The dialogue model is built on R2C2, a 2.7-billion-parameter Transformer encoder-decoder language model in the BART family, pretrained on text from the internet. Meta fine-tuned it on more than 40,000 human games from webDiplomacy.net, a corpus containing well over twelve million in-game messages. ^[1]^[3]^[6] To make the model controllable rather than free-floating, the team automatically annotated training messages with the moves they corresponded to, so that at inference time generation could be steered toward discussing a specific desired action, the intent supplied by the planner. The pipeline also applied filtering, including classifiers that distinguish human from model-generated text, to discard messages that were inconsistent, ungrounded, or strategically unsound before sending them. ^[1]^[2] In a typical game CICERO sent and received around 292 messages, often using the slang and shorthand human Diplomacy players use. ^[6]^[7]
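
The generate-then-filter step can be sketched as a simple pipeline. Everything in this Python example is a hypothetical stand-in: the real filters are described as learned classifiers, whereas the checks here are plain string tests, and returning `None` when every candidate is rejected is an assumption of the sketch, not documented behavior.

```python
from typing import Callable, Iterable, Optional

MessageFilter = Callable[[str], bool]


def first_acceptable(candidates: Iterable[str],
                     filters: Iterable[MessageFilter]) -> Optional[str]:
    """Return the first candidate that passes every filter, else None."""
    checks = list(filters)
    for text in candidates:
        if all(check(text) for check in checks):
            return text
    return None  # assumption: send nothing if every candidate is rejected


def mentions_planned_order(text: str) -> bool:
    # Toy grounding check: the message must refer to the planned destination.
    return "BUR" in text


def is_substantive(text: str) -> bool:
    # Toy quality check: reject very short or empty messages.
    return len(text.split()) >= 4


candidates = [
    "ok",
    "I will move to Munich this turn",
    "I am moving A PAR - BUR, can you support?",
]
chosen = first_acceptable(candidates, [mentions_planned_order, is_substantive])
print(chosen)
```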

Results on webDiplomacy

Meta evaluated CICERO by entering it, anonymously, into an online blitz Diplomacy league on webDiplomacy.net, where it played against unsuspecting human opponents. Across 40 games CICERO achieved an average score of 25.8%, more than double the 12.4% average of the 82 human players it faced, and ranked in roughly the top 10% of participants who played more than one game. In one eight-game tournament within that play it finished first among 21 entrants. ^[1]^[3]^[8] Players are not normally told that an opponent is a bot, and the team reported that CICERO's identity was generally not detected during the games.
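
The "more than double" comparison follows directly from the two reported averages, as this short Python check shows. The variable names are illustrative.

```python
cicero_avg = 25.8  # CICERO's average score (%), as reported above
human_avg = 12.4   # average score (%) of the 82 human players

ratio = cicero_avg / human_avg
print(f"CICERO scored {ratio:.2f}x the human average")
```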

| Metric | CICERO | Human players |
| --- | --- | --- |
| Average score across 40 games | 25.8% | 12.4% |
| Overall ranking | Top ~10% (of players with more than one game) | N/A |
| Number of human opponents faced | N/A | 82 |
| Messages per game (sent and received) | ~292 | N/A |

The "human-level" claim is therefore specific: it refers to anonymous online blitz play, not to formal face-to-face tournament Diplomacy. Later independent analyses, such as a 2024 study revisiting CICERO's games, examined in more detail how its cooperative behavior compared with that of strong human players. ^[9]

Open source and reception

Meta released CICERO's code, models, and research data so that other researchers could build on the work. The team stated, "By open-sourcing the code and models we hope that AI researchers can continue to build off our work in a responsible manner." ^[2] The public repository, facebookresearch/diplomacy_cicero, distributes the training and inference code under the MIT License (with one component under AGPL) and the model weights under a separate CC-BY-NC 4.0 license; it was archived as read-only in 2025. ^[10] Yann LeCun, Meta's chief AI scientist, called the result "a true breakthrough for cooperative AI." ^[1] Commentators also noted the project's safety dimension: because Diplomacy rewards persuasion and sometimes deception, the team designed CICERO to be largely honest and consistent with its plans, and discussed the dual-use questions that negotiating agents raise. ^[2]^[11]

Relation to Meta's other game AI work

CICERO sits in a line of research on AI for games that mix competition with imperfect information and many players. Several of its researchers, including Noam Brown, had previously co-created the poker agents Libratus and Pluribus. Libratus defeated top professionals at two-player no-limit Texas Hold'em in 2017, and Pluribus, a collaboration between Carnegie Mellon University and Facebook AI, beat elite professionals at six-player no-limit poker in 2019. ^[12]^[13] Those systems showed that search and equilibrium-finding could reach superhuman play in hidden-information and multi-agent settings, but they operated entirely through betting actions. CICERO extended the same emphasis on planning and search into a domain where the primary medium of interaction is open-ended human language, joining the planner to a language model rather than relying on game actions alone. The Diplomacy work is frequently cited as an early demonstration that strategic reasoning and natural-language dialogue could be unified in one agent, a theme that carried into later work on reasoning in language models. ^[1]^[14]

References

1. Meta AI, "CICERO," research project page. https://ai.meta.com/research/cicero/
2. Meta AI, "CICERO: An AI agent that negotiates, persuades, and cooperates with people," blog, 22 November 2022. https://ai.meta.com/blog/cicero-ai-negotiates-persuades-and-cooperates-with-people/
3. Meta Fundamental AI Research Diplomacy Team (FAIR) et al., "Human-level play in the game of Diplomacy by combining language models with strategic reasoning," *Science*, vol. 378, no. 6624, 2022. https://www.science.org/doi/10.1126/science.ade9097
4. Meta AI, "Diplomacy and CICERO." https://ai.meta.com/research/cicero/diplomacy/
5. A. Bakhtin et al., "Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning," ICLR 2023. https://arxiv.org/abs/2210.05492
6. Meta Newsroom, "CICERO: AI That Can Collaborate and Negotiate With You," 22 November 2022. https://about.fb.com/news/2022/11/cicero-ai-that-can-collaborate-and-negotiate-with-you/
7. T. Picciano, "Machine Learning: CICERO, AI Program Learns the Art of Diplomacy," 25 November 2022. https://apicciano.commons.gc.cuny.edu/2022/11/25/machine-learning-cicero-ai-program-learns-the-art-of-diplomacy/
8. A. Alford, "Meta's CICERO AI Wins Online Diplomacy Tournament," InfoQ, December 2022. https://www.infoq.com/news/2022/12/meta-diplomacy-cicero/
9. W. Mukobi et al., "More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play," arXiv, 2024. https://arxiv.org/abs/2406.04643
10. facebookresearch, "diplomacy_cicero," GitHub repository. https://github.com/facebookresearch/diplomacy_cicero
11. T. Simonite / The Register, "Meta's Cicero chatbot can probably beat you at Diplomacy," 23 November 2022. https://www.theregister.com/2022/11/23/metas_cicero_chatbot_can_probably/
12. Carnegie Mellon University, "Carnegie Mellon and Facebook AI Beats Professionals in Six-Player Poker," 11 July 2019. https://www.cs.cmu.edu/news/2019/carnegie-mellon-and-facebook-ai-beats-professionals-six-player-poker
13. N. Brown and T. Sandholm, "Superhuman AI for heads-up no-limit poker: Libratus beats top professionals," *Science*, 2017. https://noambrown.com/papers/17-Science-Superhuman.pdf
14. Noam Brown, personal website (research overview). https://noambrown.github.io/
