CICERO (AI)
Last reviewed
Jun 3, 2026
Sources
14 citations
Review status
Source-backed
Revision
v1 · 1,562 words
CICERO is an AI agent built by Meta AI's Fundamental AI Research (FAIR) division that reached human-level performance in the board game Diplomacy. Meta announced it on 22 November 2022, alongside a paper in the journal Science titled "Human-level play in the game of Diplomacy by combining language models with strategic reasoning." [1][2] The system is notable because Diplomacy demands both free-form natural-language negotiation and long-horizon strategic planning, a combination that earlier game AI milestones in chess, Go, and poker did not have to handle together. In anonymous online play on the webDiplomacy.net platform, CICERO finished with more than double the average human score and placed in roughly the top 10% of players who had played more than one game. [1][3]
Diplomacy is a seven-player game set in pre-World War I Europe. Players control the armies and fleets of the major powers (England, France, Germany, Italy, Austria-Hungary, Russia, and Turkey) and compete to control a majority of supply centers on the map. The game has no dice and no hidden board information: every player can see the full position. What makes it distinctive is that all units move simultaneously each turn, and the turn is preceded by a negotiation phase in which players exchange private messages to form alliances, coordinate attacks, and bargain over territory. [1][4]
This structure poses problems that purely competitive, perfect-information games like Go do not. Success requires cooperation as well as competition: a player almost always needs allies to make progress, yet alliances are temporary and any partner may betray you. Most of the coordination happens through unrestricted natural-language conversation rather than through formal moves, so an effective agent has to understand and produce persuasive, context-aware dialogue while also reasoning about what other players will do. Meta's FAIR team described the central difficulty as blending natural language processing with strategic reasoning under mixed motives. [1][2] These features rule out the self-play-only recipe that worked for two-player zero-sum games, because an agent trained only against copies of itself would not learn to talk to, or model, the idiosyncratic humans it must cooperate with.
CICERO couples two components: a strategic-reasoning (planning) module that decides what to do, and a controllable dialogue model that decides what to say. The link between them is a representation the authors call an "intent," a set of intended moves for CICERO and for the player it is talking to. The planner produces intents from the board state and the conversation so far; the dialogue model is then conditioned on those intents so that its messages stay grounded in, and consistent with, the actual plan. [1][3]
| Module | Role | Approach |
|---|---|---|
| Strategic reasoning / planning | Predict other players' likely moves and choose CICERO's own actions | A planning and search algorithm called piKL that refines policy predictions toward higher expected value while staying close to human-like behavior, building on Meta's earlier Diplomacy and game-theoretic reinforcement learning work [1][3] |
| Controllable dialogue | Generate negotiation messages grounded in the current plan | A large language model conditioned on the game state and the planned intents, with filters to suppress nonsensical or off-strategy messages [1][2] |
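One way to picture the "intent" that links the two modules is as a small record of planned moves that gets flattened into the text the dialogue model is conditioned on. The sketch below is illustrative only: the class name `Intent`, the function `conditioning_text`, the field names, and the prompt layout are all assumptions made for this example and do not reproduce the data model or input format of the released code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Planned moves for the speaker and for the player being addressed.

    Hypothetical structure for illustration; not the released code's data model.
    """
    speaker: str                       # power the agent controls, e.g. "FRANCE"
    recipient: str                     # power the message is addressed to
    speaker_orders: tuple[str, ...]    # moves the agent intends to make
    recipient_orders: tuple[str, ...]  # moves the agent hopes the recipient makes


def conditioning_text(board_summary: str, history: list[str], intent: Intent) -> str:
    """Flatten board state, dialogue so far, and the intent into one prompt string."""
    lines = [f"STATE: {board_summary}"]
    lines += [f"MSG: {m}" for m in history]
    lines.append(f"{intent.speaker} PLANS: {'; '.join(intent.speaker_orders)}")
    lines.append(f"{intent.recipient} PLANS: {'; '.join(intent.recipient_orders)}")
    # The trailing line marks where a generated message would begin.
    lines.append(f"{intent.speaker} -> {intent.recipient}:")
    return "\n".join(lines)


intent = Intent(
    speaker="FRANCE",
    recipient="ENGLAND",
    speaker_orders=("A PAR - BUR", "F BRE - MAO"),
    recipient_orders=("F LON - NTH",),
)
prompt = conditioning_text(
    "Spring 1901, opening position",
    ["ENGLAND: DMZ in the Channel?"],
    intent,
)
print(prompt)
```

Because the planned moves appear explicitly in the conditioning text, a message generated from it can be checked against the same moves afterwards, which is the sense in which the dialogue stays grounded in the plan.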
The planning module predicts a policy (a distribution over possible move sets) for every player on the current turn, drawing on both the board position and the dialogue CICERO has exchanged. It also predicts what other players believe CICERO's own policy to be. The piKL algorithm then iteratively improves these predictions: it searches for policies with higher expected value given the other players' predicted policies, while penalizing departures from the original, human-trained predictions. Keeping the policy close to human behavior matters because CICERO must play with people who do not behave like a self-play optimum, so a policy that better models actual human play also produces better cooperation. [1][3] This line of work followed Meta's earlier no-press (dialogue-free) Diplomacy research, including the "Mastering the Game of No-Press Diplomacy" results later presented at ICLR 2023. [5]
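The trade-off between expected value and closeness to human play can be illustrated with a single KL-regularized update for one player. This is a minimal sketch, not the piKL procedure itself: it assumes the other players' policies are fixed, so each candidate action already has an expected value, and it performs one update rather than the iterative refinement described above. The function name, the parameter `lam`, and the numbers are invented for illustration. Under those assumptions, the policy that maximizes expected value minus `lam` times the KL divergence from the human-trained anchor policy has a closed form: each action's anchor probability is reweighted by `exp(value / lam)`.

```python
import math


def kl_regularized_policy(anchor, values, lam):
    """Return the policy maximizing sum(pi * values) - lam * KL(pi || anchor).

    anchor: human-trained probabilities for each candidate action (sums to 1).
    values: estimated expected value of each action against fixed opponents.
    lam:    regularization strength (> 0); larger means closer to the anchor.
    """
    best = max(values)
    # Subtracting the maximum keeps the exponentials numerically stable
    # without changing the normalized result.
    weights = [p * math.exp((v - best) / lam) for p, v in zip(anchor, values)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative numbers only: three candidate move sets.
anchor = [0.7, 0.2, 0.1]   # what a human-trained model predicts
values = [0.1, 0.5, 0.3]   # what search estimates each is worth

near_human = kl_regularized_policy(anchor, values, lam=100.0)
near_greedy = kl_regularized_policy(anchor, values, lam=0.01)
print(near_human)
print(near_greedy)
```

With a large `lam` the result stays almost identical to the anchor, and with a small `lam` nearly all probability moves to the highest-value action. Intermediate settings give policies that improve on the human prediction while remaining recognizably human-like.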
The dialogue model is built on R2C2, a 2.7-billion-parameter Transformer encoder-decoder language model in the BART family, pretrained on text from the internet. Meta fine-tuned it on more than 40,000 human games from webDiplomacy.net, a corpus containing well over twelve million in-game messages. [1][3][6] To make the model controllable rather than free-floating, the team automatically annotated training messages with the moves they corresponded to, so that at inference time generation could be steered toward discussing a specific desired action, the intent supplied by the planner. The pipeline also applied filtering, including classifiers that distinguish human from model-generated text, to discard messages that were inconsistent, ungrounded, or strategically unsound before sending them. [1][2] In a typical game CICERO sent and received around 292 messages, often using the slang and shorthand human Diplomacy players use. [6][7]
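The generate-then-filter step can be sketched as a chain of predicates applied to candidate messages. This is only a schematic: the real filters are learned classifiers, whereas the two checks below (`not_empty` and `mentions_plan`) are toy stand-ins invented for this example, as are the function name `first_acceptable` and the sample messages.

```python
from typing import Callable, Iterable, List, Optional

# A filter returns True when a candidate message should be kept.
Filter = Callable[[str], bool]


def first_acceptable(candidates: Iterable[str], filters: List[Filter]) -> Optional[str]:
    """Return the first candidate that passes every filter, or None to send nothing."""
    for text in candidates:
        if all(keep(text) for keep in filters):
            return text
    return None


# Toy stand-ins for learned classifiers (hypothetical, for illustration only).
intended_provinces = {"BUR", "MAO"}  # provinces named in the planned moves


def not_empty(text: str) -> bool:
    return bool(text.strip())


def mentions_plan(text: str) -> bool:
    # Crude grounding check: the message names a province from the plan.
    return any(p in text.upper() for p in intended_provinces)


candidates = [
    "",
    "I love this game.",
    "I'm moving to Bur this turn, can you cover MAO?",
]
chosen = first_acceptable(candidates, [not_empty, mentions_plan])
print(chosen)
```

Returning `None` when every candidate is rejected reflects the design choice that sending no message is preferable to sending one that contradicts the plan.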
Meta evaluated CICERO by entering it, anonymously, into an online blitz Diplomacy league on webDiplomacy.net, where it played against unsuspecting human opponents. Across 40 games CICERO achieved an average score of 25.8%, more than double the 12.4% average of the 82 human players it faced, and ranked in roughly the top 10% of participants who played more than one game. In one eight-game tournament within that play it finished first among 21 entrants. [1][3][8] Players are not normally told that an opponent is a bot, and the team reported that CICERO's identity was generally not detected during the games.
| Metric | CICERO | Human players |
|---|---|---|
| Average score across 40 games | 25.8% | 12.4% |
| Overall ranking | Top ~10% (of players with more than one game) | N/A |
| Number of human opponents faced | N/A | 82 |
| Messages per game (sent and received) | ~292 | N/A |
The "human-level" claim is therefore specific: it refers to anonymous online blitz play, not to formal face-to-face tournament Diplomacy. Later independent analyses, for example a 2024 study revisiting CICERO's games, examined in more detail how its cooperative behavior compared with that of strong human players. [9]
Meta released CICERO's code, models, and research data so that other researchers could build on the work. The team stated, "By open-sourcing the code and models we hope that AI researchers can continue to build off our work in a responsible manner." [2] The public repository, facebookresearch/diplomacy_cicero, distributes the training and inference code under the MIT License (with one component under AGPL) and the model weights under a separate CC-BY-NC 4.0 license; it was archived as read-only in 2025. [10] Yann LeCun, Meta's chief AI scientist, called the result "a true breakthrough for cooperative AI." [1] Commentators also noted the project's safety dimension: because Diplomacy rewards persuasion and sometimes deception, the team designed CICERO to be largely honest and consistent with its plans, and discussed the dual-use questions that negotiating agents raise. [2][11]
CICERO sits in a line of research on AI for games that mix competition with imperfect information and many players. Several of its researchers, including Noam Brown, had previously co-created the poker agents Libratus and Pluribus. Libratus defeated top professionals at two-player no-limit Texas Hold'em in 2017, and Pluribus, a collaboration between Carnegie Mellon University and Facebook AI, beat elite professionals at six-player no-limit poker in 2019. [12][13] Those systems showed that search and equilibrium-finding could reach superhuman play in hidden-information and multi-agent settings, but they operated entirely through betting actions. CICERO extended the same emphasis on planning and search into a domain where the primary medium of interaction is open-ended human language, joining the planner to a language model rather than relying on game actions alone. The Diplomacy work is frequently cited as an early demonstration that strategic reasoning and natural-language dialogue could be unified in one agent, a theme that carried into later work on reasoning in language models. [1][14]