MetaGPT

Updated Jul 11, 2026

MetaGPT is an open-source multi-agent framework that organizes large language model agents into a simulated software-development company, with role-specialized agents (Product Manager, Architect, Project Manager, Engineer, and QA Engineer) collaborating through codified Standardized Operating Procedures (SOPs) to turn a single-line natural-language requirement into a complete software artifact set, including a product requirements document (PRD), system design, task list, source code, and tests.^[1] The framework was introduced in the paper MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework by Sirui Hong, Mingchen Zhuge, and colleagues at DeepWisdom and collaborating institutions, first posted to arXiv on 1 August 2023 (2308.00352) and later accepted as an oral presentation (the top 1.2% of submissions, ranked first in the LLM-based agent category) at the International Conference on Learning Representations 2024.^[2]^[3]^[10] MetaGPT's central thesis is that "Code = SOP(Team)": by encoding the standardized operating procedures of human software organizations into prompt sequences and structured artifacts, cascading hallucinations and miscoordination among chained LLM agents can be substantially reduced.^[1]^[4] The reference implementation is released on GitHub under the MIT License and, having grown from about 59,400 stars in November 2025 to more than 69,000 by mid-2026, ranks among the most popular multi-agent frameworks for software automation alongside AutoGen and CrewAI.^[5]^[6] The repository describes itself as "The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming."^[5] DeepWisdom subsequently commercialized the framework as MGX, launched February 2025 and rebranded Atoms in January 2026.^[7]^[8]

Infobox

Attribute	Value
Original name	MetaGPT
Type	Open-source LLM multi-agent framework
Domain	Automated software development, data science
Creator	DeepWisdom (Sirui Hong et al.)
First arXiv preprint	1 August 2023 (arXiv:2308.00352)^[2]
ICLR 2024 status	Oral presentation, top 1.2% of submissions, ranked #1 in the LLM-based agent category^[3]^[10]
Repository	github.com/FoundationAgents/MetaGPT (formerly geekan/MetaGPT)^[5]
License	MIT^[5]
Primary language	Python (approximately 97.5% of codebase)^[5]
Latest tagged release	v0.8.1 (22 April 2024)^[5]
GitHub stars	more than 69,000 (July 2026); about 59,400 in November 2025^[5]^[6]
Commercial product	MGX (19 February 2025), rebranded Atoms (13 January 2026)^[7]^[8]
Reported funding	approximately RMB 220 million (about USD 30.8 million); Series A and Series A+ led by Cathay Capital^[14]
Core philosophy	"Code = SOP(Team)"^[5]

History

Origins at DeepWisdom

MetaGPT originated in the research group led by Sirui Hong at DeepWisdom (also referred to in some Chinese sources as Fuzhi), a Shenzhen-based startup focused on autonomous agents.^[9] Hong is listed as co-founder of DeepWisdom and as a researcher who leads the MetaGPT team, with related work on agent platforms such as AgentStore.^[9] The project was bootstrapped in mid-2023 in response to a wave of single-agent autonomous systems, including Auto-GPT and BabyAGI, that made compelling demos but proved brittle on multi-step software tasks and frequently produced logically inconsistent output, because no step had an enforced contract with its successor.^[4]

The repository was first released publicly under the GitHub handle geekan/MetaGPT at the end of August 2023, a few weeks after the paper's first arXiv submission on 1 August 2023.^[2]^[5] Within weeks the project trended on GitHub's monthly charts, and within months it was included in the Open100 list of top open-source achievements for 2023.^[10]

Paper publication and revisions

The paper went through several revisions on arXiv between August 2023 and November 2024, with the seventh version (v7) representing the camera-ready ICLR 2024 manuscript.^[2] OpenReview records confirm acceptance as an oral presentation at ICLR 2024, with publication dated 16 January 2024 under submission number 5488.^[3] The author list expanded between preprint versions; the camera-ready paper credits fifteen authors: Sirui Hong, Mingchen Zhuge, Jiaqi Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber, with collaborating affiliations including The Chinese University of Hong Kong (Shenzhen) and KAUST (where Schmidhuber holds an appointment).^[11]

Subsequent work

After the foundational paper, the MetaGPT team published a series of extensions:

Data Interpreter (arXiv:2402.18679, first posted 28 February 2024) introduced a specialist agent focused on data-science workflows, using hierarchical graph modeling and programmable node generation for dynamic planning.^[12]
AFlow: Automating Agentic Workflow Generation (arXiv:2410.10762) reformulated agentic workflow design as a Monte Carlo Tree Search problem over code-represented workflows; it was accepted as an oral presentation at ICLR 2025.^[13]
MGX (MetaGPT X), a hosted commercial product built on the framework, launched on 19 February 2025 and was rebranded as Atoms in January 2026.^[7]^[8]
The repository was moved from the personal account geekan to the dedicated FoundationAgents organization in 2024, with the original URL kept as a redirect.^[5]

By November 2025, the MetaGPT team reported on social media that the repository had crossed 59,400 stars; the count passed 69,000 by mid-2026.^[5]^[6] KrAsia reported that DeepWisdom, the company behind the project, had raised approximately RMB 220 million (about USD 30.8 million) across two 2025 rounds, a Series A and a Series A+ led by Cathay Capital and backed by Ant Group, Jinqiu Capital, MindWorks, Baidu Ventures, and Concept Capital.^[14]

What problem does MetaGPT solve?

The MetaGPT paper opens with an explicit diagnosis of where prior LLM agent stacks failed. The abstract states the core difficulty plainly: solutions to complex tasks are "complicated through logic inconsistencies due to cascading hallucinations caused by naively chaining LLMs."^[1] Naively chaining LLM calls (each agent only sees the previous agent's free-form text output) produces cascading hallucinations: a small fabrication in step k propagates into step k+1, where the next agent treats it as fact, and the error compounds.^[1]^[4] Existing chat-style frameworks such as the early versions of Auto-GPT, BabyAGI, AgentVerse, and conversational systems built on AutoGen passed natural-language messages but did not constrain their structure, which meant ambiguity in the messages led to logic inconsistencies in code, file structure, or API design.^[1]^[11]
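The compounding effect can be made concrete with a deliberately simplified model that does not come from the paper: assume each handoff independently introduces an uncorrected error with probability p, so a chain of n handoffs stays clean with probability (1 - p)^n. The function name and the 5% error rate below are illustrative assumptions only.

```python
def clean_chain_probability(p_error: float, n_steps: int) -> float:
    """Chance that n independent handoffs all arrive error-free.

    Toy model (an assumption, not from the MetaGPT paper): every handoff
    independently introduces an error with probability p_error, and no
    later step detects or repairs it.
    """
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must be in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return (1.0 - p_error) ** n_steps


# Hypothetical 5% per-handoff error rate: reliability decays geometrically.
for n in (1, 3, 5, 10):
    print(n, round(clean_chain_probability(0.05, n), 3))
# 1 0.95
# 3 0.857
# 5 0.774
# 10 0.599
```

Under this toy model, anything that lowers the per-step error rate, or catches an error before it is passed on, pays off multiplicatively across the chain.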

The team's insight was that real software organizations solve the same problem with standardized operating procedures: a Product Requirements Document has a fixed schema, an architectural design produces specific named artifacts (file lists, API definitions, sequence diagrams), and code review follows a checklist. Encoding those SOPs into prompt sequences and demanding that each role produce structured output that conforms to a schema gives the next agent in the chain less room to misinterpret, and gives reviewers (human or another LLM agent) a stable surface to check.^[1]^[11] The paper formalizes this as a meta-programming approach: the framework programs the team that programs the code, with the SOP itself being the meta-program.^[11]
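A minimal sketch of the "structured output plus schema" idea, assuming a PRD is represented as a Python dataclass and the LLM is asked to reply in JSON. The field names echo the PRD contents listed in this article (user stories, competitive analysis, requirements pool), but the PRD class, the parse_prd validator, and the JSON shape are hypothetical and do not reproduce MetaGPT's own output schemas.

```python
import json
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PRD:
    """Hypothetical fixed schema for the Product Manager's output."""
    original_requirement: str
    user_stories: List[str]
    competitive_analysis: List[str]
    requirements_pool: List[str]


def parse_prd(raw_llm_output: str) -> PRD:
    """Parse an LLM reply and reject it unless it matches the PRD schema exactly."""
    data = json.loads(raw_llm_output)
    if not isinstance(data, dict):
        raise ValueError("PRD must be a JSON object")
    expected = {f.name for f in fields(PRD)}
    missing, extra = expected - set(data), set(data) - expected
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={sorted(missing)} extra={sorted(extra)}")
    for name in ("user_stories", "competitive_analysis", "requirements_pool"):
        if not isinstance(data[name], list):
            raise ValueError(f"{name} must be a list")
    # Only a schema-conforming artifact is handed on to the next role.
    return PRD(**data)


reply = json.dumps({
    "original_requirement": "write a snake game",
    "user_stories": ["As a player, I can steer the snake with arrow keys"],
    "competitive_analysis": ["Classic Snake: simple, no levels"],
    "requirements_pool": ["P0: game loop", "P1: high-score table"],
})
prd = parse_prd(reply)
print(prd.requirements_pool)  # ['P0: game loop', 'P1: high-score table']
```

Rejecting a malformed reply at this boundary, rather than forwarding it as free text, means the next role never builds on a PRD with a missing or mistyped section.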

How does MetaGPT work?

The SOP-driven assembly line

MetaGPT models a software project as an assembly line of roles, each consuming the structured artifact produced by the previous role and emitting a new structured artifact for the next role.^[1]^[15] As the paper puts it, MetaGPT "utilizes an assembly line paradigm to assign diverse roles to various agents," breaking a complex task into subtasks that many agents handle together.^[1] The canonical pipeline maps a one-line user requirement through five role-specialized agents:

Product Manager receives the user requirement and produces a Product Requirements Document (PRD) that includes user stories, competitive analysis, and a requirements pool. The output schema is fixed so the next role can parse it deterministically.^[15]
Architect consumes the PRD and emits a system design document containing the file list, data structures, interface definitions, and a sequence diagram describing call flow between modules.^[15]
Project Manager breaks the system design into a task list, assigns each task to engineering agents, and tracks dependencies.^[15]
Engineer implements the assigned classes and functions, with access to the file list and interface definitions so that names and signatures stay consistent across files.^[15]
QA Engineer writes test cases against the produced code, executes them, and triggers code review or debugging cycles when failures occur. The paper highlights an executive feedback loop in which the QA agent can re-trigger the engineer with concrete error traces.^[11]^[15]
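The five-step pipeline above can be sketched as a fixed sequence of stages, each checking the type of the artifact it receives before producing the next one. This is a minimal sketch under stated assumptions: the stage functions are deterministic stubs standing in for LLM calls, and run_assembly_line, Stage, and the artifact dictionaries are hypothetical names, not MetaGPT's API (the framework itself routes artifacts through its shared message pool rather than a hard-coded list).

```python
from typing import Callable, Dict, List, Tuple

Artifact = Dict[str, object]
# Each stage: (role name, artifact type it accepts, function producing the next artifact)
Stage = Tuple[str, str, Callable[[Artifact], Artifact]]


def run_assembly_line(requirement: str, stages: List[Stage]) -> List[Artifact]:
    """Pass one structured artifact down a fixed sequence of roles."""
    artifact: Artifact = {"type": "Requirement", "text": requirement}
    trail = [artifact]
    for role, expected_type, act in stages:
        # The contract between roles: refuse an artifact of the wrong kind.
        if artifact["type"] != expected_type:
            raise TypeError(f"{role} expected {expected_type}, got {artifact['type']}")
        artifact = act(artifact)
        trail.append(artifact)
    return trail


# Stubbed roles standing in for LLM calls (all contents are hypothetical).
STAGES: List[Stage] = [
    ("Product Manager", "Requirement",
     lambda a: {"type": "PRD", "user_stories": [f"As a player I can {a['text']}"]}),
    ("Architect", "PRD",
     lambda a: {"type": "Design", "file_list": ["main.py", "game.py"]}),
    ("Project Manager", "Design",
     lambda a: {"type": "TaskList", "tasks": [f"implement {f}" for f in a["file_list"]]}),
    ("Engineer", "TaskList",
     lambda a: {"type": "Code", "files": {t.split()[-1]: "# TODO" for t in a["tasks"]}}),
    ("QA Engineer", "Code",
     lambda a: {"type": "TestReport", "tested": sorted(a["files"])}),
]

trail = run_assembly_line("play snake", STAGES)
print([a["type"] for a in trail])
# ['Requirement', 'PRD', 'Design', 'TaskList', 'Code', 'TestReport']
```

Because each stage names the artifact type it accepts, wiring the roles in the wrong order fails immediately instead of producing plausible but inconsistent output.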

The roles are not chosen arbitrarily: they map closely to titles in human software organizations, and the prompts that define them include role description, goals, constraints, and a list of actions that the role is allowed to take. The framework also allows custom roles to be defined by subclassing a Role class with its own action set.^[5]

Communication via the shared message pool

Rather than have agents pass messages point-to-point (which would scale quadratically with the number of roles and create coupling between agent prompts), MetaGPT uses a shared message pool with a publish/subscribe protocol.^[1]^[16] Every agent publishes its structured output to a central pool. Other agents declare a subscription profile describing which message types they care about (for example, the Engineer subscribes to messages of type "Task" produced by the Project Manager). When a relevant message arrives, the framework routes it to the subscribing agent's input queue.^[16]

Two properties follow from this design. First, the messages exchanged are structured artifacts (PRDs, designs, task lists, code, test reports) rather than chat dialogue, which keeps the surface area auditable.^[1]^[16] Second, an agent can transparently access historical messages from the pool, removing the need to query other agents directly and avoiding circular conversations that plague chat-only systems.^[1]^[16]
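The publish/subscribe routing and the history lookup described above can be illustrated with a small in-memory pool. This is a simplified sketch assuming string message types and named agents; Message, MessagePool, subscribe, publish, and lookup are hypothetical names rather than MetaGPT's classes, and a real system would also deal with asynchronous delivery.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Dict, List, Set


@dataclass
class Message:
    msg_type: str   # e.g. "PRD", "Design", "Task"
    sender: str
    content: str


@dataclass
class MessagePool:
    """Hypothetical shared pool: publishers never address receivers directly."""
    history: List[Message] = field(default_factory=list)
    subscriptions: Dict[str, Set[str]] = field(default_factory=dict)
    inboxes: DefaultDict[str, List[Message]] = field(default_factory=lambda: defaultdict(list))

    def subscribe(self, agent: str, msg_types: Set[str]) -> None:
        """Record which message types an agent wants routed to it."""
        self.subscriptions[agent] = set(msg_types)

    def publish(self, msg: Message) -> None:
        self.history.append(msg)  # every message stays available for later lookup
        for agent, wanted in self.subscriptions.items():
            if msg.msg_type in wanted and agent != msg.sender:
                self.inboxes[agent].append(msg)

    def lookup(self, msg_type: str) -> List[Message]:
        """Read past messages from the pool instead of querying another agent."""
        return [m for m in self.history if m.msg_type == msg_type]


pool = MessagePool()
pool.subscribe("Architect", {"PRD"})
pool.subscribe("Engineer", {"Task"})
pool.publish(Message("PRD", "ProductManager", "snake game PRD"))
pool.publish(Message("Task", "ProjectManager", "implement game.py"))
print([m.content for m in pool.inboxes["Engineer"]])  # ['implement game.py']
print(len(pool.lookup("PRD")))  # 1
```

Adding a new role only requires a new subscription, not changes to every sender, which is the decoupling the point-to-point alternative lacks.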

Executive feedback and debugging

A practical contribution of MetaGPT, particularly emphasized in the paper's ablation, is the executive feedback mechanism. When code produced by the Engineer fails to execute, the QA Engineer (or a code-review sub-action) captures the runtime trace and re-publishes it as a structured error message. The Engineer agent then receives the trace and an instruction to fix the offending file, iterating until tests pass or a budget is exhausted.^[11] The paper attributes a 5.4 percentage point absolute improvement on the MBPP benchmark to this loop alone.^[11]
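The loop can be sketched as "run the tests, hand the failure trace to a repair step, repeat until success or budget exhaustion". This is a simplified illustration, not MetaGPT's implementation: run_tests, executive_feedback_loop, and the toy fixer are hypothetical, the generated code is assumed to embed its own assert-style tests, and the fixer stands in for an LLM-backed Engineer.

```python
import traceback
from typing import Callable, Optional, Tuple

# Hypothetical signature of the Engineer's repair step: (code, error_trace) -> new code.
FixFn = Callable[[str, str], str]


def run_tests(source: str) -> Optional[str]:
    """Execute code that embeds its own asserts; return the traceback text on failure."""
    try:
        # exec() runs in-process for brevity; a real system would sandbox this.
        exec(compile(source, "<generated>", "exec"), {})
    except Exception:
        return traceback.format_exc()
    return None


def executive_feedback_loop(source: str, fix: FixFn, budget: int = 3) -> Tuple[str, bool, int]:
    """Feed failure traces back to the fixer until the tests pass or the budget is spent.

    Returns (final_code, passed, number_of_fix_rounds_used).
    """
    for round_no in range(budget + 1):
        trace = run_tests(source)
        if trace is None:
            return source, True, round_no
        if round_no < budget:
            source = fix(source, trace)  # the error trace is the feedback signal
    return source, False, budget


buggy = "def add(a, b):\n    return a - b\n\nassert add(2, 3) == 5\n"


def toy_fixer(code: str, trace: str) -> str:
    # Stand-in for an LLM call: patches the single bug planted above.
    return code.replace("a - b", "a + b") if "AssertionError" in trace else code


fixed, passed, rounds = executive_feedback_loop(buggy, toy_fixer)
print(passed, rounds)  # True 1
```

The budget parameter reflects the "until tests pass or a budget is exhausted" condition; without it, a fixer that never converges would loop forever.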

Token economics

The paper publishes detailed token-cost tables. On its SoftwareDev benchmark, a full MetaGPT run uses approximately 31,255 tokens per project compared with ChatDev's 19,292; in absolute terms MetaGPT is more expensive per project.^[11] However, MetaGPT generates substantially more code per project (an average of 251.4 lines against ChatDev's 77.5), so the per-line cost is lower: roughly 124.3 tokens per generated code line versus ChatDev's 248.9 tokens per line.^[11] Independent reproductions have reported overall per-task dollar costs frequently above ten USD on HumanEval when running with GPT-4, with token-duplication rates above 70% in some runs because the same artifacts are quoted into multiple agent prompts.^[17]
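The per-line figures follow from dividing the per-project token counts by the average number of code lines, as this short check shows. The numbers are the ones quoted in this section from the paper; the script only does the arithmetic.

```python
# SoftwareDev averages per project, as quoted from the MetaGPT paper.
reported = {
    "MetaGPT": {"tokens": 31_255, "code_lines": 251.4},
    "ChatDev": {"tokens": 19_292, "code_lines": 77.5},
}

for system, r in reported.items():
    per_line = r["tokens"] / r["code_lines"]
    print(f"{system}: {per_line:.1f} tokens per code line")
# MetaGPT: 124.3 tokens per code line
# ChatDev: 248.9 tokens per code line

ratio = reported["MetaGPT"]["tokens"] / reported["ChatDev"]["tokens"]
print(f"MetaGPT uses {ratio:.2f}x ChatDev's tokens per project")
# MetaGPT uses 1.62x ChatDev's tokens per project
```

Both statements in the paragraph therefore hold at once: MetaGPT spends about 62% more tokens per project, yet roughly half as many tokens per line of generated code.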

What agent roles does MetaGPT use?

The following table summarizes the canonical roles defined in the original paper and the production framework.

Role	Inputs subscribed to	Primary output	Key responsibility
Product Manager	User requirement	Product Requirements Document, user stories, requirements pool	Translate one-line ask into structured requirements^[15]
Architect	PRD	System design, file list, data structures, interfaces, sequence diagrams	Decide module decomposition and signatures^[15]
Project Manager	System design	Task list, dependency graph	Sequence and assign engineering tasks^[15]
Engineer	Task list, file list, interfaces	Implementation code	Implement classes and functions consistent with the design^[15]
QA Engineer	Code, task list	Test cases, test results, error traces	Validate code; trigger debug loop on failure^[11]^[15]

The framework supports custom roles, including a Researcher role used in later work on AFlow, and the Data Interpreter role used in the eponymous follow-up agent.^[5]^[12]

How well does MetaGPT perform on benchmarks?

HumanEval and MBPP

On standard code-generation benchmarks, the paper reports that MetaGPT with GPT-4 reaches a Pass@1 of 85.9% on HumanEval and 87.7% on MBPP, state-of-the-art numbers among multi-agent systems at the time of submission.^[11] The improvements over GPT-4 alone are not exclusively attributable to the multi-agent structure: the ablation study credits a meaningful portion to the executive feedback debug loop.^[11]
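Pass@1 is the fraction of problems for which a generated solution passes the benchmark's unit tests. This article does not state how many samples per problem the MetaGPT paper drew, so the sketch below simply shows the widely used unbiased pass@k estimator introduced alongside HumanEval (Chen et al., 2021), assuming n samples per problem of which c pass; with k = 1 it reduces to c / n. The function names are hypothetical.

```python
from math import comb
from statistics import mean
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems, given (n_samples, n_correct) per problem."""
    return mean(pass_at_k(n, c, k) for n, c in results)


# Hypothetical outcomes for three problems with one sample each.
print(round(benchmark_pass_at_k([(1, 1), (1, 0), (1, 1)], k=1), 3))  # 0.667
```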

SoftwareDev

The paper introduces a custom benchmark called SoftwareDev, which evaluates whether a framework can produce a complete, executable software project from a single sentence. Projects are scored on a one-to-four executability scale, with four indicating a flawless run. The reported results are:^[11]

System	Executability (1-4)	Runtime (s)	Code lines	Human revision cost	Tokens / code line
MetaGPT	3.75	541	251.4	0.83 corrections	124.3
ChatDev	2.25	762	77.5	2.5 corrections	248.9
Auto-GPT	1.00	not applicable	not applicable	not applicable	not applicable
LangChain agent	1.00	not applicable	not applicable	not applicable	not applicable
AgentVerse	1.00	not applicable	not applicable	not applicable	not applicable

The paper interprets the 1.00 scores for Auto-GPT, LangChain agents, and AgentVerse as evidence that those frameworks generally fail to produce code that runs at all on this benchmark, whereas the SOP-driven approach of MetaGPT and (to a lesser extent) ChatDev does.^[11]

Data Interpreter benchmarks

The Data Interpreter follow-up paper (arXiv:2402.18679) reports separate numbers on data-science workflows:^[12]

Benchmark	Reported result
InfiAgent-DABench	accuracy from 75.9% to 94.9% (a relative gain of about 25%)
Machine-learning tasks	accuracy from 88% to 95%
Open-ended tasks	accuracy from 60% to 97%
MATH dataset	26% improvement over the strongest prior baseline

These results are not directly comparable to HumanEval/MBPP, but they illustrate how the SOP-driven approach extends beyond software engineering into data science when paired with hierarchical graph planning.^[12]

AFlow benchmarks

The AFlow paper, building on MetaGPT, reports a 5.7% average improvement over state-of-the-art baselines across six benchmarks (HumanEval, MBPP, GSM8K, MATH, HotpotQA, DROP), and shows that smaller models orchestrated by AFlow can match or exceed GPT-4o on specific tasks at roughly 4.55% of GPT-4o's inference cost in dollars.^[13] AFlow was accepted as an oral presentation at ICLR 2025, listed among the top 1.8% of submissions and ranked second in the LLM-based agent category.^[10]^[13]

What products and variants build on MetaGPT?

Reference Python implementation

The reference implementation is the Python package metagpt, installable via pip install --upgrade metagpt or directly from the FoundationAgents/MetaGPT GitHub repository.^[5] Python accounts for roughly 97.5% of the codebase, with shell and other auxiliary languages making up the remainder.^[5] The library exposes a Team abstraction that bundles a set of Role objects with a shared message pool and a run loop. Roles are defined as classes that declare a profile, a list of Action objects, and a _act coroutine. Backends are configured through a YAML file that supports OpenAI, Azure OpenAI, Ollama, Groq, and other endpoints by adjusting api_type and base_url.^[5]
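The Team / Role / Action structure described above can be mimicked with standard-library asyncio to show how the pieces fit together. This is a self-contained sketch, not MetaGPT code: the names Team, Role, Action, profile, and _act follow this paragraph's vocabulary, but every body, constructor argument, and the WriteSpec/WriteCode actions are hypothetical, and the real metagpt classes have different signatures (LLM configuration, memory, message-pool routing), so consult the package's documentation before writing real roles.

```python
import asyncio
from typing import List


class Action:
    """One step a role can take; in MetaGPT proper this would prompt an LLM."""

    async def run(self, context: str) -> str:
        raise NotImplementedError


class WriteSpec(Action):
    async def run(self, context: str) -> str:
        return f"SPEC({context})"


class WriteCode(Action):
    async def run(self, context: str) -> str:
        return f"CODE({context})"


class Role:
    profile = "Role"

    def __init__(self, actions: List[Action]) -> None:
        self.actions = actions

    async def _act(self, context: str) -> str:
        # Apply this role's actions in order, each consuming the previous output.
        for action in self.actions:
            context = await action.run(context)
        return context


class ProductManager(Role):
    profile = "Product Manager"


class Engineer(Role):
    profile = "Engineer"


class Team:
    """Bundles roles and runs them; fixed order here instead of a message pool."""

    def __init__(self, roles: List[Role]) -> None:
        self.roles = roles

    async def run(self, idea: str) -> str:
        artifact = idea
        for role in self.roles:
            artifact = await role._act(artifact)
        return artifact


team = Team([ProductManager([WriteSpec()]), Engineer([WriteCode()])])
print(asyncio.run(team.run("snake game")))  # CODE(SPEC(snake game))
```

Making _act a coroutine lets a runtime await LLM calls from many roles without blocking, which is why the pattern is asynchronous even in this sequential toy.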

Data Interpreter

Data Interpreter is the most prominent role-level extension. It implements three pivotal techniques described in the 2024 paper: dynamic planning with hierarchical graph structures, dynamic tool integration during execution, and logical-inconsistency identification through feedback combined with experience recording.^[12] The Data Interpreter agent is part of the same repository under examples/di/ and is documented as a flagship demo of the framework's flexibility outside of pure software engineering.^[5]
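Graph-structured planning can be illustrated, in much reduced form, with a dependency graph of tasks executed in topological order using Python's standard graphlib module (Python 3.9+). The task names and the execute_plan helper are hypothetical; the sketch shows only dependency-ordered execution, not Data Interpreter's hierarchical graphs, dynamic replanning, or tool integration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical plan for a small data-science request: task -> prerequisites.
plan: Dict[str, Set[str]] = {
    "load_data": set(),
    "clean_data": {"load_data"},
    "explore": {"clean_data"},
    "engineer_features": {"clean_data"},
    "train_model": {"engineer_features"},
    "report": {"explore", "train_model"},
}


def execute_plan(graph: Dict[str, Set[str]], run_task: Callable[[str], None]) -> List[str]:
    """Run each task only after all of its prerequisites have finished."""
    order: List[str] = []
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the plan is not a DAG
    while sorter.is_active():
        for task in sorter.get_ready():  # tasks whose prerequisites are all done
            run_task(task)
            order.append(task)
            sorter.done(task)
    return order


order = execute_plan(plan, lambda task: print("running", task))
```

Tasks returned together by get_ready (here explore and engineer_features) have no mutual dependency, so a runtime could execute them in parallel.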

MGX and Atoms

MGX (MetaGPT X) was launched by DeepWisdom on 19 February 2025 as a hosted commercial product that wraps a refined MetaGPT-derived multi-agent team for natural-language application development. The hosted team is named with anthropomorphic roles: Mike (team leader), Emma (product manager), Bob (architect), Alex (engineer), and David (data analyst).^[7] Within a month of launch, and without paid marketing, MGX reported roughly 500,000 registered users and about USD 1 million in annual recurring revenue, and it reached #1 Product of the Week on Product Hunt on 10 March 2025 after topping Product of the Day on 4 March 2025.^[10]^[14]

In January 2026 the product was rebranded as Atoms, with DeepWisdom citing pronounceability (the name "MGX" was awkward to say in English) and consumer adoption as reasons for the change; the previous mgx.dev URL now redirects to atoms.dev.^[8]^[14] Atoms adds commercial infrastructure including a visual editor, managed authentication and databases, payments integration, an SEO specialist agent, and a Race Mode that runs multiple model backends in parallel and selects the best output.^[18] Reporting by KrAsia in early 2026 placed Atoms inside the broader Chinese vibe coding wave and noted that Atoms incorporates open-weight models such as DeepSeek and Qwen alongside closed APIs.^[14]

Forks and derivatives

Several community forks exist, including gofullthrottle/MetaGPT-Data-Interpreter and mskj-apaas/MetaGPT-2025, which track the upstream repository with localized changes.^[5] Documentation lives in the separate geekan/MetaGPT-docs repository, which hosts the official tutorial site for the framework.^[19]

What is MetaGPT used for?

Documented uses of MetaGPT cluster into four categories. First, end-to-end software prototyping, where the framework is given a single-sentence brief such as "write a snake game" or "build a CLI to-do tracker" and produces a runnable project with tests; this is the use case showcased in the original paper and in the SoftwareDev benchmark.^[11] Second, requirements-document automation, where teams use only the Product Manager role to draft PRDs and competitive analyses from informal notes, often as a starting point for human refinement.^[15] Third, data-science pipelines, via Data Interpreter, including exploratory analysis, machine-learning model training, and visualization.^[12] Fourth, meta-workflow generation through AFlow, in which MetaGPT itself is used as the substrate for searching for better agent topologies for downstream tasks like MBPP, HumanEval, and MATH.^[13]

The commercial Atoms product targets a fifth use case: turning natural-language requirements into deployed, monetizable web applications complete with login, database, and payment integration. By September 2025, DeepWisdom reported that MGX (since renamed Atoms) was processing approximately 1.2 million monthly visits and roughly 10,000 application launches per day.^[14]

Why is MetaGPT significant?

MetaGPT crystallized a design pattern that became influential across the agentic workflow ecosystem. Three contributions stand out.

First, the explicit framing of SOPs as prompt sequences gave researchers a vocabulary for what role-specialized multi-agent systems are doing: encoding organizational knowledge into the structure of communication. This framing has been picked up by subsequent multi-agent research, including Magentic-One and other orchestration frameworks that explicitly cite MetaGPT as motivation.^[20]

Second, the shared message pool with publish/subscribe semantics offered an alternative to chat-style coordination that has informed the design of newer agentic systems and protocols.^[16] By making structured artifacts the unit of communication, MetaGPT made it easier to log, inspect, and replay agent decisions, which is crucial for agent evaluation.

Third, the SoftwareDev benchmark, while small, normalized the practice of evaluating multi-agent code-generation systems on whether they can produce projects that run, rather than just snippets that pass unit tests. Benchmarks such as SWE-bench, which scores systems on whether their patches to real GitHub repositories pass those repositories' tests, reflect a similar emphasis on end-to-end executability.^[21]

The project's ICLR 2024 Oral status and the follow-up ICLR 2025 Oral acceptance of AFlow indicate that the academic community has consistently regarded the MetaGPT line of work as a serious research contribution rather than an engineering demo, despite some skepticism (discussed below) about the cost and reproducibility of multi-agent pipelines.^[3]^[13]

What are MetaGPT's limitations and criticisms?

Token cost and overhead

Several independent reviews note that MetaGPT's quality comes at substantial cost. Multi-agent communication overhead has been measured at over USD 10 per task on HumanEval in published reproductions, driven primarily by serial messages that re-quote the same context across agents.^[17] One published review reported token-duplication rates of roughly 72% for MetaGPT, meaning a large fraction of tokens billed by the LLM provider are repeated context rather than new content.^[17] The paper itself acknowledges higher absolute token usage than ChatDev, while arguing that per-line cost is lower because more code is produced.^[11]

Reference and version inconsistencies

MetaGPT-generated code occasionally references non-existent resource files (images, audio) or invokes undefined classes when synthesizing complex projects, because the LLM lacks live access to the project's actual file system or package indices.^[17] LLMs without browsing also tend to produce code against outdated package versions; initial tests frequently fail due to version-compatibility issues, and the framework relies on its executive feedback loop to recover.^[17]^[11]

Debugging across agents

When something breaks inside a multi-agent pipeline, tracing the cause across multiple agent prompts and intermediate documents is harder than debugging a single-agent chain. Reviews aimed at practitioners describe this as a recurring complaint, particularly when QA tests pass on the happy path but the generated code fails on edge cases.^[17] Galileo's analysis of multi-agent coordination identified several failure modes (cascading miscommunication, role drift, infinite loops in feedback cycles) that can affect MetaGPT as well as other frameworks.^[17]

Open-loop vs closed-loop critique

A broader critique applies to the entire genre of role-based multi-agent frameworks (including MetaGPT, ChatDev, AutoGen, and CrewAI): assigning roles and hierarchies up front may simply re-encode human organizational structure into a place where it is not necessarily optimal for LLMs. Some recent empirical work argues that the assumption that role specialization improves outcomes does not always hold once the cost of communication is taken into account.^[22] The MetaGPT authors' own AFlow follow-up partially addresses this by searching for the workflow rather than fixing it in advance.^[13]

How does MetaGPT compare to other multi-agent frameworks?

The table below summarizes the practical differences among the most-cited LLM multi-agent frameworks at the time of writing.

Framework	Origin	Communication style	Specialization	License
MetaGPT^[11]	DeepWisdom, 2023	Shared message pool, pub/sub of structured artifacts	Software dev SOP roles (PM, Architect, PgM, Eng, QA)	MIT^[5]
ChatDev^[23]	Tsinghua / OpenBMB, 2023	Chat-chain along a phase-based pipeline; "communicative dehallucination"	Software phases (design, code, test)	Apache 2.0
AutoGen^[24]	Microsoft Research, 2023	Conversable agents, free-form multi-agent chat	General; agents customized per app	CC-BY-4.0 / MIT
CrewAI^[25]	crewAIInc, 2024	Role + backstory + goal; sequential or hierarchical task flows	General teams with custom roles	MIT
Magentic-One^[20]	Microsoft Research, 2024	Orchestrator with planner and ledger; sub-agents for browsing, coding, file I/O	General computer-use tasks	MIT

The signature difference between MetaGPT and the others is its insistence on structured artifacts (PRD, design, task list, code), rather than free-form natural-language chat, as the unit of inter-agent communication. ChatDev is the closest peer because it also encodes software-development phases, but it does so via a conversation chain rather than a publish/subscribe message pool, and the MetaGPT paper reports lower executability and higher revision costs for ChatDev on SoftwareDev.^[11]^[23] AutoGen and CrewAI are more general-purpose: they leave it to the developer to define which artifacts are passed between agents and impose less structure on the messages themselves.^[24]^[25]

Related work

MetaGPT draws on three streams of prior work. The first is autonomous-agent demos such as Auto-GPT and BabyAGI that demonstrated chain-of-thought-driven autonomy but suffered from drift; MetaGPT explicitly positions itself as a structured alternative.^[1] The second is single-agent reasoning techniques such as Reflexion and ReAct that use self-reflection or interleaved reasoning and action to improve single-agent performance; MetaGPT's executive-feedback loop is conceptually related but operates across role-specialized agents rather than within one agent.^[4] The third is multi-agent communication research, especially the contemporary AutoGen paper and ChatDev, which together with MetaGPT defined the 2023-2024 multi-agent landscape.^[23]^[24]

References

1. Sirui Hong et al., "MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (arXiv abstract)", arXiv, 2023-08-01. https://arxiv.org/abs/2308.00352. Accessed 2026-07-12.
2. arXiv, "[2308.00352] MetaGPT version history", arXiv, 2024-11-01. https://arxiv.org/abs/2308.00352v7. Accessed 2026-05-20.
3. OpenReview, "MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (ICLR 2024 Oral)", OpenReview, 2024-01-16. https://openreview.net/forum?id=VtmBAGCN7o. Accessed 2026-05-20.
4. Sirui Hong et al., "MetaGPT (HTML version v7), Introduction and Motivation", arXiv, 2024-11-01. https://arxiv.org/html/2308.00352v7. Accessed 2026-07-12.
5. FoundationAgents, "MetaGPT repository README", GitHub, 2026-07-12. https://github.com/FoundationAgents/MetaGPT. Accessed 2026-07-12.
6. MetaGPT (@MetaGPT_), "Congratulations to MetaGPT for reaching 59.4k stars on GitHub", X (Twitter), 2025-11-13. https://x.com/MetaGPT_/status/1988942176559251879. Accessed 2026-05-20.
7. Press Advantage, "MetaGPT Team Launches MetaGPT X, World's First AI Multi-Agent Team Platform", Press Advantage, 2025-02-19. https://pressadvantage.com/story/77512-metagpt-team-launches-metagpt-x-world-s-first-ai-multi-agent-team-platform. Accessed 2026-05-20.
8. MetaGPT (@MetaGPT_), "MGX is now Atoms (rebranding announcement)", X (Twitter), 2026-01-13. https://x.com/MetaGPT_/status/2011063325413810415. Accessed 2026-05-20.
9. OpenReview, "Sirui Hong profile (Researcher, DeepWisdom)", OpenReview, 2025-01-01. https://openreview.net/profile?id=~Sirui_Hong1. Accessed 2026-05-20.
10. FoundationAgents, "MetaGPT README, Recognition section (ICLR orals, Product Hunt, Open100, GitHub Trending)", GitHub, 2026-07-12. https://github.com/FoundationAgents/MetaGPT#recognition. Accessed 2026-07-12.
11. Sirui Hong et al., "MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (PDF)", arXiv / ICLR 2024, 2024-11-01. https://arxiv.org/pdf/2308.00352. Accessed 2026-07-12.
12. Sirui Hong et al., "Data Interpreter: An LLM Agent For Data Science", arXiv:2402.18679, 2024-02-28. https://arxiv.org/abs/2402.18679. Accessed 2026-07-12.
13. Jinyu Xiang et al., "AFlow: Automating Agentic Workflow Generation", arXiv:2410.10762 (ICLR 2025 Oral), 2024-10-14. https://arxiv.org/abs/2410.10762. Accessed 2026-07-12.
14. KrAsia, "From MetaGPT to Atoms: DeepWisdom leads China's push into vibe coding", KrAsia, 2026-02-05. https://kr-asia.com/from-metagpt-to-atoms-deepwisdom-leads-chinas-push-into-vibe-coding. Accessed 2026-07-12.
15. IBM, "What is MetaGPT?", IBM Think Topics, 2024-12-01. https://www.ibm.com/think/topics/metagpt. Accessed 2026-05-20.
16. Sirui Hong et al., "MetaGPT (HTML v6), Section 3.2 Shared Message Pool", arXiv, 2024-08-01. https://arxiv.org/html/2308.00352v6. Accessed 2026-05-20.
17. Galileo, "Multi-Agent Coordination Gone Wrong? Fix With 10 Strategies", Galileo Blog, 2025-09-01. https://galileo.ai/blog/multi-agent-coordination-strategies. Accessed 2026-05-20.
18. Atoms (DeepWisdom), "Atoms product overview", atoms.dev, 2026-01-13. https://atoms.dev/. Accessed 2026-05-20.
19. geekan, "MetaGPT-docs documentation repository", GitHub, 2025-10-01. https://github.com/geekan/MetaGPT-docs. Accessed 2026-05-20.
20. Adam Fourney et al., "Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks", Microsoft Research, 2024-11-04. https://www.microsoft.com/en-us/research/publication/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/. Accessed 2026-05-20.
21. Carlos Jimenez et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?", arXiv:2310.06770, 2023-10-10. https://arxiv.org/abs/2310.06770. Accessed 2026-05-20.
22. Sukh Sroay, "Multi-agent framework empirical analysis (25,000-task experiment)", X (Twitter), 2025-11-01. https://x.com/sukh_saroy/status/2039381283999293799. Accessed 2026-05-20.
23. Chen Qian et al., "ChatDev: Communicative Agents for Software Development", arXiv:2307.07924 (ACL 2024), 2023-07-16. https://arxiv.org/abs/2307.07924. Accessed 2026-05-20.
24. Qingyun Wu et al., "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation", arXiv:2308.08155, 2023-10-03. https://arxiv.org/abs/2308.08155. Accessed 2026-05-20.
25. crewAIInc, "CrewAI repository and documentation", GitHub, 2025-12-01. https://github.com/crewAIInc/crewAI. Accessed 2026-05-20.
