# MetaGPT

> Source: https://aiwiki.ai/wiki/metagpt
> Updated: 2026-07-11
> Categories: AI Agents, AI Code Generation, Open Source AI
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**MetaGPT** is an open-source multi-agent framework that organizes [large language model](/wiki/large_language_model) agents into a simulated software-development company, with role-specialized agents (Product Manager, Architect, Project Manager, Engineer, and QA Engineer) collaborating through codified Standardized Operating Procedures (SOPs) to turn a single-line natural-language requirement into a complete software artifact set, including a product requirements document (PRD), system design, task list, source code, and tests.[1] The framework was introduced in the paper *MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework* by Sirui Hong, Mingchen Zhuge, and colleagues at DeepWisdom and collaborating institutions, first posted to arXiv on 1 August 2023 (2308.00352) and later accepted as an oral presentation (the top 1.2% of submissions, ranked first in the LLM-based agent category) at the [International Conference on Learning Representations](/wiki/iclr) 2024.[2][3][10] MetaGPT's central thesis is that "Code = SOP(Team)": by encoding the standardized operating procedures of human software organizations into prompt sequences and structured artifacts, cascading hallucinations and miscoordination among chained [LLM](/wiki/llm) agents can be substantially reduced.[1][4] The reference implementation is released on GitHub under the [MIT License](/wiki/mit_license) and, having grown from about 59,400 stars in November 2025 to more than 69,000 by mid-2026, ranks among the most popular [multi-agent](/wiki/multi_agent_system) frameworks for software automation alongside [AutoGen](/wiki/autogen) and [CrewAI](/wiki/crewai).[5][6] The repository describes itself as "The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming."[5] DeepWisdom subsequently commercialized the framework as MGX, launched February 2025 and rebranded Atoms in January 2026.[7][8]

## Infobox

| Attribute | Value |
| --- | --- |
| Original name | MetaGPT |
| Type | Open-source LLM multi-agent framework |
| Domain | Automated software development, data science |
| Creator | DeepWisdom (Sirui Hong et al.) |
| First arXiv preprint | 1 August 2023 (arXiv:2308.00352)[2] |
| ICLR 2024 status | Oral presentation, top 1.2% of submissions, ranked #1 in the LLM-based agent category[3][10] |
| Repository | github.com/FoundationAgents/MetaGPT (formerly geekan/MetaGPT)[5] |
| License | MIT[5] |
| Primary language | Python (approximately 97.5% of codebase)[5] |
| Latest tagged release | v0.8.1 (22 April 2024)[5] |
| GitHub stars | more than 69,000 (July 2026); about 59,400 in November 2025[5][6] |
| Commercial product | MGX (19 February 2025), rebranded Atoms (13 January 2026)[7][8] |
| Reported funding | approximately RMB 220 million (about USD 30.8 million); Series A and Series A+ led by Cathay Capital[14] |
| Core philosophy | "Code = SOP(Team)"[5] |

## History

### Origins at DeepWisdom

MetaGPT originated in the research group led by Sirui Hong at DeepWisdom (also referred to in some Chinese sources as Fuzhi), a Shenzhen-based startup focused on autonomous agents.[9] Hong is listed as co-founder of DeepWisdom and as a researcher who leads the MetaGPT team, with related work on agent platforms such as AgentStore.[9] The project was bootstrapped in mid-2023 in response to a wave of single-agent autonomous systems including [Auto-GPT](/wiki/autogpt) and [BabyAGI](/wiki/babyagi) that, while compelling demos, exhibited brittle behavior on multi-step software tasks and frequently produced logically inconsistent output because each step had no enforced contract with its successor.[4]

The repository was first released publicly under the GitHub handle `geekan/MetaGPT` at the end of August 2023, a few weeks after the paper was first submitted to arXiv on 1 August 2023.[2][5] Within weeks the project trended on GitHub's monthly charts, and within months it was included in the Open100 list of top open-source achievements for 2023.[10]

### Paper publication and revisions

The paper went through several revisions on arXiv between August 2023 and November 2024, with the seventh version (v7) representing the camera-ready ICLR 2024 manuscript.[2] OpenReview records confirm acceptance as an oral presentation at ICLR 2024, with publication dated 16 January 2024 under submission number 5488.[3] The author list expanded between preprint versions; the camera-ready paper credits fifteen authors: Sirui Hong, Mingchen Zhuge, Jiaqi Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and [Jürgen Schmidhuber](/wiki/jurgen_schmidhuber). Collaborating affiliations include The Chinese University of Hong Kong (Shenzhen) and KAUST (where Schmidhuber holds an appointment).[11]

### Subsequent work

After the foundational paper, the MetaGPT team published a series of extensions:

- **Data Interpreter** (arXiv:2402.18679, first posted 28 February 2024) introduced a specialist agent focused on data-science workflows, using hierarchical graph modeling and programmable node generation for dynamic planning.[12]
- **AFlow: Automating Agentic Workflow Generation** (arXiv:2410.10762) reformulated agentic workflow design as a [Monte Carlo Tree Search](/wiki/monte_carlo_tree_search) problem over code-represented workflows; it was accepted as an oral presentation at ICLR 2025.[13]
- **MGX (MetaGPT X)**, a hosted commercial product built on the framework, launched on 19 February 2025 and was rebranded as Atoms in January 2026.[7][8]
- The GitHub repository was migrated from the personal account `geekan` to the dedicated organization `FoundationAgents` in 2024, with the original URL kept as a redirect.[5]

In November 2025, the MetaGPT team reported on social media that the repository had reached 59,400 stars, and the count climbed past 69,000 by mid-2026.[5][6] Reporting by KrAsia described the project's commercial arm as having raised approximately RMB 220 million (about USD 30.8 million) across two 2025 rounds, a Series A and a Series A+ led by Cathay Capital and backed by Ant Group, Jinqiu Capital, MindWorks, Baidu Ventures, and Concept Capital.[14]

## What problem does MetaGPT solve?

The MetaGPT paper opens with an explicit diagnosis of where prior LLM agent stacks failed. The abstract states the core difficulty plainly: solutions to complex tasks are "complicated through logic inconsistencies due to cascading hallucinations caused by naively chaining LLMs."[1] This is what the paper means by *cascading hallucinations*: when each agent sees only the previous agent's free-form text output, a small fabrication in step k propagates into step k+1, where the next agent treats it as fact, and the error compounds.[1][4] Existing chat-style frameworks such as the early versions of [Auto-GPT](/wiki/autogpt), [BabyAGI](/wiki/babyagi), AgentVerse, and conversational systems built on [AutoGen](/wiki/autogen) passed natural-language messages without constraining their structure, so ambiguity in the messages led to logic inconsistencies in code, file structure, or API design.[1][11]

The team's insight was that real software organizations solve the same problem with *Standard Operating Procedures*: a Product Requirements Document has a fixed schema, an architectural design produces specific named artifacts (file lists, API definitions, sequence diagrams), and code review follows a checklist. Encoding those SOPs into prompt sequences and demanding that each role produce structured output that conforms to a schema gives the next agent in the chain less room to misinterpret, and gives reviewers (human or another LLM agent) a stable surface to check.[1][11] The paper formalizes this as a *meta-programming* approach: the framework programs the team that programs the code, with the SOP itself being the meta-program.[11]
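As a rough illustration of why a fixed schema leaves the next agent less room to misinterpret, the sketch below parses a PRD-like reply and rejects anything that does not match the expected shape. The `PRD` dataclass, its field names, and `parse_prd` are hypothetical and written for this example; they are not MetaGPT's actual schema or API.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PRD:
    """Hypothetical PRD schema; the field names are illustrative only."""
    user_stories: list
    competitive_analysis: list
    requirements_pool: list


def parse_prd(raw: str) -> PRD:
    """Parse a JSON reply into a PRD, rejecting anything off-schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("PRD must be a JSON object")
    expected = {f.name for f in fields(PRD)}
    present = set(data)
    missing, extra = expected - present, present - expected
    if missing or extra:
        raise ValueError(
            f"schema mismatch: missing={sorted(missing)}, extra={sorted(extra)}"
        )
    for name in expected:
        # Every section must be a non-empty list so downstream roles can rely on it.
        if not isinstance(data[name], list) or not data[name]:
            raise ValueError(f"field {name!r} must be a non-empty list")
    return PRD(**data)


reply = json.dumps({
    "user_stories": ["As a player, I want to steer the snake"],
    "competitive_analysis": ["Classic Nokia Snake"],
    "requirements_pool": ["P0: arrow-key controls"],
})
prd = parse_prd(reply)
print(prd.requirements_pool)
```

A reply that omits a section, or that answers in free-form prose, fails at the boundary between roles instead of silently leaking into the next prompt.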

## How does MetaGPT work?

### The SOP-driven assembly line

MetaGPT models a software project as an assembly line of roles, each consuming the structured artifact produced by the previous role and emitting a new structured artifact for the next role.[1][15] As the paper puts it, MetaGPT "utilizes an assembly line paradigm to assign diverse roles to various agents," breaking a complex task into subtasks that many agents handle together.[1] The canonical pipeline maps a one-line user requirement through five role-specialized agents:

1. **Product Manager** receives the user requirement and produces a Product Requirements Document (PRD) that includes user stories, competitive analysis, and a requirements pool. The output schema is fixed so the next role can parse it deterministically.[15]
2. **Architect** consumes the PRD and emits a system design document containing the file list, data structures, interface definitions, and a sequence diagram describing call flow between modules.[15]
3. **Project Manager** breaks the system design into a task list, assigns each task to engineering agents, and tracks dependencies.[15]
4. **Engineer** implements the assigned classes and functions, with access to the file list and interface definitions so that names and signatures stay consistent across files.[15]
5. **QA Engineer** writes test cases against the produced code, executes them, and triggers code review or debugging cycles when failures occur. The paper highlights an *executable feedback* loop in which the QA agent can re-trigger the Engineer with concrete error traces.[11][15]

The roles are not chosen arbitrarily: they map closely to titles in human software organizations, and the prompts that define them include role description, goals, constraints, and a list of actions that the role is allowed to take. The framework also allows custom roles to be defined by subclassing a `Role` class with its own action set.[5]
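The five-step pipeline above can be sketched as a chain in which each role declares the artifact kind it consumes and the kind it produces. This is a minimal toy under stated assumptions: `Artifact`, `RoleSpec`, and `run_pipeline` are hypothetical names, and the lambdas stand in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Artifact:
    kind: str    # e.g. "Requirement", "PRD", "Design", "Tasks", "Code", "TestReport"
    body: str
    author: str


@dataclass(frozen=True)
class RoleSpec:
    name: str
    consumes: str
    produces: str
    act: Callable[[str], str]   # stand-in for an LLM-backed action


def run_pipeline(requirement: str, roles: list) -> list:
    """Pass one artifact down the line, checking each hand-off by kind."""
    history = [Artifact("Requirement", requirement, "User")]
    for role in roles:
        latest = history[-1]
        if latest.kind != role.consumes:
            raise TypeError(f"{role.name} expects {role.consumes}, got {latest.kind}")
        history.append(Artifact(role.produces, role.act(latest.body), role.name))
    return history


ROLES = [
    RoleSpec("Product Manager", "Requirement", "PRD", lambda s: f"PRD<{s}>"),
    RoleSpec("Architect", "PRD", "Design", lambda s: f"Design<{s}>"),
    RoleSpec("Project Manager", "Design", "Tasks", lambda s: f"Tasks<{s}>"),
    RoleSpec("Engineer", "Tasks", "Code", lambda s: f"Code<{s}>"),
    RoleSpec("QA Engineer", "Code", "TestReport", lambda s: f"Report<{s}>"),
]

for artifact in run_pipeline("write a snake game", ROLES):
    print(f"{artifact.author:16} -> {artifact.kind}")
```

Reordering the roles (for example, putting the Engineer before the Architect) raises a `TypeError` at the first mismatched hand-off, which is the kind of contract a free-form chat chain does not enforce.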

### Communication via the shared message pool

Rather than have agents pass messages point-to-point (which would scale quadratically with the number of roles and create coupling between agent prompts), MetaGPT uses a *shared message pool* with a publish/subscribe protocol.[1][16] Every agent publishes its structured output to a central pool. Other agents declare a *subscription profile* describing which message types they care about (for example, the Engineer subscribes to messages of type "Task" produced by the Project Manager). When a relevant message arrives, the framework routes it to the subscribing agent's input queue.[16]

Two properties follow from this design. First, the messages exchanged are structured artifacts (PRDs, designs, task lists, code, test reports) rather than chat dialogue, which keeps the surface area auditable.[1][16] Second, an agent can transparently access historical messages from the pool, removing the need to query other agents directly and avoiding circular conversations that plague chat-only systems.[1][16]
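A publish/subscribe pool of this kind can be sketched in a few lines. The class and method names below (`MessagePool`, `subscribe`, `publish`, `lookup`) are assumptions made for illustration and do not reproduce MetaGPT's internal environment or message classes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str      # artifact type, e.g. "PRD", "Task", "Code"
    content: str
    sent_by: str


class MessagePool:
    """Toy shared pool: agents publish once, subscribers receive by kind."""

    def __init__(self):
        self.history = []        # every message ever published, in order
        self._subscribers = {}   # kind -> list of subscriber inboxes

    def subscribe(self, kinds):
        """Register interest in some message kinds; returns the agent's inbox."""
        inbox = deque()
        for kind in kinds:
            self._subscribers.setdefault(kind, []).append(inbox)
        return inbox

    def publish(self, message):
        """Record the message and route it to every matching inbox."""
        self.history.append(message)
        for inbox in self._subscribers.get(message.kind, ()):
            inbox.append(message)

    def lookup(self, kind):
        """Read historical messages without asking another agent."""
        return [m for m in self.history if m.kind == kind]


pool = MessagePool()
engineer_inbox = pool.subscribe({"Task"})
qa_inbox = pool.subscribe({"Code"})

pool.publish(Message("PRD", "snake game requirements", "Product Manager"))
pool.publish(Message("Task", "implement game loop", "Project Manager"))

print(len(engineer_inbox), len(qa_inbox))   # 1 0
print(pool.lookup("PRD")[0].sent_by)        # Product Manager
```

Each publisher needs to know only the pool, not the other agents, so adding a role means adding one subscription rather than a new point-to-point channel per existing role.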

### Executable feedback and debugging

A practical contribution of MetaGPT, particularly emphasized in the paper's ablation, is the *executable feedback* mechanism. When code produced by the Engineer fails to execute, the QA Engineer (or a code-review sub-action) captures the runtime trace and re-publishes it as a structured error message. The Engineer agent then receives the trace and an instruction to fix the offending file, iterating until tests pass or a budget is exhausted.[11] The paper attributes a 5.4 percentage point absolute improvement on the [MBPP](/wiki/mbpp) benchmark to this loop alone.[11]

### Token economics

The paper publishes detailed token-cost tables. On its SoftwareDev benchmark, a full MetaGPT run uses an average of 31,255 tokens per project compared with ChatDev's 19,292, so in absolute terms MetaGPT is more expensive per project.[11] However, MetaGPT generates substantially more code per project (an average of 251.4 lines against ChatDev's 77.5), so the *per-line* cost is lower: roughly 124.3 tokens per generated code line versus ChatDev's 248.9.[11] Independent reproductions have reported overall per-task dollar costs frequently above ten USD on HumanEval when running with [GPT-4](/wiki/gpt-4), with token-duplication rates above 70% in some runs because the same artifacts are quoted into multiple agent prompts.[17]
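The per-line figures follow directly from the reported averages, as the short calculation below shows. It only re-derives the numbers quoted above from the paper's table; the function name is arbitrary.

```python
def tokens_per_line(total_tokens: float, code_lines: float) -> float:
    """Average tokens spent per generated line of code."""
    return total_tokens / code_lines


metagpt = tokens_per_line(31_255, 251.4)
chatdev = tokens_per_line(19_292, 77.5)

print(f"MetaGPT: {metagpt:.1f} tokens/line")            # MetaGPT: 124.3 tokens/line
print(f"ChatDev: {chatdev:.1f} tokens/line")            # ChatDev: 248.9 tokens/line
print(f"Token ratio: {31_255 / 19_292:.2f}x")           # Token ratio: 1.62x
print(f"Code-line ratio: {251.4 / 77.5:.2f}x")          # Code-line ratio: 3.24x
```

MetaGPT spends about 1.62 times as many tokens per project but emits about 3.24 times as many lines, which is why its per-line cost comes out at roughly half of ChatDev's.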

## What agent roles does MetaGPT use?

The following table summarizes the canonical roles defined in the original paper and the production framework.

| Role | Inputs subscribed to | Primary output | Key responsibility |
| --- | --- | --- | --- |
| Product Manager | User requirement | Product Requirements Document, user stories, requirements pool | Translate one-line ask into structured requirements[15] |
| Architect | PRD | System design, file list, data structures, interfaces, sequence diagrams | Decide module decomposition and signatures[15] |
| Project Manager | System design | Task list, dependency graph | Sequence and assign engineering tasks[15] |
| Engineer | Task list, file list, interfaces | Implementation code | Implement classes and functions consistent with the design[15] |
| QA Engineer | Code, task list | Test cases, test results, error traces | Validate code; trigger debug loop on failure[11][15] |

The framework supports custom roles, including a Researcher role used in later work on AFlow, and the Data Interpreter role used in the eponymous follow-up agent.[5][12]

## How well does MetaGPT perform on benchmarks?

### HumanEval and MBPP

On standard code-generation benchmarks, the paper reports that MetaGPT with [GPT-4](/wiki/gpt-4) reaches a Pass@1 of 85.9% on [HumanEval](/wiki/humaneval) and 87.7% on [MBPP](/wiki/mbpp), state-of-the-art numbers among multi-agent systems at the time of submission.[11] The improvements over GPT-4 alone are not exclusively attributable to the multi-agent structure: the ablation study credits a meaningful portion to the executable feedback debug loop.[11]

### SoftwareDev

The paper introduces a custom benchmark called *SoftwareDev*, which evaluates whether a framework can produce a complete, executable software project from a single sentence. Projects are scored on a one-to-four executability scale, with four indicating a flawless run. The reported results are:[11]

| System | Executability (1-4) | Runtime (s) | Code lines | Human revision cost | Tokens / code line |
| --- | --- | --- | --- | --- | --- |
| MetaGPT | 3.75 | 541 | 251.4 | 0.83 corrections | 124.3 |
| ChatDev | 2.25 | 762 | 77.5 | 2.5 corrections | 248.9 |
| Auto-GPT | 1.00 | not applicable | not applicable | not applicable | not applicable |
| LangChain agent | 1.00 | not applicable | not applicable | not applicable | not applicable |
| AgentVerse | 1.00 | not applicable | not applicable | not applicable | not applicable |

The paper interprets the 1.00 scores for [Auto-GPT](/wiki/autogpt), [LangChain](/wiki/langchain) agents, and AgentVerse as evidence that those frameworks generally fail to produce code that runs at all on this benchmark, whereas the SOP-driven approach of MetaGPT and (to a lesser extent) ChatDev does.[11]

### Data Interpreter benchmarks

The Data Interpreter follow-up paper (arXiv:2402.18679) reports separate numbers on data-science workflows:[12]

| Benchmark | Reported result |
| --- | --- |
| InfiAgent-DABench | accuracy up 25% in relative terms, from 75.9% to 94.9% (19.0 percentage points) |
| Machine-learning tasks | accuracy from 88% to 95% |
| Open-ended tasks | accuracy from 60% to 97% |
| MATH dataset | 26% improvement over the strongest prior baseline |

These results are not directly comparable to HumanEval/MBPP, but they illustrate how the SOP-driven approach extends beyond software engineering into [data science](/wiki/data_science) when paired with hierarchical graph planning.[12]

### AFlow benchmarks

The AFlow paper, building on MetaGPT, reports a 5.7% average improvement over state-of-the-art baselines across six benchmarks (HumanEval, MBPP, GSM8K, MATH, HotpotQA, DROP), and shows that smaller models orchestrated by AFlow can match or exceed [GPT-4o](/wiki/gpt_4o) on specific tasks at roughly 4.55% of GPT-4o's inference cost in dollars.[13] AFlow was accepted as an oral presentation at ICLR 2025, listed among the top 1.8% of submissions and ranked second in the LLM-based agent category.[10][13]

## What products and variants build on MetaGPT?

### Reference Python implementation

The reference implementation is the Python package `metagpt`, installable via `pip install --upgrade metagpt` or directly from the FoundationAgents/MetaGPT GitHub repository.[5] [Python](/wiki/python) accounts for roughly 97.5% of the codebase, with shell and other auxiliary languages making up the remainder.[5] The library exposes a `Team` abstraction that bundles a set of `Role` objects with a shared message pool and a run loop. Roles are defined as classes that declare a profile, a list of `Action` objects, and an `_act` coroutine. Backends are configured through a YAML file that supports OpenAI, [Azure OpenAI](/wiki/azure_openai), Ollama, Groq, and other endpoints by adjusting `api_type` and `base_url`.[5]
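To make the shape of that abstraction concrete, here is a self-contained toy that mirrors only the structure described above: roles with a profile, a list of actions, and an `_act` coroutine, bundled by a team with a shared pool and a bounded run loop. `ToyAction`, `ToyRole`, and `ToyTeam` are hypothetical stand-ins written for this example; they are not the `metagpt` package's classes, and their signatures should not be read as the library's API.

```python
import asyncio


class ToyAction:
    """Stand-in for an action: one named unit of work (an LLM call in practice)."""
    name = "ToyAction"

    async def run(self, context: str) -> str:
        raise NotImplementedError


class WritePRD(ToyAction):
    name = "WritePRD"

    async def run(self, context: str) -> str:
        return f"PRD({context})"


class WriteDesign(ToyAction):
    name = "WriteDesign"

    async def run(self, context: str) -> str:
        return f"Design({context})"


class ToyRole:
    def __init__(self, profile, actions, watches):
        self.profile = profile
        self.actions = actions
        self.watches = watches   # message kinds this role reacts to

    async def _act(self, context: str):
        # Run the role's actions in order, feeding each output forward.
        for action in self.actions:
            context = await action.run(context)
        return self.actions[-1].name, context


class ToyTeam:
    def __init__(self, roles):
        self.roles = roles
        self.pool = []   # shared list of (kind, content) messages

    async def run(self, idea: str, n_round: int = 3):
        self.pool.append(("UserRequirement", idea))
        cursor = 0
        for _ in range(n_round):
            new_messages = self.pool[cursor:]
            cursor = len(self.pool)
            if not new_messages:
                break   # nothing left to react to
            for kind, content in new_messages:
                for role in self.roles:
                    if kind in role.watches:
                        self.pool.append(await role._act(content))
        return self.pool


team = ToyTeam([
    ToyRole("Product Manager", [WritePRD()], {"UserRequirement"}),
    ToyRole("Architect", [WriteDesign()], {"WritePRD"}),
])
pool = asyncio.run(team.run("snake game"))
print(pool[-1])   # ('WriteDesign', 'Design(PRD(snake game))')
```

The round limit plays the same role as a budget in the real framework: it bounds how long the team keeps reacting to its own output.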

### Data Interpreter

Data Interpreter is the most prominent role-level extension. It implements three pivotal techniques described in the 2024 paper: dynamic planning with hierarchical graph structures, dynamic tool integration during execution, and logical-inconsistency identification through feedback combined with experience recording.[12] The Data Interpreter agent is part of the same repository under `examples/di/` and is documented as a flagship demo of the framework's flexibility outside of pure software engineering.[5]

### MGX and Atoms

MGX (MetaGPT X) was launched by DeepWisdom on 19 February 2025 as a hosted commercial product that wraps a refined MetaGPT-derived multi-agent team for natural-language application development. The hosted team is named with anthropomorphic roles: Mike (team leader), Emma (product manager), Bob (architect), Alex (engineer), and David (data analyst).[7] Within a month of launch, and without paid marketing, MGX reported roughly 500,000 registered users and about USD 1 million in annual recurring revenue, and it reached #1 Product of the Week on Product Hunt on 10 March 2025 after topping Product of the Day on 4 March 2025.[10][14] In January 2026 the product was rebranded as Atoms, with DeepWisdom citing pronounceability (the name "MGX" was awkward to say in English) and consumer adoption as reasons for the change; the previous `mgx.dev` URL now redirects to `atoms.dev`.[8][14] Atoms adds commercial infrastructure including a visual editor, managed authentication and databases, payments integration, an SEO specialist agent, and a Race Mode that runs multiple model backends in parallel and selects the best output.[18] Reporting by KrAsia in early 2026 placed Atoms inside the broader Chinese [vibe coding](/wiki/vibe_coding) wave and noted that Atoms incorporates open-weight models such as DeepSeek and Qwen alongside closed APIs.[14]

### Forks and derivatives

Several community forks exist, including `gofullthrottle/MetaGPT-Data-Interpreter` and `mskj-apaas/MetaGPT-2025`, which track the upstream repository with localized changes.[5] Documentation lives in the separate `geekan/MetaGPT-docs` repository, which hosts the official tutorial site for the framework.[19]

## What is MetaGPT used for?

Documented uses of MetaGPT cluster into four categories. First, *end-to-end software prototyping*, where the framework is given a single-sentence brief such as "write a snake game" or "build a CLI to-do tracker" and produces a runnable project with tests; this is the use case showcased in the original paper and in the SoftwareDev benchmark.[11] Second, *requirements-document automation*, where teams use only the Product Manager role to draft PRDs and competitive analyses from informal notes, often as a starting point for human refinement.[15] Third, *data-science pipelines*, via Data Interpreter, including exploratory analysis, machine-learning model training, and visualization.[12] Fourth, *meta-workflow generation* through AFlow, in which MetaGPT itself is used as the substrate for searching better agent topologies for downstream tasks like [MBPP](/wiki/mbpp), [HumanEval](/wiki/humaneval), and [MATH](/wiki/math).[13]

The commercial Atoms product targets a fifth use case: turning natural-language requirements into deployed, monetizable web applications complete with login, database, and payment integration. By September 2025, DeepWisdom reported that the product (then still branded MGX) was receiving approximately 1.2 million monthly visits and roughly 10,000 application launches per day.[14]

## Why is MetaGPT significant?

MetaGPT crystallized a design pattern that became influential across the [agentic workflow](/wiki/agentic_workflow) ecosystem. Three contributions stand out.

First, the explicit framing of SOPs as prompt sequences gave researchers a vocabulary for what role-specialized [multi-agent systems](/wiki/multi_agent_system) are doing: encoding organizational knowledge into the structure of communication. This framing has been picked up by subsequent multi-agent research, including [Magentic-One](/wiki/magentic_one) and other orchestration frameworks that explicitly cite MetaGPT as motivation.[20]

Second, the shared message pool with publish/subscribe semantics offered an alternative to chat-style coordination that has informed the design of newer agentic systems and protocols.[16] By making structured artifacts the unit of communication, MetaGPT made it easier to log, inspect, and replay agent decisions, which is crucial for [agent evaluation](/wiki/agent_evaluation).

Third, the SoftwareDev benchmark, while small, normalized the practice of evaluating multi-agent code-generation systems on whether they can produce projects that *run*, rather than just snippets that pass unit tests. Subsequent work, including evaluations of multi-agent frameworks on [SWE-bench](/wiki/swe_bench), has built on this idea of end-to-end executability.[21]

The project's ICLR 2024 Oral status and the follow-up ICLR 2025 Oral acceptance of AFlow indicate that the academic community has consistently regarded the MetaGPT line of work as a serious research contribution rather than an engineering demo, despite some skepticism (discussed below) about the cost and reproducibility of multi-agent pipelines.[3][13]

## What are MetaGPT's limitations and criticisms?

### Token cost and overhead

Several independent reviews note that MetaGPT's quality comes at substantial cost. Multi-agent communication overhead has been measured at over USD 10 per task on [HumanEval](/wiki/humaneval) in published reproductions, driven primarily by serial messages that re-quote the same context across agents.[17] One published review reported token-duplication rates of roughly 72% for MetaGPT, meaning a large fraction of tokens billed by the LLM provider are repeated context rather than new content.[17] The paper itself acknowledges higher absolute token usage than ChatDev, while arguing that per-line cost is lower because more code is produced.[11]
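One way to picture a token-duplication rate is to count how many billed tokens belong to artifacts that an earlier prompt already carried. The definition in `duplication_rate` and the token counts below are assumptions invented for illustration; the cited review may measure duplication differently, and the numbers are not measurements of MetaGPT.

```python
def duplication_rate(prompts) -> float:
    """Fraction of prompt tokens that repeat an artifact already sent earlier.

    `prompts` is a list of prompts, each a list of (artifact_id, token_count).
    """
    seen = set()
    total = repeated = 0
    for prompt in prompts:
        for artifact_id, tokens in prompt:
            total += tokens
            if artifact_id in seen:
                repeated += tokens   # billed again as re-quoted context
            else:
                seen.add(artifact_id)
    return repeated / total if total else 0.0


# Made-up token counts for a four-role hand-off.
prompts = [
    [("requirement", 50)],                                    # Product Manager
    [("requirement", 50), ("prd", 400)],                      # Architect
    [("requirement", 50), ("prd", 400), ("design", 300)],     # Project Manager
    [("prd", 400), ("design", 300), ("tasks", 150)],          # Engineer
]
print(f"{duplication_rate(prompts):.0%}")   # 57%
```

Even in this small example more than half of the billed tokens are re-quoted context, and the share grows as more roles quote the same upstream artifacts.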

### Reference and version inconsistencies

MetaGPT-generated code occasionally references non-existent resource files (images, audio) or invokes undefined classes when synthesizing complex projects, because the LLM lacks live access to the project's actual file system or package indices.[17] LLMs without browsing also tend to produce code against outdated package versions; initial tests frequently fail due to version-compatibility issues, and the framework relies on its executable feedback loop to recover.[11][17]

### Debugging across agents

When something breaks inside a multi-agent pipeline, tracing the cause across multiple agent prompts and intermediate documents is harder than debugging a single-agent chain. Reviews aimed at practitioners describe this as a recurring complaint, particularly when QA tests pass on the happy path but break on edge cases.[17] Galileo's analysis of multi-agent coordination identified several failure modes (cascading miscommunication, role drift, infinite loops in feedback cycles) that affect MetaGPT alongside other frameworks.[17]

### Open-loop vs closed-loop critique

A broader critique applies to the entire genre of role-based multi-agent frameworks (including MetaGPT, ChatDev, [AutoGen](/wiki/autogen), and [CrewAI](/wiki/crewai)): assigning roles and hierarchies up front may simply re-encode human organizational structure into a place where it is not necessarily optimal for [LLMs](/wiki/llm). Some recent empirical work argues that the assumption that role-specialization improves outcomes does not always hold when the cost of communication is taken into account.[22] The MetaGPT authors' own AFlow follow-up partially addresses this by *searching* for the workflow rather than fixing it in advance.[13]

## How does MetaGPT compare to other multi-agent frameworks?

The table below summarizes the practical differences among the most-cited LLM multi-agent frameworks at the time of writing.

| Framework | Origin | Communication style | Specialization | License |
| --- | --- | --- | --- | --- |
| MetaGPT[11] | DeepWisdom, 2023 | Shared message pool, pub/sub of structured artifacts | Software dev SOP roles (PM, Architect, PgM, Eng, QA) | MIT[5] |
| ChatDev[23] | Tsinghua / OpenBMB, 2023 | Chat-chain along a phase-based pipeline; "communicative dehallucination" | Software phases (design, code, test) | Apache 2.0 |
| AutoGen[24] | Microsoft Research, 2023 | Conversable agents, free-form multi-agent chat | General; agents customized per app | CC-BY-4.0 / MIT |
| CrewAI[25] | crewAIInc, 2024 | Role + backstory + goal; sequential or hierarchical task flows | General teams with custom roles | MIT |
| Magentic-One[20] | Microsoft Research, 2024 | Orchestrator with planner and ledger; sub-agents for browsing, coding, file I/O | General computer-use tasks | MIT |

The signature difference between MetaGPT and the others is the insistence on *structured artifacts* (PRD, design, task list, code) rather than free-form natural-language chat as the unit of inter-agent communication. ChatDev is the closest peer because it also encodes software-development phases, but it does so via a conversation chain rather than a publish/subscribe message pool, and the MetaGPT paper reports lower executability and higher revision costs for ChatDev on SoftwareDev.[11][23] [AutoGen](/wiki/autogen) and [CrewAI](/wiki/crewai) are more general-purpose; they leave it to the developer to define which artifacts are passed between agents and impose less structure on the messages themselves.[24][25]

## Related work

The MetaGPT bibliography draws on three streams of prior work. The first is autonomous-agent demos such as [Auto-GPT](/wiki/autogpt) and [BabyAGI](/wiki/babyagi) that demonstrated chain-of-thought driven autonomy but suffered from drift; MetaGPT explicitly positions itself as a structured alternative.[1] The second is single-agent reasoning techniques such as [Reflexion](/wiki/reflexion) and [ReAct](/wiki/react_prompting) that use self-reflection or interleaved reasoning and action to improve single-agent performance; MetaGPT's executable feedback loop is conceptually related but operates across role-specialized agents rather than within one agent.[4] The third is multi-agent communication research, especially the contemporary [AutoGen](/wiki/autogen) paper and ChatDev, which together with MetaGPT defined the 2023-2024 multi-agent landscape.[23][24]

## See also

- [Multi-agent system](/wiki/multi_agent_system)
- [AutoGen](/wiki/autogen)
- [CrewAI](/wiki/crewai)
- [Auto-GPT](/wiki/autogpt)
- [BabyAGI](/wiki/babyagi)
- [Magentic-One](/wiki/magentic_one)
- [AI coding agent](/wiki/ai_coding_agent)
- [Agentic workflow](/wiki/agentic_workflow)
- [AI Code Generation](/wiki/ai_code_generation)
- [Devin (AI software engineer)](/wiki/devin)
- [HumanEval](/wiki/humaneval)
- [MBPP](/wiki/mbpp)
- [SWE-bench](/wiki/swe_bench)
- [Reflexion](/wiki/reflexion)
- [ReAct (prompting)](/wiki/react_prompting)
- [AgentBench](/wiki/agentbench)
- [Hallucination](/wiki/hallucination)
- [Agent orchestration](/wiki/agent_orchestration)
- [Agent evaluation](/wiki/agent_evaluation)

## References

1. Sirui Hong et al., "MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (arXiv abstract)", arXiv, 2023-08-01. https://arxiv.org/abs/2308.00352. Accessed 2026-07-12.

2. arXiv, "[2308.00352] MetaGPT version history", arXiv, 2024-11-01. https://arxiv.org/abs/2308.00352v7. Accessed 2026-05-20.

3. OpenReview, "MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (ICLR 2024 Oral)", OpenReview, 2024-01-16. https://openreview.net/forum?id=VtmBAGCN7o. Accessed 2026-05-20.

4. Sirui Hong et al., "MetaGPT (HTML version v7), Introduction and Motivation", arXiv, 2024-11-01. https://arxiv.org/html/2308.00352v7. Accessed 2026-07-12.

5. FoundationAgents, "MetaGPT repository README", GitHub, 2026-07-12. https://github.com/FoundationAgents/MetaGPT. Accessed 2026-07-12.

6. MetaGPT (@MetaGPT_), "Congratulations to MetaGPT for reaching 59.4k stars on GitHub", X (Twitter), 2025-11-13. https://x.com/MetaGPT_/status/1988942176559251879. Accessed 2026-05-20.

7. Press Advantage, "MetaGPT Team Launches MetaGPT X, World's First AI Multi-Agent Team Platform", Press Advantage, 2025-02-19. https://pressadvantage.com/story/77512-metagpt-team-launches-metagpt-x-world-s-first-ai-multi-agent-team-platform. Accessed 2026-05-20.

8. MetaGPT (@MetaGPT_), "MGX is now Atoms (rebranding announcement)", X (Twitter), 2026-01-13. https://x.com/MetaGPT_/status/2011063325413810415. Accessed 2026-05-20.

9. OpenReview, "Sirui Hong profile (Researcher, DeepWisdom)", OpenReview, 2025-01-01. https://openreview.net/profile?id=~Sirui_Hong1. Accessed 2026-05-20.

10. FoundationAgents, "MetaGPT README, Recognition section (ICLR orals, Product Hunt, Open100, GitHub Trending)", GitHub, 2026-07-12. https://github.com/FoundationAgents/MetaGPT#recognition. Accessed 2026-07-12.

11. Sirui Hong et al., "MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (PDF)", arXiv / ICLR 2024, 2024-11-01. https://arxiv.org/pdf/2308.00352. Accessed 2026-07-12.

12. Sirui Hong et al., "Data Interpreter: An LLM Agent For Data Science", arXiv:2402.18679, 2024-02-28. https://arxiv.org/abs/2402.18679. Accessed 2026-07-12.

13. Jinyu Xiang et al., "AFlow: Automating Agentic Workflow Generation", arXiv:2410.10762 (ICLR 2025 Oral), 2024-10-14. https://arxiv.org/abs/2410.10762. Accessed 2026-07-12.

14. KrAsia, "From MetaGPT to Atoms: DeepWisdom leads China's push into vibe coding", KrAsia, 2026-02-05. https://kr-asia.com/from-metagpt-to-atoms-deepwisdom-leads-chinas-push-into-vibe-coding. Accessed 2026-07-12.

15. IBM, "What is MetaGPT?", IBM Think Topics, 2024-12-01. https://www.ibm.com/think/topics/metagpt. Accessed 2026-05-20.

16. Sirui Hong et al., "MetaGPT (HTML v6), Section 3.2 Shared Message Pool", arXiv, 2024-08-01. https://arxiv.org/html/2308.00352v6. Accessed 2026-05-20.

17. Galileo, "Multi-Agent Coordination Gone Wrong? Fix With 10 Strategies", Galileo Blog, 2025-09-01. https://galileo.ai/blog/multi-agent-coordination-strategies. Accessed 2026-05-20.

18. Atoms (DeepWisdom), "Atoms product overview", atoms.dev, 2026-01-13. https://atoms.dev/. Accessed 2026-05-20.

19. geekan, "MetaGPT-docs documentation repository", GitHub, 2025-10-01. https://github.com/geekan/MetaGPT-docs. Accessed 2026-05-20.

20. Adam Fourney et al., "Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks", Microsoft Research, 2024-11-04. https://www.microsoft.com/en-us/research/publication/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/. Accessed 2026-05-20.

21. Carlos Jimenez et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?", arXiv:2310.06770, 2023-10-10. https://arxiv.org/abs/2310.06770. Accessed 2026-05-20.

22. Sukh Sroay, "Multi-agent framework empirical analysis (25,000-task experiment)", X (Twitter), 2025-11-01. https://x.com/sukh_saroy/status/2039381283999293799. Accessed 2026-05-20.

23. Chen Qian et al., "ChatDev: Communicative Agents for Software Development", arXiv:2307.07924 (ACL 2024), 2023-07-16. https://arxiv.org/abs/2307.07924. Accessed 2026-05-20.

24. Qingyun Wu et al., "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation", arXiv:2308.08155, 2023-10-03. https://arxiv.org/abs/2308.08155. Accessed 2026-05-20.

25. crewAIInc, "CrewAI repository and documentation", GitHub, 2025-12-01. https://github.com/crewAIInc/crewAI. Accessed 2026-05-20.