Claude Code Review
Last reviewed
May 24, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v3 · 6,672 words
Claude Code Review is a multi-agent code review system developed by Anthropic that automatically analyzes GitHub pull requests for bugs, security vulnerabilities, and logic errors. Launched on March 9, 2026, as a research preview for Team and Enterprise customers, it is built into Claude Code and represents Anthropic's response to the growing bottleneck of code review in an era where AI-assisted development has dramatically increased code output.[1][2] A closely related capability, Claude Code Security, was introduced on February 20, 2026, providing deeper vulnerability scanning and patch generation for entire codebases.[3] In April 2026, Anthropic added /ultrareview, a cloud-hosted multi-agent reviewer accessible from the Claude Code CLI for Pro and Max subscribers, extending the same parallel-verification pattern outside the managed GitHub service.[4]
The rise of AI coding assistants, including Claude Code, GitHub Copilot, and Cursor, has transformed software development by enabling engineers to produce code at unprecedented speed. Anthropic reported that code output per engineer at the company grew 200% in the year preceding the launch of Code Review.[1] This surge in code production, sometimes referred to as vibe coding, created a new challenge: pull request reviews became a bottleneck that slowed the pace at which teams could ship features.[2]
Cat Wu, Anthropic's head of product for Claude Code, explained the problem in an interview with TechCrunch: "We've seen a lot of growth in Claude Code, especially within the enterprise, and one of the questions that we keep getting from enterprise leaders is: Now that Claude Code is putting up a bunch of pull requests, how do I make sure that those get reviewed in an efficient manner?"[2] Wu described Code Review as driven by "an insane amount of market pull," noting that as engineers develop with Claude Code, the friction to creating new features decreases while demand for thorough code review increases.[2] Wu added that Anthropic decided to "focus purely on logic errors. This way we're catching the highest priority things," explicitly excluding style and formatting feedback from the default review prompt.[2]
Before Code Review was deployed internally at Anthropic, only 16% of pull requests received substantive review comments. After deployment, that number jumped to 54%, closing a significant gap in code quality assurance.[1][2]
The market context is significant. Independent analyses estimate that 41% of new commits in 2026 originate from AI-assisted generation, while CodeRabbit's analysis of 470 pull requests found reviewers spend 91% more time on AI-generated code, with three times more readability problems and 75% more logic errors than human-written code.[5] CodeRabbit also reported that AI-generated code is 2.74 times more likely to introduce cross-site scripting (XSS) vulnerabilities and 1.91 times more likely to introduce insecure direct object references.[6]
Claude Code Review uses a multi-agent architecture to analyze pull requests. When a pull request is opened (or on each subsequent push, depending on configuration), the system dispatches multiple specialized AI agents that work in parallel. Each agent examines the code changes from a different angle, looking at different classes of potential issues within the context of the full codebase.[7]
The review process follows several stages:
The average review takes approximately 20 minutes, scaling with the size and complexity of the pull request.[1][7] Code Review does not approve or block pull requests; it surfaces findings and leaves the final merge decision to human developers.[7] The Claude Code Review check run always completes with a neutral conclusion so it never blocks merging through branch protection rules, though teams can read the severity breakdown via the GitHub API and gate merges in their own CI if desired.[7]
Each finding is tagged with one of three severity levels:[7]
| Marker | Severity | Meaning |
|---|---|---|
| Red | Important | A bug that should be fixed before merging |
| Yellow | Nit | A minor issue worth fixing but not blocking |
| Purple | Pre-existing | A bug that exists in the codebase but was not introduced by the current pull request |
Findings include a collapsible extended reasoning section that developers can expand to understand why the system flagged the issue and how it verified the problem.[7] Each review comment from Claude arrives with thumbs-up and thumbs-down reactions already attached so both buttons appear in the GitHub user interface for one-click rating. Anthropic collects reaction counts after the pull request merges and uses them to tune the reviewer.[7]
Anthropic published internal testing results that demonstrate the system's effectiveness:[1][2]
| Pull request size | Percentage receiving findings | Average issues per PR |
|---|---|---|
| Large PRs (1,000+ lines) | 84% | 7.5 |
| Small PRs (under 50 lines) | 31% | 0.5 |
Anthropic reports that engineers mark fewer than 1% of findings as incorrect, which suggests a low false-positive rate relative to traditional pattern-matching tools.[1][7] In one internal case, Code Review flagged a one-line production change that appeared routine but would have broken service authentication, a bug that human reviewers had missed during standard review.[1]
Beyond inline review comments, each review populates the Claude Code Review check run that appears alongside other CI checks. Expanding its Details link reveals a summary of every finding sorted by severity, with file path, line number, and a short issue description.[7] Each finding also appears as an annotation in the Files changed tab, marked directly on the relevant diff lines. Important findings render with a red marker, nits with a yellow warning, and pre-existing bugs with a gray notice.[7]
The last line of the check run details text is a machine-readable comment that workflows can parse with the gh CLI and jq to retrieve counts per severity in a JSON object such as {"normal": 2, "nit": 1, "pre_existing": 0}, where a non-zero normal value indicates at least one Important finding worth fixing before merge.[7]
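A merge gate built on that JSON object could be sketched as follows. This is a minimal illustration, assuming the details text has already been fetched (for example with the gh CLI) and that the counts object is the last flat `{...}` span in it. The function names, the extraction regex, and the comment wrapper in the sample string are hypothetical, not part of Anthropic's documented format.

```python
import json
import re


def parse_severity_counts(details_text: str) -> dict:
    """Return the last flat JSON object found in the check run details text.

    Assumption: the counts object has no nested braces, as in
    {"normal": 2, "nit": 1, "pre_existing": 0}.
    """
    matches = re.findall(r"\{[^{}]*\}", details_text)
    if not matches:
        raise ValueError("no JSON object found in details text")
    return json.loads(matches[-1])


def should_block_merge(counts: dict) -> bool:
    """Block when at least one Important finding (the "normal" key) is present."""
    return counts.get("normal", 0) > 0


# Hypothetical details text; the wrapper around the JSON is illustrative only.
sample = 'Findings summary...\n<!-- {"normal": 2, "nit": 1, "pre_existing": 0} -->'
counts = parse_severity_counts(sample)
print(counts, "block" if should_block_merge(counts) else "allow")
```

Because the check run itself always concludes as neutral, a gate like this would have to run as a separate CI step that fails when `should_block_merge` returns true.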
Claude Code Security is a separate but complementary capability that was launched on February 20, 2026, approximately three weeks before Code Review. While Code Review focuses on pull request analysis, Claude Code Security scans entire codebases for security vulnerabilities and suggests targeted software patches for human review.[3][8]
Unlike traditional static analysis tools that rely on pattern matching against known vulnerability signatures, Claude Code Security reasons about code the way a human security researcher would. The system understands how components interact, traces how data moves through an application, and identifies context-dependent weaknesses that rule-based scanners typically miss.[3] It is powered by Claude Opus 4.6, Anthropic's most capable model at the time of launch.[3][9]
Each identified vulnerability undergoes a multi-stage adversarial verification process. The system re-examines its own results, attempting to prove or disprove its findings and filter out false positives. Validated findings are assigned severity ratings and confidence scores, then presented in a dashboard for developer review. Nothing is applied without human approval.[3][10]
Using Claude Opus 4.6, Anthropic's Frontier Red Team found over 500 previously unknown high-severity vulnerabilities in production open-source codebases.[3][9] These bugs had gone undetected for years or even decades, despite extensive expert review and automated testing. Three notable examples illustrate the depth of the system's analysis:
GhostScript: Claude examined the Git commit history of this PostScript and PDF processing utility to identify security-relevant changes. It found that while one code path had bounds checking added for stack operations in Type 1 charstring font handling, another caller function lacked the same protection. Claude constructed a proof-of-concept crash demonstrating the vulnerability.[9][11]
OpenSC: In this smart card data processing utility, Claude identified unsafe string concatenation operations involving multiple consecutive strcat calls writing into a 4,096-byte buffer without proper validation that the concatenated strings would fit. Traditional fuzzers rarely reached this code due to multiple preconditions, but Claude could reason about which code fragments were interesting and focus analysis accordingly.[9]
CGIF: Claude discovered a heap buffer overflow in this GIF processing library by reasoning about the LZW compression algorithm. The library assumed compressed output would always be smaller than uncompressed input, which is almost always true. Claude recognized that if the LZW dictionary filled up and triggered resets, the compressed output could exceed the uncompressed size, overflowing the buffer. This required conceptual understanding of algorithm behavior that traditional coverage-guided fuzzing could not catch, even with 100% code coverage.[9]
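The assumption behind the CGIF bug, that LZW output is never larger than its input, can be checked with a toy encoder. The sketch below is not CGIF's code and does not model the dictionary resets described above. It is a textbook LZW code counter with no clear codes and no dictionary size limit, and it assumes each code takes at least 9 bits (the narrowest width that can address 256 literals plus new entries). It shows only that "compressed is smaller than uncompressed" is not an invariant of the algorithm.

```python
def lzw_code_count(data: bytes) -> int:
    """Count the codes a textbook LZW encoder emits for `data`."""
    if not data:
        return 0
    table = {bytes([i]): i for i in range(256)}
    current = data[:1]
    emitted = 0
    for byte in data[1:]:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate  # keep extending the known sequence
        else:
            emitted += 1  # emit the code for `current`
            table[candidate] = len(table)
            current = bytes([byte])
    return emitted + 1  # flush the final sequence


MIN_CODE_BITS = 9  # assumed lower bound on code width for 8-bit input

incompressible = bytes(range(256))  # no byte pair ever repeats
input_bits = len(incompressible) * 8
output_bits_lower_bound = lzw_code_count(incompressible) * MIN_CODE_BITS

print(input_bits, output_bits_lower_bound)
```

For this input every byte produces its own code, so the output is at least 9/8 the size of the input. A buffer sized to the uncompressed length would be too small even before resets are considered.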
Claude Code Security detects a broad range of vulnerability categories:[12]
| Category | Examples |
|---|---|
| Injection attacks | SQL injection, command injection, LDAP injection, XPath injection, NoSQL injection, XML external entity (XXE) |
| Authentication and authorization | Broken authentication, privilege escalation, insecure direct object references (IDOR), session flaws |
| Data exposure | Hardcoded secrets, sensitive data logging, information disclosure, PII handling violations |
| Cryptographic issues | Weak algorithms, improper key management, insecure random number generation |
| Input validation | Missing validation, improper sanitization, buffer overflows |
| Business logic flaws | Race conditions, time-of-check-time-of-use (TOCTOU) issues, access control vulnerabilities |
| Configuration security | Insecure defaults, missing security headers, permissive CORS |
| Supply chain | Vulnerable dependencies, typosquatting risks |
| Code execution | Remote code execution via deserialization, eval injection, pickle injection |
| Cross-site scripting | Reflected XSS, stored XSS, DOM-based XSS |
Anthropic validated Claude's security capabilities through extensive testing over more than a year, including cybersecurity capture-the-flag (CTF) competitions and collaboration with Pacific Northwest National Laboratory (PNNL).[3] In competitive CTF events, Claude ranked in the top 3% of PicoCTF participants globally, solved 19 of 20 challenges in the HackTheBox AI vs Human CTF, and placed 6th out of 9 teams defending live networks against human red team attacks at the Western Regional CCDC.[3]
In collaboration with PNNL, Anthropic tested Claude against a simulated water treatment plant. PNNL researchers estimated that the model completed adversary emulation in three hours, while the traditional process takes multiple weeks.[3]
Code Review integrates directly with GitHub through the Claude GitHub App. Administrators install the app through Claude Code settings and select which repositories to enable for review.[7]
The setup process involves installing the Claude GitHub App on the organization's GitHub account, which requests read and write permissions for repository contents, issues, and pull requests. Administrators then select which repositories should receive automated reviews and configure the review trigger for each repository.[7] To verify setup, opening a test pull request causes a check run named Claude Code Review to appear within a few minutes for automatic-trigger repositories.[7]
| Trigger mode | Behavior | Cost impact |
|---|---|---|
| Once after PR creation | Review runs once when a PR is opened or marked ready for review | Lowest cost |
| After every push | Review runs on every push to the PR branch; auto-resolves threads when flagged issues are fixed | Highest cost |
| Manual | Reviews start only when someone comments @claude review on a PR | Variable |
In any mode, commenting @claude review on a pull request starts a review and opts that PR into push-triggered reviews going forward.[7] To run a single review without subscribing to future pushes, the alternative command @claude review once starts a one-shot review.[7] Manual triggers run on draft PRs, while automatic triggers do not.[7]
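The trigger rules above can be summarized as a small decision function. This is a sketch of the described behavior, not Anthropic's implementation. The mode and event names are hypothetical labels, and it assumes that a push to a draft PR counts as an automatic trigger and is therefore skipped even after the PR has opted in.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    is_draft: bool = False
    subscribed_to_pushes: bool = False  # set once someone comments "@claude review"


def should_review(mode: str, event: str, pr: PullRequest, comment: str = "") -> bool:
    """Decide whether a review starts.

    mode:  "once", "every_push", or "manual" (hypothetical labels)
    event: "opened", "ready_for_review", "push", or "comment"
    """
    if event == "comment":
        text = comment.strip()
        if text == "@claude review once":
            return True  # one-shot review, no subscription to later pushes
        if text == "@claude review":
            pr.subscribed_to_pushes = True  # opt this PR into push-triggered reviews
            return True
        return False
    if pr.is_draft:
        return False  # automatic triggers do not run on drafts
    if event in ("opened", "ready_for_review"):
        return mode in ("once", "every_push")
    if event == "push":
        return mode == "every_push" or pr.subscribed_to_pushes
    return False
```

Manual comments are handled before the draft check, which reflects the rule that manual triggers run on draft PRs while automatic ones do not.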
For self-hosted GitHub Enterprise Server instances, Code Review supports configuration through admin settings as long as the GHES instance is reachable from Anthropic infrastructure so Claude can clone repositories and post review comments.[13] If a GHES instance is behind a firewall, administrators must allowlist the Anthropic API IP addresses. Code Review setup involves creating a GitHub App on the GHES instance with the required permissions and events, then entering the app credentials (hostname, OAuth client ID and secret, GitHub App ID, client ID, client secret, webhook secret, and private key) in the Claude Code admin settings form.[13] Data Residency tenants (hostnames matching *.ghe.com) are not supported by Code Review.[13]
Anthropic also provides an open-source GitHub Action (anthropics/claude-code-security-review) for teams that want to run security-focused reviews in their own CI infrastructure.[14] The action can be added to any GitHub Actions workflow file and supports configuration options including directory exclusions, custom model selection, timeout settings, custom false-positive filtering, and custom security scan instructions.[14] The action is language-agnostic and works with any programming language because it relies on semantic analysis rather than pattern matching.[14] By default, the action analyzes only changed files in pull requests for efficient diff-aware scanning, with a 20-minute timeout on Claude analysis.[14]
A separate action, anthropics/claude-code-action, is the general-purpose Claude Code GitHub integration that supports interactive code assistance, code implementation, and PR review modes detected automatically from event type.[15] Released as version 1.0 on August 26, 2025, this action accumulated 7,700 GitHub stars by mid-2026 and supports authentication via the Anthropic Direct API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry.[15]
Beyond GitHub, Claude Code offers a GitLab CI/CD integration (currently in beta) built on top of the Claude Code CLI and Agent SDK. This enables teams using GitLab to run Claude-powered reviews in their own CI/CD pipelines.[16] The integration supports instant merge request creation, automated implementation of issues into working code, and adherence to project-specific CLAUDE.md guidelines.[16]
The GitLab integration runs on the organization's own GitLab runners with existing branch protection and approval workflows in place. Teams can choose between the Claude API, AWS Bedrock, or Google Vertex AI as the backend, allowing them to meet data residency and procurement requirements.[16] The integration is event-driven, triggered by web pipelines, merge request events, or @claude mentions, and runs in isolated containers with restricted network and filesystem access.[16]
For teams running custom continuous integration outside the managed GitHub or GitLab integrations, Claude Code supports a headless mode invoked with the -p (print) flag, which runs Claude Code non-interactively and outputs the result to stdout.[17] Claude Code offers three output formats in headless mode: text (default), json (wraps the response in a structured object with metadata), and stream-json (sends tokens one by one in JSON Lines format, suitable for real-time processing).[17] Headless mode requires Node.js 18 or higher, the @anthropic-ai/claude-code npm package, and a valid Anthropic API key.[17] Average execution time for a pull request of fewer than 500 lines is 30 to 60 seconds in this mode, considerably faster than the managed Code Review service because it uses a single pass rather than multi-agent verification.[17]
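A CI step could wrap headless mode roughly as shown below. This is a sketch assuming the `claude` binary is on the PATH and an API key is configured in the environment. The helper names and the review prompt are illustrative, and the helper covers only the text and json formats because stream-json output is meant to be consumed incrementally.

```python
import json
import subprocess


def build_headless_command(prompt: str, output_format: str = "json") -> list[str]:
    """Build the argv for a non-interactive Claude Code run."""
    if output_format not in ("text", "json"):
        raise ValueError(f"unsupported output format: {output_format}")
    return ["claude", "-p", prompt, "--output-format", output_format]


def run_headless_review(diff_text: str, timeout_s: int = 300) -> dict:
    """Pipe a diff to Claude Code on stdin and return the parsed JSON wrapper."""
    cmd = build_headless_command(
        "Review this diff for logic errors and list each finding."  # illustrative prompt
    )
    proc = subprocess.run(
        cmd,
        input=diff_text,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```

A pipeline would typically produce `diff_text` with `git diff` against the target branch and then post or log the fields it needs from the returned object.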
Engineering leads can customize what Code Review checks for by placing configuration files in their repositories. Two files control review behavior.[7]
The CLAUDE.md file contains shared project instructions that Claude Code uses for all tasks, not just reviews. Code Review reads these files and treats newly introduced violations as nit-level findings. The system works bidirectionally: if a pull request changes code in a way that makes a CLAUDE.md statement outdated, Claude flags that the documentation needs updating too.[7] Claude reads CLAUDE.md files at every level of the directory hierarchy, so rules in a subdirectory apply only to files under that path.[7]
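The directory scoping can be illustrated with a short path calculation. The sketch below lists the CLAUDE.md locations whose rules would apply to a changed file, from the repository root down to the file's own directory. The function name and example paths are hypothetical, and the code only computes candidate paths without checking that the files exist.

```python
from pathlib import PurePosixPath


def applicable_claude_md(changed_file: str) -> list[str]:
    """List candidate CLAUDE.md paths, ordered from repo root to the file's directory."""
    parents = list(PurePosixPath(changed_file).parents)  # nearest directory first, "." last
    return [str(parent / "CLAUDE.md") for parent in reversed(parents)]


print(applicable_claude_md("services/api/handlers/auth.py"))
```

A CLAUDE.md under a sibling directory such as `web/` never appears in the list for a file under `services/`, which matches the rule that subdirectory instructions apply only to files beneath that path.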
The REVIEW.md file is dedicated to review-only guidance and is read exclusively during code reviews. Its contents are injected into the system prompt of every agent in the review pipeline as the highest-priority instruction block.[7] Because it is pasted verbatim, REVIEW.md is plain instructions: the @ import syntax used elsewhere in Claude Code memory files is not expanded, and referenced files are not read into the prompt.[7]
Engineering leads can use this file to encode:
For teams running reviews locally rather than through the managed GitHub service, the /code-review slash command is available in any Claude Code session, including the option to post findings as inline pull request comments with the --comment flag.[18] The command was named /simplify before Claude Code version 2.1.147.[7]
The official claude-code/plugins/code-review plugin launches four agents in parallel: two CLAUDE.md compliance auditors (for redundancy), one bug-detection agent focused on changes only, and one git-blame and history analyzer for context-based issues.[18] Each finding is scored on a 0-100 confidence scale, and only high-confidence issues (default threshold: 80) are posted as comments, dramatically reducing false positives and review noise.[18] The plugin automatically skips closed, draft, trivial, or already-reviewed PRs.[18] Findings are filtered to exclude pre-existing issues, code that looks like a bug but is not, pedantic nitpicks, issues that linters will catch, and issues that have lint-ignore comments.[18]
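The confidence filter amounts to a threshold over scored findings. The following is a minimal sketch, not the plugin's code. The `Finding` type and the sample data are hypothetical, and treating a score equal to the threshold as passing is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    message: str
    confidence: int  # 0-100 scale


def filter_findings(findings: list[Finding], threshold: int = 80) -> list[Finding]:
    """Keep findings at or above the threshold (inclusive comparison is assumed)."""
    return [f for f in findings if f.confidence >= threshold]


# Hypothetical sample findings for illustration.
sample = [
    Finding("auth.py", 42, "token expiry compared in the wrong units", 93),
    Finding("utils.py", 7, "variable name could be clearer", 35),
    Finding("db.py", 118, "connection not released on error path", 80),
]
for finding in filter_findings(sample):
    print(f"{finding.path}:{finding.line} [{finding.confidence}] {finding.message}")
```

Raising the threshold trades recall for precision: fewer comments are posted, and those that remain are more likely to be real issues.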
Code Review is billed based on token usage. Each review averages $15 to $25 in cost, scaling with pull request size, codebase complexity, and how many issues require verification.[1][7] Code Review usage is billed separately through extra usage credits and does not count against the plan's included usage.[7]
Organizations can set monthly spending limits through the Claude admin settings page at claude.ai/admin-settings/usage.[7] When the monthly spend cap is reached, Code Review posts a single comment on the pull request explaining that the review was skipped, with reviews resuming automatically at the start of the next billing period or immediately when an admin raises the cap.[7] The analytics dashboard at claude.ai/analytics/code-review shows daily pull request review counts, weekly cost breakdowns, feedback counts (comments auto-resolved because a developer addressed the issue), and per-repository breakdowns.[7]
The review trigger mode significantly affects total cost. The "after every push" mode multiplies cost by the number of pushes to a branch, while manual mode incurs no cost until someone explicitly requests a review.[7] Costs appear on the Anthropic bill regardless of whether the organization uses Amazon Bedrock or Google Vertex AI for other Claude Code features.[7]
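The effect of trigger mode on spend is simple multiplication, sketched below using the $15 to $25 per-review average quoted above. The PR volume and the number of reviews per PR are hypothetical inputs, and actual costs depend on token usage rather than a flat rate.

```python
def cost_range(prs_per_month: int, reviews_per_pr: int,
               low: float = 15.0, high: float = 25.0) -> tuple[float, float]:
    """Estimate monthly spend as (low, high) in dollars.

    reviews_per_pr is 1 for "once after PR creation"; for "after every push"
    it is the number of review runs each PR receives.
    """
    runs = prs_per_month * reviews_per_pr
    return (runs * low, runs * high)


# Hypothetical team: 100 PRs per month.
print(cost_range(100, 1))  # reviewed once per PR
print(cost_range(100, 4))  # four review runs per PR under "after every push"
```

Under these assumptions the same team would spend four times as much in "after every push" mode, which is why a monthly cap matters most for repositories with long-lived, frequently updated branches.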
Claude Code Review and Claude Code Security take a different approach compared to traditional Static Application Security Testing (SAST) tools like Snyk, SonarQube, and Semgrep.[19][20]
| Feature | Traditional SAST (Snyk, SonarQube, Semgrep) | Claude Code Review / Security |
|---|---|---|
| Detection method | Pattern matching and rule-based scanning | Semantic reasoning about code behavior |
| Vulnerability coverage | Known patterns and CVE databases | Known patterns plus novel, context-dependent flaws |
| Business logic flaws | Rarely detected without custom rules | Detected through reasoning about intended behavior |
| False positive rate | Varies; often high | Less than 1% marked incorrect (per Anthropic's internal testing) |
| Fix suggestions | Template-based or limited | Context-aware patches tailored to the specific codebase |
| Setup complexity | Rule configuration and tuning required | Minimal configuration; reads codebase context automatically |
| Language support | Varies by tool; often language-specific rules | Language-agnostic |
| Cost model | Per-seat or per-scan licensing | Per-review token-based billing ($15 to $25 average) |
Independent benchmarks from DryRun Security tested multiple SAST tools against a set of 26 known vulnerabilities seeded in publicly available applications in Ruby on Rails, Python/Django, C#/ASP.NET Core, and Java/Spring Boot.[19] The results showed significant variation in detection rates across traditional tools. In those tests, Snyk detected 10 of 26 known vulnerabilities, Semgrep found less than half (46%) but led other legacy scanners, and SonarQube was the least accurate of the group at 19% with default configurations.[19] Claude Code Security's advantage is most pronounced for complex, context-dependent vulnerabilities such as IDOR conditions or business logic flaws that require understanding the application's intended security policy.
Security experts, including those at Snyk, have characterized Claude Code Security as a welcome addition rather than a direct replacement for existing tools. Snyk's analysis noted that "finding vulnerabilities has never been the hard part," arguing that the genuine challenge is remediation at scale across hundreds of repositories without introducing new issues.[6] The strongest security programs layer frontier AI reasoning with deterministic validation, automation, and governance. Traditional SAST tools remain valuable for their deterministic analysis of known patterns, software composition analysis (SCA), container scanning, and infrastructure-as-code (IaC) scanning.[6]
A key caveat noted by industry analysts is that the same AI models that find vulnerabilities can also introduce them. Research from BaxBench showed that 62% of AI-generated solutions are either incorrect or contain security issues, while CodeRabbit's analysis found that AI-generated code is 2.74 times more likely to introduce XSS vulnerabilities and 1.91 times more likely to introduce insecure direct object references.[6][5] This underscores the importance of using Claude Code Review and Security as part of a layered defense strategy rather than as a standalone solution.
Claude Code Review enters a market with several established AI code review alternatives, including CodeRabbit, GitHub Copilot's code review features, Cursor's BugBot, and Greptile. CodeRabbit supports GitHub, GitLab, Azure DevOps, and Bitbucket, while Claude Code Review is currently limited to GitHub with GitLab support in beta.[20][21] GitHub Copilot's code review features were significantly improved with its March 2026 agentic architecture update, which gathers full project context before suggesting changes and can pass those suggestions directly to the coding agent to generate fix PRs automatically.[22] By March 2026, Copilot code review had handled 60 million reviews since its April 2025 launch, a tenfold increase.[22]
Cursor's BugBot processes over 2 million pull requests per month and has reviewed more than 1 million PRs since launch, flagging 1.5 million potential issues.[23] After more than 40 major experiments since launch, BugBot's resolution rate climbed from 52% to nearly 80%, with the number of resolved bugs per PR more than doubling from roughly 0.2 to about 0.5.[23] On the OpenSSF CVE Benchmark for security vulnerability detection, BugBot scored 80.45% F1 through higher recall (87.80%) at the cost of precision (74.23%).[23]
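The reported F1 figure follows from the quoted precision and recall, since F1 is their harmonic mean. The check below uses only the numbers given above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(precision=0.7423, recall=0.8780)
print(f"{f1:.2%}")
```

Because the harmonic mean is pulled toward the lower of the two values, the score sits closer to the 74.23% precision than a simple average would.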
Macroscope's independent benchmarks ranked bug detection accuracy as Macroscope 48%, CodeRabbit 46%, Cursor BugBot 42%, and Greptile 24%. The same comparison notes that Greptile catches the most bugs in absolute terms but generates significantly more noise, at 11 false positives per run, while CodeRabbit catches fewer bugs but rarely raises false alarms.[24]
CodeRabbit has processed more than 13 million pull requests across two million-plus repositories, serving over 8,000 paying customers including Chegg, Groupon, Life360, and Mercury.[25] Pricing starts at $12 per month per developer (Lite) and $24 per month per developer (Pro).[25] Codacy bundles SAST, software composition analysis, dynamic analysis, secrets detection, and AI code review at $15 per user per month, covering 49 languages.[21]
Claude Code Review's multi-agent approach and its less-than-1% incorrect finding rate represent a different optimization: prioritizing precision over recall to keep signal-to-noise ratio high for developers.[1][7] AI researcher Nir Zabari noted at launch that the release "sounds good on the surface, but it doesn't share any technical details" about what each parallel agent focuses on, a critique aimed at Anthropic's relatively sparse public documentation of the multi-agent architecture.[26]
Open-source alternatives include Aider's "architect" mode, which analyzes code structure and suggests architectural improvements across files, and Continue, a VS Code and JetBrains extension that connects to any large language model (local or cloud) and provides in-editor code review, explanation, and refactoring.[27] Sweep, which originally operated as an AI developer in GitHub and accumulated over 6,000 GitHub stars, pivoted in 2026 to build a JetBrains-focused coding assistant and autocomplete.[28]
Claude Code Review launched as a research preview on March 9, 2026, available to Claude for Teams and Claude for Enterprise subscribers.[1][2] It is not available for organizations with Zero Data Retention enabled.[7] Claude Code Security launched as a limited research preview on February 20, 2026, also for Enterprise and Team customers, with expedited access for maintainers of open-source repositories.[3]
By March 2026, Claude Code subscriptions had quadrupled since the start of 2026, and Claude Code's run-rate revenue had climbed past $2.5 billion.[2] Anthropic crossed a $30 billion annualized revenue run rate by April 2026, up from roughly $9 billion at the end of 2025.[29] Enterprise customers named as initial users of Code Review include Uber, Salesforce, and Accenture.[2]
Anthropic notes that the GitHub Action for security review is not hardened against prompt injection attacks and should only be used to review trusted pull requests.[14] The company recommends configuring repositories to use the "Require approval for all external contributors" option to ensure workflows only run after maintainer review.[14] Claude Code Security restricts scanning to code the organization owns and has rights to scan, excluding third-party, licensed, or open-source code without proper authorization.[10]
In April 2026, security researchers published a "Comment and Control" attack disclosing that Claude Code Security Review, Google Gemini CLI Action, and GitHub Copilot Agent were vulnerable to prompt injection via GitHub comments, where PR titles, issue bodies, and issue comments could be turned into attack vectors for API key and token theft.[30] The vulnerability was first reported on October 17, 2025; Anthropic accepted it as Critical (CVSS 9.3), upgraded it to Critical (CVSS 9.4) on November 25, 2025, and downgraded its severity classification on April 20, 2026 after deploying mitigations including blocking the ps command via --disallowed-tools 'Bash(ps:*)' and updating the documentation with security warnings.[30] Anthropic awarded the researchers a $100 bug bounty.[30] Anthropic's documentation continues to state that "this action is not hardened against prompt injection attacks and should only be used to review trusted pull requests."[14]
The launch of Claude Code Security on February 20, 2026, had an immediate effect on cybersecurity markets. Bloomberg reported that shares of cybersecurity software companies fell following the announcement, as investors assessed the potential for AI-native security tools to displace traditional vendors.[31] Industry analysts noted that Claude Code Security would be a strong fit for "born in AI" companies that standardize on Anthropic's ecosystem, as well as for small and mid-market organizations. For larger enterprises or companies in highly regulated industries such as financial services or healthcare, analysts suggested the offering would need to mature further before it could address the full breadth of compliance and governance requirements.[8]
Anthropic stated its expectation that "a significant share of the world's code will be scanned by AI in the near future, given how effective models have become at finding long-hidden bugs and security issues."[3]
The progression of Anthropic's Claude model line directly affected Code Review's capabilities. Claude Opus 4.6, the model powering Claude Code Security at launch in February 2026, scored 80.8% on SWE-bench Verified.[32] Claude Sonnet 4.6 scored 79.6%, making it a cost-effective option for general code review at $3 per million input tokens and $15 per million output tokens.[33] Claude Opus 4.7, which became generally available on April 16, 2026, raised SWE-bench Verified from 80.8% to 87.6% and improved CursorBench from 58% to 70%. Opus 4.7 also "devises ways to verify its own outputs before reporting back," writing tests, running sanity checks, and inspecting its own output rather than declaring tasks complete, a behavior that is particularly useful for the verification stage of multi-agent code review.[34]
Claude Opus 4.5, released November 24, 2025, was the first AI model to break 80% on SWE-bench Verified (scoring 80.9%) and dropped Opus pricing by approximately 67% compared to the previous Opus model, bringing it to $5 per million input tokens and $25 per million output tokens.[35] This pricing reduction directly affected the cost feasibility of running multi-agent reviews at scale.
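Per-token pricing translates into review cost as shown in the sketch below. The prices are the per-million-token rates quoted in this section. The token counts are hypothetical and chosen only to illustrate the arithmetic, since real multi-agent reviews consume varying amounts across several agents.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one model call at per-million-token prices."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical single agent pass: 200,000 input tokens and 20,000 output tokens.
sonnet_cost = token_cost(200_000, 20_000, 3.0, 15.0)  # $3 / $15 per million tokens
opus_cost = token_cost(200_000, 20_000, 5.0, 25.0)    # $5 / $25 per million tokens
print(f"Sonnet-priced pass: ${sonnet_cost:.2f}, Opus-priced pass: ${opus_cost:.2f}")
```

Multiplying a per-pass figure by the number of parallel agents and verification passes gives a rough sense of why per-token price reductions matter for multi-agent review.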
On April 22, 2026, Anthropic announced /ultrareview, a slash command that launches a deep multi-agent code review in a remote sandbox running on the infrastructure behind Claude Code on the web.[4] Ultrareview requires CLI version 2.1.86 or later and is positioned as the cloud-hosted counterpart to the local /review command. Compared to local /review, ultrareview offers higher signal (every reported finding is independently reproduced and verified by another agent before being counted), broader coverage (many reviewer agents explore the change in parallel), and no local resource use (the review runs entirely in a remote sandbox).[4]
Without arguments, ultrareview reviews the diff between the current branch and the default branch, including uncommitted and staged changes. Passing a pull request number (/ultrareview 1234) causes the remote sandbox to clone the pull request directly from GitHub rather than bundling the local working tree.[4] A typical review takes 5 to 10 minutes and runs as a background task, allowing developers to continue working or close the terminal entirely.[4]
| Plan | Included free runs | After free runs |
|---|---|---|
| Pro | 3 free runs | Billed as usage credits |
| Max | 3 free runs | Billed as usage credits |
| Team and Enterprise | None | Billed as usage credits |
Pro and Max subscribers received three free runs to try the feature; after exhausting them, each review costs $5 to $20 depending on change size.[4] A non-interactive claude ultrareview subcommand supports CI invocation, blocking until the remote review finishes, printing findings to stdout, and exiting with code 0 on success or 1 on failure, with options for raw JSON output (--json) and timeout control (--timeout <minutes>, default 30).[4]
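A CI job could call the non-interactive subcommand along the lines of the sketch below. It uses only the `--json` and `--timeout` options described above. The helper names are illustrative, and the sketch simply treats exit code 0 as success and any other code as failure without assuming what kind of failure occurred.

```python
import subprocess


def build_ultrareview_command(json_output: bool = True, timeout_minutes: int = 30) -> list[str]:
    """Build the argv for a blocking ultrareview run."""
    cmd = ["claude", "ultrareview"]
    if json_output:
        cmd.append("--json")
    cmd += ["--timeout", str(timeout_minutes)]
    return cmd


def run_ultrareview(timeout_minutes: int = 30) -> tuple[bool, str]:
    """Run the review and return (succeeded, stdout)."""
    proc = subprocess.run(
        build_ultrareview_command(timeout_minutes=timeout_minutes),
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stdout
```

Because the command blocks until the remote review finishes, the surrounding CI job needs its own timeout set somewhat above the value passed to `--timeout`.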
Ultrareview is not available when using Claude Code with Amazon Bedrock, Google Cloud Vertex AI, or Microsoft Foundry, and it is not available to organizations that have enabled Zero Data Retention.[4]
Beyond the managed Code Review and /ultrareview services, Claude Code's subagent system enables developers to construct their own multi-agent review workflows. Subagents are reusable configurations defined as Markdown files with YAML frontmatter in .claude/agents/, each specifying a custom system prompt, model selection, and tool permissions.[36] A typical multi-agent review pattern runs three parallel subagent reviews on staged changes: a security review checking for vulnerabilities and authentication issues, a performance review checking for N+1 queries and memory leaks, and a style review checking for consistency with project patterns. The findings are then synthesized into a single priority-ranked summary.[36]
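The fan-out and synthesis shape of that pattern can be illustrated in plain Python. This sketch does not show how Claude Code dispatches subagents: the three reviewer functions are hypothetical stubs that stand in for subagents using trivial string checks, and the Finding type and severity scale are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    reviewer: str
    severity: int  # hypothetical scale: 1 is the highest priority
    message: str


Reviewer = Callable[[str], List[Finding]]


def security_review(diff: str) -> List[Finding]:
    # Stub check standing in for a security-focused subagent.
    if "password" in diff:
        return [Finding("security", 1, "possible hard-coded credential")]
    return []


def performance_review(diff: str) -> List[Finding]:
    # Stub check standing in for a performance-focused subagent.
    if "for " in diff and ".query(" in diff:
        return [Finding("performance", 2, "possible N+1 query inside a loop")]
    return []


def style_review(diff: str) -> List[Finding]:
    # Stub check standing in for a style-focused subagent.
    if "\t" in diff:
        return [Finding("style", 3, "tab indentation differs from project style")]
    return []


REVIEWERS: List[Reviewer] = [security_review, performance_review, style_review]


def run_reviews(diff: str, reviewers: List[Reviewer]) -> List[Finding]:
    """Run every reviewer in parallel, then merge into one ranked list."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = [pool.submit(reviewer, diff) for reviewer in reviewers]
        findings = [f for future in futures for f in future.result()]
    # Synthesis step: highest priority first, ties broken by reviewer name.
    return sorted(findings, key=lambda f: (f.severity, f.reviewer))
```

Each reviewer sees the same diff and works independently, so adding a fourth concern means adding one function to REVIEWERS; the ranking step is the only place where the results are combined.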
Community implementations have extended this pattern to nine or more parallel subagents covering testing, linting, security, performance, and maintainability simultaneously.[37] These self-built approaches typically cost less than the managed Code Review service but require setup effort and produce more variable result quality.[37] Anthropic recommends starting with CLAUDE.md for rules that apply every session, moving to Skills for specific workflows needed only sometimes, and reaching for subagents when a task would bloat the conversation context.[36]
At the Code with Claude conference on May 6, 2026, in San Francisco, Anthropic announced several features that affect code review workflows.[38] Three new managed agent features were introduced: Multiagent Orchestration (scaling a fleet of agents to break down complex tasks), Outcomes (defining success criteria so agents can iterate and improve over time), and Dreaming (allowing Claude to recall previous sessions and build on past work).[38] Claude Code's five-hour rate limits were doubled across the Pro, Max, Team, and Enterprise plans, with the peak-hour throttle lifted entirely.[38] A SpaceX partnership announced at the conference gave Anthropic access to over 220,000 Nvidia GPUs at the Colossus 1 data center in Memphis, Tennessee, expanding the compute capacity available to back parallel-agent workflows like Code Review.[38]
GitHub Copilot shipped its agentic code review architecture in March 2026.[22] The updated review gathers full project context before suggesting changes and can pass those suggestions directly to a coding agent to generate fix pull requests automatically. In May 2026, GitHub renamed the "Implement suggestion" button to "Fix with Copilot" and replaced the "Implement all suggestions" button with "Fix batch with Copilot," allowing developers to batch together specific feedback before handing it off to the Copilot cloud agent.[39] Starting June 1, 2026, GitHub Copilot code review runs are scheduled to consume GitHub Actions minutes, changing the cost model from a flat per-seat charge to a partly metered one.[40]
Internal adoption at Anthropic preceded the public launch: Code Review ran on nearly every internal pull request for months before the March 2026 announcement. That usage provided the data behind the published metrics, namely findings on 84% of large pull requests and an increase from 16% to 54% in substantive review coverage.[1] Beta customers identified by Anthropic include Uber, Salesforce, and Accenture.[2] Wu characterized Code Review as primarily targeting "larger scale enterprise users."[2]
Developer reception has been mixed. Positive feedback has focused on three points: the multi-agent approach produces fewer false positives and more substantive findings than single-pass alternatives; the calibration favors precision (less than 1% of findings are incorrect); and deep codebase context allows the system to catch business logic errors that traditional SAST tools miss.[26] Critiques have centered on three areas. The first is pricing: commenters questioned whether $15 to $25 per review and a 20-minute average duration would be practical for high-volume engineering workflows, and some suggested the cost may limit adoption by smaller teams. The second is a lack of technical transparency: the public documentation does not detail what each parallel agent focuses on or how verification proceeds. The third is a meta-safety concern: when Claude both writes the code and reviews it, the review lacks the independence expected of a validation step.[26]
Independent testing on 20-plus pull requests confirmed that the multi-agent approach produces fewer false positives than single-pass alternatives but found that Claude can miss business-logic errors that require understanding the product feature rather than only the code change.[26]
Code Review has documented limitations beyond the prompt injection concerns in the security review action. The managed service is restricted to GitHub repositories on github.com or self-hosted GitHub Enterprise Server; GitLab support is in beta, and Bitbucket and Azure DevOps are not supported.[7][16] Reviews are best-effort: a failed run never blocks a pull request, but it also does not retry on its own. The GitHub "Re-run" button does not retrigger Code Review; a new push or an @claude review once comment is required instead.[7] When the monthly spend cap is reached, reviews are skipped until either the next billing period begins or an admin raises the cap.[7]
The 20-minute average review time is slower than competing tools optimized for fast feedback. Cursor BugBot, Greptile, and CodeRabbit typically return reviews in under five minutes, though they use single-pass rather than multi-agent verification.[21][23] If a push happens during a review, some findings may reference lines that no longer exist in the current diff; those findings appear under an "Additional findings" heading in the review body rather than as inline comments.[7]
The service does not support organizations with Zero Data Retention enabled.[7] Ultrareview is additionally not available when using Claude Code with Amazon Bedrock, Google Cloud Vertex AI, or Microsoft Foundry as the backend.[4]
The accuracy advantage claimed over traditional SAST tools does not necessarily extend to other AI review tools. Independent benchmarks from sources like Macroscope rank CodeRabbit, Cursor BugBot, and other tools competitively on certain bug detection metrics, and Anthropic has not published a head-to-head benchmark of Code Review against these competitors using a public dataset.[24]