Claude Code Review is a multi-agent code review system developed by Anthropic that automatically analyzes GitHub pull requests for bugs, security vulnerabilities, and logic errors. Launched on March 9, 2026, as a research preview for Team and Enterprise customers, it is built into Claude Code and represents Anthropic's response to the growing bottleneck of code review in an era where AI-assisted development has dramatically increased code output. A closely related capability, Claude Code Security, was introduced on February 20, 2026, providing deeper vulnerability scanning and patch generation for entire codebases.
The rise of AI coding assistants, including Claude Code, GitHub Copilot, and Cursor, has transformed software development by enabling engineers to produce code at unprecedented speed. Anthropic reported that code output per engineer at the company grew 200% in the year preceding the launch of Code Review. This surge in code production, driven in part by the fast, prompt-led style of development sometimes called vibe coding, created a new challenge: pull request review became a bottleneck that slowed the pace at which teams could ship features.
Cat Wu, Anthropic's head of product, explained the problem in an interview with TechCrunch: "We've seen a lot of growth in Claude Code, especially within the enterprise, and one of the questions that we keep getting from enterprise leaders is: Now that Claude Code is putting up a bunch of pull requests, how do I make sure that those get reviewed in an efficient manner?" Wu described Code Review as driven by "an insane amount of market pull," noting that as engineers develop with Claude Code, the friction to creating new features decreases while demand for thorough code review increases.
Before Code Review was deployed internally at Anthropic, only 16% of pull requests received substantive review comments. After deployment, that number jumped to 54%, closing a significant gap in code quality assurance.
Claude Code Review uses a multi-agent architecture to analyze pull requests. When a pull request is opened (or on each subsequent push, depending on configuration), the system dispatches multiple specialized AI agents that work in parallel. Each agent examines the code changes from a different angle, looking at different classes of potential issues within the context of the full codebase.
The review proceeds in stages: specialized agents analyze the changes in parallel, candidate findings are verified against the codebase, and confirmed issues are posted as review comments.
The average review takes approximately 20 minutes, scaling with the size and complexity of the pull request. Code Review does not approve or block pull requests; it surfaces findings and leaves the final merge decision to human developers.
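Anthropic has not published the internal design, but the fan-out and fan-in described above can be sketched in a few lines. The agents, their string-matching heuristics, and the Finding type below are illustrative stand-ins for what would really be parallel LLM calls with role-specific prompts and full-codebase context:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Finding:
    agent: str
    severity: str   # "normal", "nit", or "pre-existing"
    message: str

# Toy stand-ins for LLM-backed specialists; each scans the diff for one
# class of issue. The real agents reason over full-codebase context.
def security_agent(diff: str) -> list[Finding]:
    if "eval(" in diff:
        return [Finding("security", "normal", "eval() on untrusted input")]
    return []

def logic_agent(diff: str) -> list[Finding]:
    if "== None" in diff:
        return [Finding("logic", "nit", "use 'is None' instead of '== None'")]
    return []

AGENTS = [security_agent, logic_agent]

def review(diff: str) -> list[Finding]:
    # Fan out: every specialist examines the same diff in parallel.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        results = list(pool.map(lambda agent: agent(diff), AGENTS))
    # Fan in: merge all agents' findings into a single review.
    return [f for findings in results for f in findings]

diff = "+ result = eval(user_input)\n+ if result == None:\n"
for f in review(diff):
    print(f"[{f.severity}] {f.agent}: {f.message}")
```

The parallel structure is the point: each specialist sees the same change, so one review pass covers several independent classes of issue at once.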
Each finding is tagged with one of three severity levels:
| Marker | Severity | Meaning |
|---|---|---|
| Red | Normal | A bug that should be fixed before merging |
| Yellow | Nit | A minor issue worth fixing but not blocking |
| Purple | Pre-existing | A bug that exists in the codebase but was not introduced by the current pull request |
Findings include a collapsible extended reasoning section that developers can expand to understand why the system flagged the issue and how it verified the problem.
Anthropic published internal testing results that demonstrate the system's effectiveness:
| Pull Request Size | Percentage Receiving Findings | Average Issues per PR |
|---|---|---|
| Large PRs (1,000+ lines) | 84% | 7.5 |
| Small PRs (under 50 lines) | 31% | 0.5 |
Anthropic reports that less than 1% of findings are marked incorrect by engineers, indicating a very low false positive rate. In one notable internal case, Code Review flagged a production change that appeared routine but would have broken authentication, a bug that human reviewers missed during standard review.
Claude Code Security is a separate but complementary capability that was launched on February 20, 2026, approximately three weeks before Code Review. While Code Review focuses on pull request analysis, Claude Code Security scans entire codebases for security vulnerabilities and suggests targeted software patches for human review.
Unlike traditional static analysis tools that rely on pattern matching against known vulnerability signatures, Claude Code Security reasons about code the way a human security researcher would. The system understands how components interact, traces how data moves through an application, and identifies context-dependent weaknesses that rule-based scanners typically miss. It is powered by Claude Opus 4.6, Anthropic's most capable model at the time of launch.
Each identified vulnerability undergoes a multi-stage adversarial verification process. The system re-examines its own results, attempting to prove or disprove its findings and filter out false positives. Validated findings are assigned severity ratings and confidence scores, then presented in a dashboard for developer review. Nothing is applied without human approval.
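Anthropic has not detailed the mechanism, but the shape of such a verify-then-filter pipeline can be sketched as follows. The Candidate type, the challenge scoring, and the 0.5 threshold are hypothetical stand-ins for a second adversarial model pass that argues against each finding:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    description: str
    severity: str        # "normal", "nit", or "pre-existing"
    confidence: float = 0.0

# Hypothetical adversarial pass: re-examine each candidate and try to
# DISPROVE it. In the real system this would be another model call
# prompted to argue the finding is a false positive; a stub heuristic
# stands in for that judgment here.
def challenge(candidate: Candidate) -> float:
    """Return surviving confidence in [0, 1]; low means disproved."""
    disproved = "unreachable" in candidate.description
    return 0.1 if disproved else 0.9

def verify(candidates: list[Candidate], threshold: float = 0.5) -> list[Candidate]:
    verified = []
    for c in candidates:
        c.confidence = challenge(c)
        if c.confidence >= threshold:   # finding survived the challenge
            verified.append(c)
    return verified

candidates = [
    Candidate("heap overflow in parse_header", "normal"),
    Candidate("null deref on unreachable error path", "nit"),
]
print([c.description for c in verify(candidates)])
```

Only findings that survive the challenge reach the dashboard, which is how the pipeline keeps the reported false positive rate low.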
Using Claude Opus 4.6, Anthropic's Frontier Red Team found over 500 previously unknown high-severity vulnerabilities in production open-source codebases. These bugs had gone undetected for years or even decades, despite extensive expert review and automated testing. Three notable examples illustrate the depth of the system's analysis:
Ghostscript: Claude examined the Git commit history of this PostScript and PDF processing utility to identify security-relevant changes. It found that while one code path had bounds checking added for stack operations in Type 1 charstring font handling, another caller function lacked the same protection. Claude constructed a proof-of-concept crash demonstrating the vulnerability.
OpenSC: In this smart card data processing utility, Claude identified unsafe string concatenation operations involving multiple consecutive strcat calls writing into a 4,096-byte buffer without proper validation that the concatenated strings would fit. Traditional fuzzers rarely reached this code due to multiple preconditions, but Claude could reason about which code fragments were interesting and focus analysis accordingly.
CGIF: Claude discovered a heap buffer overflow in this GIF processing library by reasoning about the LZW compression algorithm. The library assumed compressed output would always be smaller than uncompressed input, which is almost always true. Claude recognized that if the LZW dictionary filled up and triggered resets, the compressed output could exceed the uncompressed size, overflowing the buffer. This required conceptual understanding of algorithm behavior that traditional coverage-guided fuzzing could not catch, even with 100% code coverage.
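The broken assumption is easy to demonstrate. The toy LZW encoder below (integer codes, two bytes per emitted code, dictionary reset at 4,096 entries; a simplification of real GIF encoding) produces output larger than its input when fed incompressible data:

```python
import random

def lzw_encode(data: bytes) -> list[int]:
    """Minimal LZW: emit integer codes, reset the dictionary when full."""
    table = {bytes([i]): i for i in range(256)}
    next_code, w, out = 256, b"", []
    for byte in data:
        wb = w + bytes([byte])
        if wb in table:
            w = wb
        else:
            out.append(table[w])
            if next_code < 4096:
                table[wb] = next_code
                next_code += 1
            else:
                # Dictionary full: reset, as GIF encoders commonly do.
                table = {bytes([i]): i for i in range(256)}
                next_code = 256
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

random.seed(0)
data = bytes(random.randrange(256) for _ in range(2000))
codes = lzw_encode(data)
compressed_size = 2 * len(codes)   # two bytes per code
print(len(data), compressed_size)  # compressed output is LARGER than input
```

High-entropy input yields almost no phrase matches, so nearly every input byte costs a multi-byte code, and output exceeds input. Any buffer sized on the assumption that compression always shrinks data can therefore be overflowed, which is the conceptual step the CGIF analysis required.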
Claude Code Security detects a broad range of vulnerability categories:
| Category | Examples |
|---|---|
| Injection Attacks | SQL injection, command injection, LDAP injection, XPath injection, NoSQL injection, XML External Entity (XXE) |
| Authentication and Authorization | Broken authentication, privilege escalation, insecure direct object references (IDOR), session flaws |
| Data Exposure | Hardcoded secrets, sensitive data logging, information disclosure, PII handling violations |
| Cryptographic Issues | Weak algorithms, improper key management, insecure random number generation |
| Input Validation | Missing validation, improper sanitization, buffer overflows |
| Business Logic Flaws | Race conditions, time-of-check-time-of-use (TOCTOU) issues, access control vulnerabilities |
| Configuration Security | Insecure defaults, missing security headers, permissive CORS |
| Supply Chain | Vulnerable dependencies, typosquatting risks |
| Code Execution | Remote code execution via deserialization, eval injection, pickle injection |
| Cross-Site Scripting | Reflected XSS, stored XSS, DOM-based XSS |
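To make the first category concrete, the classic SQL injection pattern and its parameterized fix look like this in Python's sqlite3 (an example of ours, not taken from Anthropic's documentation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_unsafe(name: str):
    # VULNERABLE: user input is interpolated directly into the SQL text,
    # so name = "x' OR '1'='1" rewrites the WHERE clause and leaks every row.
    return conn.execute(
        f"SELECT name, role FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name: str):
    # FIXED: the placeholder keeps the input as data, never as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_unsafe("x' OR '1'='1"))  # leaks both rows
print(find_user_safe("x' OR '1'='1"))    # returns []
```

A pattern-based scanner flags the f-string on syntax alone; a reasoning-based reviewer can additionally trace whether the interpolated value is actually attacker-controlled.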
Anthropic validated Claude's security capabilities through extensive testing over more than a year, including cybersecurity capture-the-flag (CTF) competitions and collaboration with Pacific Northwest National Laboratory (PNNL). In competitive CTF events, Claude ranked in the top 3% of PicoCTF participants globally, solved 19 of 20 challenges in the HackTheBox AI vs Human CTF, and placed 6th out of 9 teams defending live networks against human red team attacks at the Western Regional CCDC.
In collaboration with PNNL, Anthropic tested Claude against a simulated water treatment plant. PNNL researchers estimated that the model completed adversary emulation in three hours, while the traditional process takes multiple weeks.
Code Review integrates directly with GitHub through the Claude GitHub App. Administrators install the app on the organization's GitHub account from Claude Code settings; the app requests read and write permissions for repository contents, issues, and pull requests. Administrators then select which repositories should receive automated reviews and configure a review trigger for each one.
| Trigger Mode | Behavior | Cost Impact |
|---|---|---|
| Once after PR creation | Review runs once when a PR is opened or marked ready for review | Lowest cost |
| After every push | Review runs on every push to the PR branch; auto-resolves threads when flagged issues are fixed | Highest cost |
| Manual | Reviews start only when someone comments @claude review on a PR | Variable |
In any mode, commenting @claude review on a pull request starts a review and opts that PR into push-triggered reviews going forward.
Anthropic also provides an open-source GitHub Action (anthropics/claude-code-security-review) for teams that want to run security-focused reviews in their own CI infrastructure. This action can be added to any GitHub Actions workflow file and supports configuration options including directory exclusions, custom model selection, timeout settings, and custom security scan instructions. The action is language-agnostic and works with any programming language.
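A minimal workflow using the action might look like the fragment below. The input names shown are illustrative guesses mapped to the options listed above, so consult the action's README for the exact parameter names:

```yaml
# .github/workflows/security-review.yml (input names illustrative)
name: Security Review
on: pull_request
jobs:
  security-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-security-review@main
        with:
          claude-api-key: ${{ secrets.CLAUDE_API_KEY }}
          exclude-directories: "docs,tests"        # directory exclusions
          claudecode-timeout: "20"                 # timeout setting
          custom-security-scan-instructions: "security-rules.txt"
```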
Beyond GitHub, Claude Code offers a GitLab CI/CD integration (currently in beta) built on top of the Claude Code CLI and Agent SDK. This enables teams using GitLab to run Claude-powered reviews in their own CI/CD pipelines. The integration supports instant merge request creation, automated implementation of issues into working code, and adherence to project-specific CLAUDE.md guidelines.
The GitLab integration runs on the organization's own GitLab runners with existing branch protection and approval workflows in place. Teams can choose between the Claude API, AWS Bedrock, or Google Vertex AI as the backend, allowing them to meet data residency and procurement requirements.
Engineering leads can customize what Code Review checks for by placing configuration files in their repositories. Two files control review behavior:
The CLAUDE.md file contains shared project instructions that Claude Code uses for all tasks, not just reviews. Code Review reads these files and treats newly introduced violations as nit-level findings. The system works bidirectionally: if a pull request changes code in a way that makes a CLAUDE.md statement outdated, Claude flags that the documentation needs updating too. Claude reads CLAUDE.md files at every level of the directory hierarchy, so rules in a subdirectory apply only to files under that path.
The REVIEW.md file is dedicated to review-only guidance and is read exclusively during code reviews. Engineering leads can use it to encode review-specific standards and conventions that do not belong in general project instructions.
For teams using the /code-review plugin locally, threshold settings range from 40 to 90 (with 80 as the default). Critical projects in fintech or healthcare may warrant a threshold of 60 to 70, while internal projects can remain at 80 to 90. The --focus flag allows narrowing the review to specific areas such as security, performance, or accessibility.
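For example, using the --focus flag named above (exact plugin syntax may differ by version), a locally run review narrowed to security issues would be invoked as:

```
/code-review --focus security
```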
Code Review is billed based on token usage. Each review averages $15 to $25, scaling with pull request size, codebase complexity, and how many issues require verification. Code Review usage is billed separately as extra usage and does not count against a plan's included usage.
Organizations can set monthly spending limits through the Claude admin settings page. The analytics dashboard at claude.ai/analytics/code-review shows daily pull request review counts, weekly cost breakdowns, feedback counts (comments auto-resolved because a developer addressed the issue), and per-repository breakdowns.
The review trigger mode significantly affects total cost. The "after every push" mode multiplies cost by the number of pushes to a branch, while manual mode incurs no cost until someone explicitly requests a review.
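As a rough illustration of that arithmetic, assuming the $15 to $25 per-review average cited above (a flat $20 here), the monthly cost of each trigger mode can be modeled as:

```python
def monthly_cost(prs_per_month: int, avg_pushes_per_pr: float,
                 cost_per_review: float = 20.0, mode: str = "once") -> float:
    """Rough cost model using the $15-$25 per-review average cited by Anthropic."""
    reviews = {
        "once": prs_per_month,                        # one review per PR
        "every_push": prs_per_month * avg_pushes_per_pr,
        "manual": 0,                                  # no cost until requested
    }[mode]
    return reviews * cost_per_review

# 100 PRs per month, averaging 4 pushes each, at ~$20 per review:
print(monthly_cost(100, 4, mode="once"))        # 2000.0
print(monthly_cost(100, 4, mode="every_push"))  # 8000.0
```

The gap between modes grows linearly with how often engineers push, which is why the trigger choice dominates the monthly bill.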
Claude Code Review and Claude Code Security take a fundamentally different approach compared to traditional Static Application Security Testing (SAST) tools like Snyk, SonarQube, and Semgrep.
| Feature | Traditional SAST (Snyk, SonarQube, Semgrep) | Claude Code Review / Security |
|---|---|---|
| Detection Method | Pattern matching and rule-based scanning | Semantic reasoning about code behavior |
| Vulnerability Coverage | Known patterns and CVE databases | Known patterns plus novel, context-dependent flaws |
| Business Logic Flaws | Rarely detected without custom rules | Detected through reasoning about intended behavior |
| False Positive Rate | Varies; often high | Less than 1% marked incorrect (per Anthropic's internal testing) |
| Fix Suggestions | Template-based or limited | Context-aware patches tailored to the specific codebase |
| Setup Complexity | Rule configuration and tuning required | Minimal configuration; reads codebase context automatically |
| Language Support | Varies by tool; often language-specific rules | Language-agnostic |
| Cost Model | Per-seat or per-scan licensing | Per-review token-based billing ($15 to $25 average) |
Independent benchmarks from DryRun Security tested multiple SAST tools against a set of 26 known vulnerabilities. The results showed significant variation in detection rates across traditional tools: Snyk detected 10 of the 26 vulnerabilities, Semgrep found fewer than half but outperformed the other legacy scanners, and SonarQube was the least accurate of the group. Claude Code Security's advantage is most pronounced for complex, context-dependent vulnerabilities such as IDOR conditions or business logic flaws that require understanding the application's intended security policy.
Security experts, including those at Snyk, have characterized Claude Code Security as a welcome addition rather than a direct replacement for existing tools. Snyk's analysis noted that the strongest security programs layer frontier AI reasoning with deterministic validation, automation, and governance. Traditional SAST tools remain valuable for their deterministic analysis of known patterns, software composition analysis (SCA), container scanning, and infrastructure-as-code (IaC) scanning. The recommended approach for 2026 is to combine AI-powered reasoning with traditional deterministic tools for comprehensive coverage.
A key caveat noted by industry analysts is that the same AI models that find vulnerabilities can also introduce them. Research from BaxBench showed that 62% of AI-generated solutions contain security issues, while CodeRabbit's analysis found that AI-generated code is 2.74 times more likely to introduce XSS vulnerabilities. This underscores the importance of using Claude Code Review and Security as part of a layered defense strategy rather than as a standalone solution.
Claude Code Review enters a market with several established AI code review alternatives, including CodeRabbit, GitHub Copilot's code review features, and Greptile. CodeRabbit supports GitHub, GitLab, Azure DevOps, and Bitbucket, while Claude Code Review is currently limited to GitHub with GitLab support in beta. GitHub Copilot's code review features were significantly improved with its March 2026 agentic architecture update.
In independent benchmarks, Greptile achieved an 82% bug catch rate, nearly double CodeRabbit's 44% and ahead of GitHub Copilot's 54%. Claude Code Review's multi-agent approach and its less than 1% incorrect finding rate represent a different optimization: prioritizing precision over recall to keep signal-to-noise ratio high for developers.
Claude Code Review launched as a research preview on March 9, 2026, available to Claude for Teams and Claude for Enterprise subscribers. It is not available for organizations with Zero Data Retention enabled. Claude Code Security launched as a limited research preview on February 20, 2026, also for Enterprise and Team customers, with expedited access for maintainers of open-source repositories.
Anthropic notes that the GitHub Action for security review is not hardened against prompt injection attacks and should only be used to review trusted pull requests. The company recommends configuring repositories to use the "Require approval for all external contributors" option to ensure workflows only run after maintainer review. Claude Code Security restricts scanning to code the organization owns and has rights to scan, excluding third-party, licensed, or open-source code without proper authorization.
The launch of Claude Code Security on February 20, 2026, had an immediate effect on cybersecurity markets. Bloomberg reported that shares of cybersecurity software companies fell following the announcement, as investors assessed the potential for AI-native security tools to displace traditional vendors. Industry analysts noted that Claude Code Security would be a strong fit for "born in AI" companies that standardize on Anthropic's ecosystem, as well as for small and mid-market organizations. For larger enterprises or companies in highly regulated industries such as financial services or healthcare, analysts suggested the offering would need to mature further before it could address the full breadth of compliance and governance requirements.
Anthropic stated its expectation that "a significant share of the world's code will be scanned by AI in the near future, given how effective models have become at finding long-hidden bugs and security issues."