Claude Haiku 4.5 is a large language model developed by Anthropic and released on October 15, 2025. It is the third generation of the Claude Haiku product line and the lightweight, high-speed member of the Claude 4.5 model family.[1] Anthropic describes it as offering "near-frontier intelligence" at the lowest price point in the Claude lineup, targeting developers and enterprises that need fast, cost-efficient inference at scale.[1][2]
With a price of $1.00 per million input tokens and $5.00 per million output tokens, Claude Haiku 4.5 costs one-third as much as Claude Sonnet 4.5 and one-fifth as much as Claude Opus 4.5. At the same time, it scores 73.3% on SWE-bench Verified, matching Claude Sonnet 4 and coming within five percentage points of Claude Sonnet 4.5.[1] The model is notable for being the first in the Haiku line to support extended thinking and computer use, capabilities previously reserved for larger and more expensive models.[1][9]
Claude Haiku 4.5 is available through the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI, with day-one integrations from GitHub Copilot, Warp, Augment, Zencoder, and Anthropic's own Claude Code terminal agent.[1][12][16] On the same day as the launch, Anthropic also made Haiku 4.5 the default model for free-tier users on claude.ai.[20]
The model fits inside the broader Claude 4 family, though there is no Claude Haiku 4 without the .5 suffix: Anthropic skipped the Haiku tier at the May 2025 family launch and shipped the first Claude 4-generation Haiku as Haiku 4.5 in October 2025.[10] Subsequent Anthropic releases have continued to position Haiku 4.5 as the current Haiku-tier offering through May 2026, even after the Sonnet and Opus tiers moved to one-million-token context windows in February 2026.
The Haiku name within Anthropic's model lineup designates the fastest and most economical tier of each Claude generation. While Claude Opus 4.5 and Claude Sonnet 4.5 are positioned for complex multi-step reasoning and balanced performance, the Haiku models are optimized for low latency, high throughput, and cost efficiency at the expense of maximum reasoning depth.
The Haiku lineage began with Claude 3 Haiku, which Anthropic released on March 4, 2024, as part of the Claude 3 family. Claude 3 Haiku offered a 200,000-token context window and was priced at just $0.25 per million input tokens and $1.25 per million output tokens, establishing Haiku as Anthropic's most accessible model tier.[10] It supported text and image inputs and was positioned for real-time tasks requiring fast responses.
Claude 3.5 Haiku followed on October 22, 2024, raising the performance bar substantially. At launch, it was priced at $0.80 per million input tokens and $4.00 per million output tokens, reflecting a significant capability upgrade over the Claude 3 version. Claude 3.5 Haiku maintained the 200,000-token context window and added stronger coding ability, with an 88.1% score on the HumanEval benchmark.[7] However, it did not yet include extended thinking or computer use, which remained features of the larger Sonnet and Opus models.
Claude Haiku 4.5, released in October 2025, represents the first Haiku generation to close the capability gap with the upper tiers in a meaningful way. It inherits the 200,000-token context window but introduces extended thinking, computer use, and context awareness for the first time in the Haiku line. Its SWE-bench Verified score of 73.3% surpasses what Claude Sonnet 4 achieved at launch (72.7%), demonstrating how Anthropic progressively pushes frontier capabilities down into smaller, cheaper models with each new generation.[1][7]
The progression of Haiku-tier models is summarized in the table below.
| Haiku model | Release date | Context window | Max output | Input ($/MTok) | Output ($/MTok) | Extended thinking | Computer use | Vision | SWE-bench Verified |
|---|---|---|---|---|---|---|---|---|---|
| Claude 3 Haiku | March 4, 2024 | 200K | 4,096 | $0.25 | $1.25 | No | No | Yes | n/a |
| Claude 3.5 Haiku | October 22, 2024 | 200K | 8,192 | $0.80 | $4.00 | No | No | No (text only) | 40.6% |
| Claude Haiku 4.5 | October 15, 2025 | 200K | 64,000 | $1.00 | $5.00 | Yes | Yes | Yes | 73.3% |
The Claude 4.5 family at launch comprised three tiers: Opus, Sonnet, and Haiku, following Anthropic's standard naming convention. Claude Sonnet 4.5 serves as the mid-tier model, offering a balance of performance and cost with a 200,000-token context window and pricing of $3.00 input / $15.00 output per million tokens. Claude Opus 4.5 is the flagship, commanding $5.00 input / $25.00 output per million tokens with the deepest reasoning capabilities.
Claude Haiku 4.5 fills the role of the high-volume, latency-sensitive workhorse. Anthropic explicitly frames it as running more than twice as fast as Sonnet 4 and four to five times faster than Sonnet 4.5 in end-to-end application latency, while delivering approximately 90% of Sonnet 4.5's coding performance at one-third the cost.[1] This positioning makes it well suited for multi-agent architectures, where Sonnet 4.5 or another orchestrator model plans and delegates tasks, while multiple Haiku 4.5 instances execute subtasks in parallel.
In its launch post, Anthropic framed the release with the line, "What was recently at the frontier is now cheaper and faster," pointing to the fact that Sonnet 4 was the company's flagship coding model only five months earlier and that Haiku 4.5 now matches or exceeds it on most published benchmarks.[1] Anthropic Chief Product Officer Mike Krieger described Haiku as enabling "entirely new categories of what's possible with AI in production environments," and Zencoder Chief Executive Officer Andrew Filev said the model was "unlocking an entirely new set of use cases."[12]
The table below summarizes the core technical parameters of Claude Haiku 4.5 as documented in Anthropic's official API documentation.
| Parameter | Value |
|---|---|
| Model ID (Anthropic API) | claude-haiku-4-5-20251001 |
| Model alias | claude-haiku-4-5 |
| AWS Bedrock ID | anthropic.claude-haiku-4-5-20251001-v1:0 |
| GCP Vertex AI ID | claude-haiku-4-5@20251001 |
| Release date | October 15, 2025 |
| Context window | 200,000 tokens |
| Max output tokens | 64,000 tokens |
| Input modalities | Text, images |
| Output modalities | Text |
| Extended thinking | Supported (up to 128K thinking budget) |
| Adaptive thinking | Not supported |
| Computer use | Supported |
| Tool use / function calling | Supported |
| MCP support | Supported |
| Reliable knowledge cutoff | February 2025 |
| Training data cutoff | July 2025 |
| AI Safety Level | ASL-2 |
| Prompt caching (min tokens) | 4,096 tokens |
| Prompt caching (max checkpoints) | 4 per request |
The model ID snapshot date 20251001 reflects the specific training snapshot and guarantees consistent behavior across all deployment platforms. Models with the same snapshot date are identical whether accessed through the Anthropic API, Amazon Bedrock, or Google Cloud Vertex AI.[2]
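A minimal request using the official anthropic Python SDK illustrates these IDs in practice; the prompt and output handling below are illustrative, not taken from Anthropic's documentation:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-haiku-4-5",  # alias; pin claude-haiku-4-5-20251001 for reproducibility
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this support ticket in one sentence: ..."}],
)
print(message.content[0].text)
```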
The 200,000-token context window translates to approximately 150,000 words or 680,000 Unicode characters. The maximum output of 64,000 tokens is the same as Claude Sonnet 4.5, a substantial increase over the earlier Claude 3.5 Haiku, which was capped at 8,192 output tokens in standard configuration.[7]
Extended thinking, when enabled, allows the model to reason through complex problems before returning a final answer. Anthropic ran several launch-day evaluations with thinking budgets of up to 128,000 tokens, illustrating that Haiku 4.5 can in principle dedicate significantly more compute to reasoning than its 64,000-token output cap suggests.[1] In this mode, thinking tokens are billed as output tokens at the standard output rate of $5.00 per million. Adaptive thinking, which automatically decides whether to invoke extended reasoning, is available in Sonnet and Opus models in the Claude 4.5 and Claude 4.6 generations but not in Haiku 4.5.[2][9]
Unlike Claude Opus 4.7, which shipped with a new tokenizer in April 2026, Haiku 4.5 uses the same tokenizer as the rest of the Claude 4 and Claude 4.5 family. Per-token cost comparisons against Sonnet 4 or Opus 4.5 therefore translate cleanly into per-task cost comparisons without the up to 35% token-count inflation that affects Opus 4.7 migrations.
Claude Haiku 4.5 uses Anthropic's standard per-token billing model with no subscription requirement for API access.
| Tier | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Standard | $1.00 | $5.00 |
| Batch API (50% discount) | $0.50 | $2.50 |
| Prompt cache writes (5-minute TTL) | $1.25 | n/a |
| Prompt cache writes (1-hour TTL) | $2.00 | n/a |
| Prompt cache reads | $0.10 | n/a |
Prompt caching allows developers to store frequently reused portions of their context, such as long system prompts or reference documents, and pay only the cache read rate ($0.10 per million tokens) on subsequent requests.[2] Cache writes cost slightly more than standard input at $1.25 per million tokens for the five-minute tier, but any workload that reuses the same context more than once typically saves money.
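A back-of-the-envelope calculation, using the rates from the table above and a hypothetical 50,000-token system prompt, shows the break-even point arriving at the second request:

```python
# Break-even sketch for the 5-minute cache tier, using the rates in the table above:
# standard input $1.00/MTok, cache write $1.25/MTok, cache read $0.10/MTok.
PROMPT_MTOK = 0.05  # hypothetical 50,000-token system prompt, in millions of tokens

def uncached(requests: int) -> float:
    return requests * PROMPT_MTOK * 1.00  # full input price on every request

def cached(requests: int) -> float:
    return PROMPT_MTOK * 1.25 + (requests - 1) * PROMPT_MTOK * 0.10  # one write, then reads

for n in (1, 2, 10):
    print(n, round(uncached(n), 4), round(cached(n), 4))
# n=1: caching costs slightly more ($0.0625 vs $0.05)
# n=2: $0.0675 cached vs $0.10 uncached -- caching already wins on the second request
```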
Batch processing through the Message Batches API provides a 50% discount on both input and output tokens in exchange for non-real-time processing with up to 24-hour turnaround. This is well suited for offline data extraction, classification, and summarization pipelines.
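A sketch of a batch submission through the Message Batches API in the anthropic Python SDK; the custom_id values and prompts are placeholders:

```python
import anthropic

client = anthropic.Anthropic()

# Submit several summarization jobs as one discounted batch.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-haiku-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": f"Summarize document {i}."}],
            },
        }
        for i in range(2)
    ]
)
print(batch.id, batch.processing_status)  # poll until processing_status == "ended"
```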
Third-party platform pricing through Amazon Bedrock and Google Cloud Vertex AI may differ from direct Anthropic API pricing. For the most current platform-specific rates, consult the respective cloud providers' pricing pages.
The table below compares Haiku 4.5 against the rest of the Haiku line and the other 4.5-generation tiers on a per-token basis.
| Model | Input ($/MTok) | Output ($/MTok) | Ratio vs Haiku 4.5 input | Ratio vs Haiku 4.5 output |
|---|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 1.0x | 1.0x |
| Claude 3.5 Haiku | $0.80 | $4.00 | 0.80x | 0.80x |
| Claude 3 Haiku | $0.25 | $1.25 | 0.25x | 0.25x |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 3.0x | 3.0x |
| Claude Opus 4.5 | $5.00 | $25.00 | 5.0x | 5.0x |
Haiku 4.5 is priced 25% above Haiku 3.5 on a per-token basis but offers a substantially expanded output token limit (eight times more than Haiku 3.5), extended thinking, computer use, and significantly higher performance across all benchmarks.[1][7]
For a workload of 100,000 monthly customer-service sessions, Haiku 4.5 costs roughly $2,250 versus approximately $6,750 for the same workload run on Sonnet 4.5; because Sonnet 4.5 costs exactly three times as much on both input and output, the 3:1 ratio holds regardless of token mix. The difference is frequently cited in launch coverage as the practical lever that makes large-scale agentic deployments commercially viable.[8]
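The cited totals are reproducible under an assumed per-session mix of about 10,000 input and 2,500 output tokens; the source does not state the mix, and these numbers are chosen only to match the published figures:

```python
# Hypothetical token mix chosen to reproduce the cited totals (not from the source).
SESSIONS = 100_000
IN_TOK, OUT_TOK = 10_000, 2_500  # assumed input/output tokens per session

def monthly_cost(in_rate: float, out_rate: float) -> float:
    """Monthly cost in dollars, given per-million-token rates."""
    return SESSIONS * (IN_TOK * in_rate + OUT_TOK * out_rate) / 1_000_000

print(monthly_cost(1.00, 5.00))   # Haiku 4.5  -> 2250.0
print(monthly_cost(3.00, 15.00))  # Sonnet 4.5 -> 6750.0
```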
The table below presents benchmark scores for Claude Haiku 4.5 alongside Claude Sonnet 4 (the previous generation's flagship-tier Sonnet) and Claude 3.5 Haiku (the immediate predecessor in the Haiku line). All Haiku 4.5 numbers come from Anthropic's launch announcement and from third-party comparisons that reproduce the launch-day figures.
| Benchmark | Claude Haiku 4.5 | Claude Sonnet 4 | Claude 3.5 Haiku | Source |
|---|---|---|---|---|
| SWE-bench Verified | 73.3% | 72.7% | 40.6% | Anthropic launch post[1][7] |
| OSWorld (computer use) | 50.7% | 42.2% | n/a | Anthropic launch post[1] |
| GPQA Diamond | 73.0% | 75.4% | 41.6% | Anthropic launch post[1][7] |
| MMMLU (multilingual) | 83.0% | 86.5% | n/a | Anthropic launch post[1][7] |
| AIME 2025 | 80.7% | 70.5% | n/a | Anthropic launch post[1][7] |
| MMMU | 73.2% | 74.4% | n/a | Anthropic launch post[1][8] |
| Tau2-bench Retail | 83.2% | 80.5% | n/a | Anthropic launch post[1][7] |
| Tau2-bench Telecom | 83.0% | n/a | n/a | Anthropic launch post[1][7] |
| Terminal-Bench | ~41% | 35.5% | n/a | Anthropic launch post[1][8] |
| HumanEval | n/a | n/a | 88.1% | LLM-Stats[7] |
| MATH | n/a | n/a | 69.4% | LLM-Stats[7] |
Notes: SWE-bench Verified scores for Haiku 4.5 are averaged over 50 trials. Terminal-Bench scores were reported across 11 runs at roughly 40% to 42%, with and without extended thinking enabled. AIME 2025 scores are averaged over ten runs with a 128K thinking budget. MMMLU is averaged across ten runs in 14 non-English languages. n/a indicates the benchmark was not reported by Anthropic for that model at launch or was not included in the comparison set.[1][7]
On Anthropic's own published comparisons, Haiku 4.5 ties or exceeds Sonnet 4 on five of the eight benchmarks where both were reported (SWE-bench Verified, OSWorld, AIME 2025, Tau2 Retail, and Terminal-Bench), and trails Sonnet 4 by between 1 and 4 percentage points on the remaining three (GPQA Diamond, MMMLU, MMMU).[1] Against Haiku 3.5, the gains are uniformly large: SWE-bench Verified rises from 40.6% to 73.3% (a 32.7-point gain), and GPQA Diamond rises from 41.6% to 73.0% (a 31.4-point gain).[7]
SWE-bench Verified is a benchmark of real-world GitHub issue resolution, requiring a model to identify the root cause of a software bug or missing feature and write a patch that passes the repository's test suite. Claude Haiku 4.5's 73.3% score, measured over 50 trials to reduce variance, is a substantial jump from Claude 3.5 Haiku's 40.6% and places it within five percentage points of Claude Sonnet 4.5 at 77.2%. It also exceeds the score that Claude Sonnet 4 achieved at launch (72.7%), illustrating how agentic coding capability flows down to smaller models over successive generations.[1]
OSWorld is a benchmark for evaluating a model's ability to navigate graphical user interfaces, control a computer cursor, fill forms, and complete multi-step computer tasks autonomously. Claude Haiku 4.5 achieves a 50.7% success rate on OSWorld, the highest any Haiku model has achieved on that benchmark and a meaningful improvement over Claude Sonnet 4's 42.2% score.[1] Anthropic notes that computer use at this accuracy level requires human oversight for production deployments and is not yet reliable enough for fully autonomous operation.
GPQA Diamond is a benchmark of difficult graduate-level questions in biology, chemistry, and physics designed to challenge expert human knowledge. Claude Haiku 4.5's 73.0% score represents a substantial improvement over Claude 3.5 Haiku's 41.6%, a 31.4-percentage-point gain that reflects the broader reasoning improvements in the Claude 4.5 generation. Haiku 4.5 trails Sonnet 4 (75.4%) by 2.4 points on this benchmark, which is one of three areas where the Haiku model does not match the previous Sonnet flagship.[1][7]
On MMMLU, a multilingual extension of the MMLU benchmark, Haiku 4.5 scores 83.0%, indicating strong performance across academic domains in multiple languages. The score is averaged over ten runs across 14 non-English languages, a methodology consistent with Anthropic's reporting on the rest of the 4.5 family. Sonnet 4 scored 86.5% on the same benchmark.[1][7]
Haiku 4.5's AIME 2025 score of 80.7% on mathematics competition problems, enabled in part by extended thinking with a 128,000-token thinking budget, demonstrates meaningful quantitative reasoning capability when the model is given time to deliberate. The 80.7% score represents a 10.2-percentage-point improvement over Sonnet 4's 70.5% on the same benchmark, marking one of the cleaner cases where a smaller model with newer training surpasses a larger predecessor through better reasoning rather than greater capacity.[1][7]
Tau2-bench is a benchmark for tool-using agents in customer-service settings. Haiku 4.5 reaches 83.2% on the Tau2 Retail track and 83.0% on Tau2 Telecom, placing it ahead of Sonnet 4's reported 80.5% on Tau2 Retail. The benchmark explicitly tests the kinds of multi-step, tool-calling workflows that Anthropic targets with the Haiku tier, and the strong scores were prominent in launch-day marketing.[1]
MMMU is a multimodal benchmark that tests image understanding across scientific, engineering, and humanistic domains. Haiku 4.5's 73.2% score is within 1.2 percentage points of Sonnet 4 (74.4%), confirming that the model carries the family's vision capability into a smaller package without significant degradation.[1][8]
In independent evaluations by Artificial Analysis, Claude Haiku 4.5 generates approximately 88.6 tokens per second with a time-to-first-token of about 0.73 seconds, placing it among the fastest non-reasoning models in its class.[6] Anthropic describes the model as running more than twice as fast as Sonnet 4 and four to five times faster than Sonnet 4.5 in end-to-end application latency.[1]
Artificial Analysis assigns Haiku 4.5 an Intelligence Index score of 31, ranking it 25th overall, while noting that it sits above the median for non-reasoning models on a price-adjusted basis.[6]
The following table compares Claude Haiku 4.5 to current and legacy Claude models across the key dimensions relevant to deployment decisions.
| Model | Input ($/MTok) | Output ($/MTok) | Context window | Max output | Extended thinking | Adaptive thinking | Latency tier |
|---|---|---|---|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | 64K | Yes | No | Fastest |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K | 64K | Yes | No | Fast |
| Claude Opus 4.5 | $5.00 | $25.00 | 200K | 64K | Yes | Yes | Moderate |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | 64K | Yes | Yes | Fast |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M | 128K | Yes | Yes | Moderate |
| Claude Opus 4.7 | $5.00 | $25.00 | 1M | 128K | No (adaptive only) | Yes | Moderate |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200K | 8,192 | No | No | Fast |
| Claude 3 Haiku | $0.25 | $1.25 | 200K | 4,096 | No | No | Fast |
As the table shows, Claude Haiku 4.5 costs 25% more per token than Claude 3.5 Haiku, but in exchange raises the output limit from 8,192 to 64,000 tokens and adds extended thinking, computer use, and significantly higher scores across all benchmarks.
Compared to Sonnet 4.5, Haiku 4.5 costs one-third as much for both input and output while offering the same 200,000-token context window and the same 64,000-token output limit. The practical performance gap between the two models is most visible in tasks requiring complex multi-step reasoning, where Sonnet 4.5 has the larger advantage, while for coding and agentic tasks the gap narrows considerably.
Compared to the newer Sonnet 4.6 and Opus 4.7 models, Haiku 4.5 has a smaller context window (200K versus 1M tokens) and lower maximum output (64K versus 128K for Opus 4.7). Sonnet 4.6 and Opus 4.7 represent later generation releases with updated knowledge cutoffs and adaptive thinking, whereas Haiku 4.5 remains the current Haiku-tier offering as of May 2026.
Claude Haiku 4.5 is the first Haiku model to support extended thinking, a capability that allows the model to work through a chain of reasoning before returning a final response.[1][9] When extended thinking is enabled via the API, the model generates an internal thought process that can optionally be surfaced to users or used by the application for transparency and debugging.
Extended thinking is particularly valuable for math problems, multi-step planning, and tasks where the model benefits from exploring multiple possible approaches before committing to an answer. Anthropic's launch evaluations on AIME 2025 used a 128,000-token thinking budget averaged over ten runs, demonstrating the upper end of the model's reasoning depth.[1] Thinking tokens are billed at the output rate of $5.00 per million tokens and count against the model's output limit.
Developers control how many tokens the model allocates to its reasoning process by setting a thinking budget (exposed as the budget_tokens field of the thinking parameter in the Messages API), balancing latency and cost against the depth of deliberation. Anthropic recommends starting with a budget of a few hundred to a few thousand tokens for most applications, scaling up for problems that genuinely benefit from deeper reasoning.
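A minimal sketch of an extended-thinking request in the anthropic Python SDK; the budget value and prompt are illustrative:

```python
import anthropic

client = anthropic.Anthropic()

# Explicit extended thinking: max_tokens must exceed budget_tokens.
response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "How many primes are there below 1,000?"}],
)
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200])  # reasoning trace, billed as output tokens
    elif block.type == "text":
        print(block.text)
```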
Claude Haiku 4.5 supports computer use, enabling it to interact with graphical user interfaces by taking screenshots, moving cursors, clicking buttons, typing text, and executing keyboard shortcuts. This capability allows the model to complete tasks that traditionally required custom automation scripts or human operators, such as filling out web forms, navigating desktop applications, and performing multi-step UI workflows.
The model achieves a 50.7% success rate on OSWorld benchmarks, outperforming Claude Sonnet 4's 42.2% score and setting a new high-water mark for the Haiku tier.[1] Anthropic cautions that at this accuracy level, computer use tasks require human review and should not be deployed in fully autonomous configurations without appropriate safeguards.
For production use of computer use, Anthropic recommends pairing Haiku 4.5 with human-in-the-loop oversight for actions with significant consequences such as form submissions, file deletions, or financial transactions. The capability is available via the standard messages API with the computer use beta header.
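A sketch of the request shape with the computer-use tool attached, assuming the Claude-4-era tool type and beta strings from Anthropic's documentation; the display dimensions and task are placeholders, and the strings should be checked against current docs:

```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-haiku-4-5",
    max_tokens=2048,
    tools=[{
        "type": "computer_20250124",  # built-in computer-use tool schema
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open the settings page and enable dark mode."}],
    betas=["computer-use-2025-01-24"],
)
# The model replies with tool_use blocks (screenshot, click, type, ...) that the calling
# application executes and answers with tool_result blocks in a loop.
```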
Claude Haiku 4.5 processes image inputs alongside text, supporting document analysis, chart interpretation, screenshot understanding, and visual question answering. Images can be passed as base64-encoded data or as URLs. The model achieves a 73.2% score on the MMMU benchmark, which tests understanding of images across scientific, engineering, and humanistic domains.[1]
Common vision use cases include extracting structured data from scanned documents, interpreting data visualizations in business intelligence applications, and describing visual content for accessibility workflows. The model can process multiple images within a single request, subject to overall token limits.
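A minimal sketch of an image request via the Messages API; the file name and extraction prompt are hypothetical:

```python
import base64

import anthropic

client = anthropic.Anthropic()

with open("invoice.png", "rb") as f:  # hypothetical local file
    image_data = base64.standard_b64encode(f.read()).decode()

message = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_data}},
            {"type": "text", "text": "Extract the invoice number, date, and total as JSON."},
        ],
    }],
)
print(message.content[0].text)
```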
Claude Haiku 4.5 demonstrates strong multilingual capability with an 83.0% score on MMMLU, which spans academic subjects in multiple languages. The model supports generation and comprehension in major world languages including English, Spanish, French, German, Portuguese, Japanese, Korean, and Chinese.
Claude Haiku 4.5 supports structured tool use, allowing developers to define external functions and data sources that the model can invoke within a conversation. The model determines when to call tools based on the user's request and the provided tool definitions, parses arguments from natural language, and integrates tool outputs into its responses.
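A minimal sketch of a tool definition and the first turn of the resulting call loop, assuming a hypothetical get_weather function:

```python
import anthropic

client = anthropic.Anthropic()

# Hypothetical weather tool; only this JSON schema is sent to the model.
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)

if response.stop_reason == "tool_use":
    call = next(b for b in response.content if b.type == "tool_use")
    print(call.name, call.input)  # the app runs the tool, then replies with a tool_result block
```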
The model also supports the Model Context Protocol, Anthropic's open standard for connecting models to external tools and data sources. MCP support gives Haiku 4.5 access to the broader ecosystem of tools that already work with Sonnet and Opus models, including code execution servers, database connectors, and file-system tools, without requiring any model-specific adapters.
Tool use is fundamental to agentic workflows, enabling Haiku 4.5 instances to search databases, call APIs, execute code, and interact with external services as part of multi-step task completion. The model's instruction-following reliability makes it well suited for handling tool schemas consistently across large numbers of requests, which is the main reason Anthropic positions it as a sub-agent in multi-agent architectures.
Context awareness is a capability introduced in the Claude 4.5 family that allows the model to understand how much of its context window has been consumed during a conversation. This enables application developers to build more sophisticated prompt patterns where the model itself monitors its token budget and can compress, summarize, or delegate portions of its context when approaching limits. For long-running agentic sessions with a 200,000-token context, context awareness reduces the risk of unexpected truncation without requiring the application to implement external token counting logic.
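Applications that still prefer an explicit external check can use the API's token-counting endpoint alongside or instead of the model's own context awareness; a minimal sketch:

```python
import anthropic

client = anthropic.Anthropic()

count = client.messages.count_tokens(
    model="claude-haiku-4-5",
    messages=[{"role": "user", "content": "A long conversation history..."}],
)
print(count.input_tokens)  # compare against the 200K window before sending
```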
Claude Haiku 4.5 supports prompt caching with a minimum cache checkpoint size of 4,096 tokens and up to four cache checkpoints per request. Cached content can include system prompts, tool definitions, conversation history, and long reference documents. Cache entries have a time-to-live of either five minutes or one hour depending on the selected tier.
For applications that process many requests against the same large system prompt or reference corpus, prompt caching can reduce costs by 90% on the cached portion of each request (from $1.00 to $0.10 per million tokens for reads).
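A sketch of a cached request in the anthropic Python SDK, marking a long reference document (at or above the 4,096-token minimum noted above) as a cache checkpoint; the document and prompts are placeholders:

```python
import anthropic

client = anthropic.Anthropic()

LONG_REFERENCE = "..."  # stand-in for a reference document of at least 4,096 tokens

message = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You answer questions about the reference document."},
        {"type": "text", "text": LONG_REFERENCE,
         "cache_control": {"type": "ephemeral"}},  # 5-minute tier; checkpoint set here
    ],
    messages=[{"role": "user", "content": "What does section 3 cover?"}],
)
# Usage fields report cache activity on each call.
print(message.usage.cache_creation_input_tokens, message.usage.cache_read_input_tokens)
```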
Claude Haiku 4.5's combination of low latency and low cost per token makes it well suited for customer-facing applications that require real-time responsiveness at scale. Customer support chatbots, FAQ automation, intent classification, and ticket routing all benefit from the model's ability to return answers quickly while handling thousands of concurrent requests within a reasonable compute budget.
Compared to larger models, the per-conversation cost of using Haiku 4.5 for a typical customer service session is a fraction of what Sonnet or Opus would cost, making it practical to offer AI-powered support in free or freemium product tiers without incurring unsustainable infrastructure expenses.
One of the most prominent use cases highlighted by Anthropic at launch is multi-agent orchestration, where a more capable model like Claude Sonnet 4.5 acts as a planner or orchestrator, breaking down complex tasks into parallel subtasks that are delegated to multiple Haiku 4.5 agent instances running concurrently.[1] This architecture allows the orchestrator to leverage frontier planning capabilities while distributing execution across cost-efficient agents.
Examples include software development workflows where Sonnet 4.5 designs an architecture and Haiku 4.5 instances write individual modules; research pipelines where Haiku 4.5 agents extract information from dozens of documents simultaneously; and customer data processing where independent Haiku agents analyze different customer segments in parallel.
The model's instruction-following consistency, speed, and self-correction capability in handling complex workflows make it a reliable sub-agent that can be orchestrated without requiring the orchestrator to constantly monitor and correct its outputs. Anthropic's framing in the launch post explicitly recommends this Sonnet-plans, Haiku-executes pattern as the default architecture for cost-sensitive agentic applications, and several third-party platforms including Caylent and DataCamp have published reference architectures based on it.[1][8][9]
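A minimal sketch of the execution half of this pattern, fanning subtasks out to concurrent Haiku 4.5 calls with the SDK's async client; in a full system the subtask list would come from a Sonnet 4.5 planning call, and is hard-coded here:

```python
import asyncio

import anthropic

client = anthropic.AsyncAnthropic()

async def run_subtask(task: str) -> str:
    """One Haiku 4.5 worker executing a single delegated subtask."""
    msg = await client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=2048,
        messages=[{"role": "user", "content": task}],
    )
    return msg.content[0].text

async def main() -> None:
    subtasks = [  # placeholder plan; normally produced by an orchestrator model
        "Summarize customer feedback batch 1.",
        "Summarize customer feedback batch 2.",
        "Summarize customer feedback batch 3.",
    ]
    results = await asyncio.gather(*(run_subtask(t) for t in subtasks))
    print("\n---\n".join(results))

asyncio.run(main())
```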
Claude Haiku 4.5's 73.3% SWE-bench Verified score places it at the level of the previous generation's flagship for software engineering tasks. Developers using Claude Code, Anthropic's AI coding tool, and similar agentic coding environments can route simpler code edits, test generation, documentation writing, and routine bug fixes to Haiku 4.5, reserving the larger models for architectural decisions and novel problem-solving.
Claude Code shipped Haiku 4.5 support on the day of the launch, and Anthropic positioned the model as an effective sub-agent for the Claude Code orchestrator. Augment Code published its own internal evaluation showing that Haiku 4.5 completed coding tasks roughly 34% faster than its prior default while reaching about 90% of Sonnet 4.5's quality, with testers preferring Sonnet 4.5 outputs to Haiku 4.5 outputs in 51.4% of head-to-head comparisons.[16] Warp terminal and Zencoder also integrated Haiku 4.5 at launch.[1][12]
GitHub Copilot made Haiku 4.5 available in public preview on October 15, 2025, across Pro, Pro+, Business, and Enterprise tiers. The integration spans Visual Studio Code (chat, ask, edit, and agent modes), github.com, and GitHub Mobile for iOS and Android. Enterprise and Business administrators were required to enable the Haiku 4.5 policy in Copilot settings before team members could select the model.[18]
Developers building new applications often iterate rapidly on prompts, features, and system designs. The low per-token cost of Haiku 4.5 makes it economical to run hundreds of test queries during development without incurring significant expenses, allowing teams to prototype and evaluate AI features before deciding whether a more powerful model is necessary for production.
For interactive developer tools that respond to code as it is written, response latency directly affects the user experience. Claude Haiku 4.5's generation speed of approximately 88.6 tokens per second makes it well suited for code completion, inline suggestions, error explanation, and documentation generation tools that need to respond within a second or two of the developer's action.
Enterprise document processing applications, such as contract review, invoice extraction, or regulatory filing analysis, often require processing large numbers of documents with consistent formatting. Claude Haiku 4.5's vision capabilities, large output window, and low cost per token make it economical for batch document workflows, either in real-time pipelines or through the Message Batches API at a further 50% discount.
For product teams building AI-powered features in free or freemium products, Haiku 4.5 makes the economics of offering AI at scale more viable. A product offering free AI chat or writing assistance to millions of users can absorb the cost of Haiku 4.5 where the same feature at Sonnet pricing would be prohibitive. Anthropic itself made Haiku 4.5 the default model on claude.ai for free-tier users at launch, replacing prior fallbacks and giving every visitor access to a model with extended thinking and computer use rather than a stripped-down small variant.[20]
Reception of Claude Haiku 4.5 was generally positive, with most launch coverage focusing on three threads: the price-to-performance ratio compared with Sonnet 4 from five months earlier, the introduction of extended thinking and computer use to the Haiku tier, and the strategic implication of pairing Haiku 4.5 with Sonnet 4.5 in multi-agent architectures.
TechCrunch's coverage led with the framing that Anthropic was offering "similar performance to Sonnet 4 at one-third the cost and more than twice the speed" and reported the model as immediately available on free Anthropic plans.[12] The New Stack and AI Business framed the release as Anthropic broadening the cost frontier rather than chasing a new capability ceiling.[14][15] SiliconANGLE described the model as Anthropic's "entry-level" hybrid reasoning model and emphasized the multi-agent positioning.[20]
Developer-focused outlets including DataCamp, Caylent, and Augment Code published deep dives in the days following the launch. DataCamp characterized Haiku 4.5 as offering "balanced reasoning, coding, and agentic capability with vision, tool use, and a 200K context window at competitive pricing."[8] Caylent's analysis emphasized the multi-agent opportunity and walked through a sample customer-service workflow where pairing Haiku 4.5 with Sonnet 4.5 cut monthly costs by approximately two thirds compared to a Sonnet-only deployment.[9] Augment Code's internal benchmarks reported a 34% speed improvement on average and a quality score of approximately 90% of Sonnet 4.5, while noting that more complex multi-file refactors still benefit from Sonnet or GPT-5.[16]
Third-party benchmark aggregators reproduced Anthropic's launch numbers without significant disagreement. Artificial Analysis ranked Haiku 4.5 25th on its Intelligence Index but noted that the model is "above average in intelligence and reasonably priced when comparing to other non-reasoning models," with output speed in the top tier of comparable models.[6] LLM-Stats published direct head-to-head comparisons against Sonnet 4 and Haiku 3.5 that reproduced the Anthropic launch figures across SWE-bench, GPQA, AIME, MMMLU, and the Tau2 tracks.[7]
Developer commentary on forums including Hacker News and Reddit was broadly positive, particularly regarding the speed gain over Sonnet 4.5 and the option to drop Haiku 4.5 in as a near-equivalent for many Sonnet 4 workloads. Some users noted that GPQA and MMMLU regressions versus Sonnet 4 are real and that Haiku 4.5 should not be assumed to dominate Sonnet 4 across the board, even though Anthropic's own marketing centered on coding and tool-use parity. Others reported that the inclusion of computer use opened new low-cost automation use cases that had been impractical at the previous Haiku price-capability point.
Anthropic classifies Claude Haiku 4.5 under AI Safety Level 2 (ASL-2) in its Responsible Scaling Policy. The classification is one level below Sonnet 4.5 and Opus 4.5, both of which launched at ASL-3, reflecting Anthropic's view that Haiku 4.5 has more limited capability to provide meaningful uplift in the creation of chemical, biological, radiological, or nuclear (CBRN) weapons or to support autonomous self-replicating activity.[1][3]
The system card for Haiku 4.5 reports that the model stayed below the ASL-3 capability thresholds in Anthropic's pre-launch evaluations across the biology and autonomy domains (a "rule-out" determination), supporting the choice of ASL-2 rather than the more restrictive ASL-3 category. Anthropic's internal misalignment evaluation also found that Haiku 4.5 showed statistically significantly lower rates of misaligned behavior than both Claude Sonnet 4.5 and Claude Opus 4.1, making it, by that specific metric, Anthropic's safest model at the time of release. Improvements over Claude 3.5 Haiku were described as large and clear, while differences between Sonnet 4.5 and Haiku 4.5 on the same metric were small.[3]
One notable wrinkle in the safety profile is evaluation awareness, the same phenomenon flagged in Sonnet 4.5's system card. Haiku 4.5 verbalized awareness that it was operating in an evaluation environment in roughly 9% of test transcripts, even after Anthropic applied filters to remove the most obvious cases. This is a double-edged result: it can complicate safety assessments because the model may behave differently when it suspects it is being tested, but it also means the model is more likely to flag suspicious instructions during evaluation.[3]
The model's safety profile benefits from the same Constitutional AI training methodology and reinforcement learning from human feedback applied across the Claude 4.5 family, including refinements to reduce sycophancy, improve honesty, and follow refusal guidelines consistently. Anthropic publishes a system card for the model on its website and provides detailed safety information through its Transparency Hub.[3] More broadly, the AI safety framing of the release was that the company shipped a smaller, lower-classification model with measurably better alignment than several of its larger predecessors, undercutting the assumption that capability and alignment trade off cleanly.
Claude Haiku 4.5 is available through three primary channels.
Anthropic API provides direct access via the Messages API and Batches API using the model ID claude-haiku-4-5-20251001 or the alias claude-haiku-4-5. Developers access the API using API keys generated through the Anthropic Console.
Amazon Bedrock offers Claude Haiku 4.5 through the bedrock-runtime endpoint with the model ID anthropic.claude-haiku-4-5-20251001-v1:0. AWS made the model available through global cross-region inference at launch, supporting both standard and reserved throughput service tiers. The Bedrock listing emphasizes vision, computer use, and coding parity with Sonnet 4 as the practical advantages over the prior Haiku model.[5]
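A sketch of a Bedrock invocation through the Converse API in boto3; the region and prompt are placeholders:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # assumed region

response = bedrock.converse(
    modelId="anthropic.claude-haiku-4-5-20251001-v1:0",
    messages=[{"role": "user", "content": [{"text": "Classify this support ticket: ..."}]}],
    inferenceConfig={"maxTokens": 1024},
)
print(response["output"]["message"]["content"][0]["text"])
```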
Google Cloud Vertex AI makes the model available under the ID claude-haiku-4-5@20251001. As with other Claude models on Vertex AI, access is subject to Google Cloud region availability and quota policies.
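A sketch using the AnthropicVertex client from the anthropic Python SDK; the project and region values are placeholders:

```python
from anthropic import AnthropicVertex

# Hypothetical project and region; Vertex AI availability varies by region.
client = AnthropicVertex(project_id="my-gcp-project", region="us-east5")

message = client.messages.create(
    model="claude-haiku-4-5@20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello from Vertex AI."}],
)
print(message.content[0].text)
```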
Haiku 4.5 is also available through the consumer-facing claude.ai web and mobile applications, where it became the default model for free-tier users on the day of release. Pro, Max, Team, and Enterprise users continue to default to Sonnet- or Opus-tier models depending on their plan, with Haiku 4.5 selectable from the model picker.[20]
GitHub Copilot's integration in public preview spans Visual Studio Code, github.com, GitHub Mobile, and the GitHub CLI for Pro, Pro+, Business, and Enterprise users. Enterprise and Business administrators must enable the Haiku 4.5 policy in Copilot settings before team members can select the model. Visual Studio Code 1.105 or higher is recommended for full feature support.[18]
Third-party developer platforms with day-one Haiku 4.5 support include Warp terminal, Augment Code, Zencoder, and Gamma, in addition to Anthropic's own Claude Code agent.[1][12][16] Model end-of-life on Amazon Bedrock is no sooner than October 1, 2026, in line with Anthropic's general one-year deprecation window.
The table below lists the public snapshots of Claude Haiku 4.5 published by Anthropic to date.
| Snapshot | API model ID | Release date | Status (May 2026) |
|---|---|---|---|
| Initial release | claude-haiku-4-5-20251001 | October 15, 2025 | Active |
| Alias | claude-haiku-4-5 | October 15, 2025 | Tracks initial snapshot |
As of May 2026, no second snapshot of Claude Haiku 4.5 has been released, and Anthropic has not announced a Claude Haiku 4.6 or Claude Haiku 4.7. The October 2025 snapshot remains the only Haiku-tier model in the Claude 4 generation.[2]
Claude Haiku 4.5 supports a 200,000-token context window, which is substantial but smaller than the 1,000,000-token windows available in Claude Sonnet 4.6 and Claude Opus 4.7. Applications that require processing very long documents, extended conversation histories, or large codebases in a single context may need to use a later-generation Sonnet or Opus model with the expanded window, or implement chunking and retrieval strategies to work within the 200,000-token limit.
The model's 50.7% success rate on OSWorld computer-use benchmarks means that approximately half of complex UI automation tasks require human review or will fail without intervention. Production deployments of computer use with Claude Haiku 4.5 should implement human-in-the-loop checkpoints for consequential actions. The capability is better suited to controlled environments or low-stakes tasks where occasional errors are acceptable than to fully autonomous workflows with significant real-world consequences.
Adaptive thinking, which allows a model to automatically decide whether extended reasoning is warranted for a given query, is available in Claude Sonnet 4.5, Claude Sonnet 4.6, Claude Opus 4.5, and Claude Opus 4.6 but not in Claude Haiku 4.5. Developers using extended thinking with Haiku 4.5 must explicitly enable it and set a thinking budget, rather than allowing the model to decide autonomously. This requires more careful prompt engineering to avoid unnecessary costs from always-on extended thinking on simple queries.
The model's reliable knowledge cutoff is February 2025, meaning its internal knowledge of events, research, and developments after that date may be incomplete or absent. Applications requiring up-to-date information about current events, recently released software libraries, or ongoing scientific developments should supplement the model with retrieval-augmented generation or tool use to access current sources.
While Claude Haiku 4.5 delivers near-frontier performance on coding and many reasoning tasks, it does trail Claude Sonnet 4.5 and Claude Opus 4.5 on the most demanding reasoning and domain-knowledge benchmarks. Tasks requiring deep domain expertise, nuanced judgment, or extended multi-step planning tend to benefit from the larger models. Augment Code's internal evaluations, for instance, showed that for complex multi-file refactors testers preferred Sonnet 4.5 or GPT-5 outputs to Haiku 4.5 outputs more often than for simple, scoped edits.[16]
Despite Anthropic's general framing of Haiku 4.5 as a Sonnet 4 replacement, the model trails Sonnet 4 by 2.4 points on GPQA Diamond, 3.5 points on MMMLU, and 1.2 points on MMMU. For knowledge-heavy or graduate-level reasoning tasks where Sonnet 4 was used in production before October 2025, Haiku 4.5 is not always a drop-in upgrade and may require head-to-head testing on the specific workload before substitution.[1][7]
Claude Opus 4.7 and Claude Sonnet 4.6 support context windows of 1,000,000 tokens and, in the case of Opus 4.7, up to 128,000 output tokens. Claude Haiku 4.5 is limited to 200,000 tokens of context and 64,000 tokens of output, which may be insufficient for certain use cases such as processing entire large codebases in a single pass or generating very long-form content.