Adaptive thinking
Last reviewed
May 24, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 ยท 3,860 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 24, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 ยท 3,860 words
Add missing citations, update stale details, or suggest a clearer explanation.
Adaptive thinking is an inference-time reasoning configuration in the Anthropic Messages API that lets a claude model decide on a per-request basis whether to use extended thinking and how much of it to allocate, rather than requiring the developer to specify a fixed token budget up front.[^1] The feature is activated by passing thinking: {type: "adaptive"} and is paired with a separate effort parameter (low, medium, high, xhigh, or max) that provides soft guidance on reasoning depth and overall token spend.[^1][^2] Adaptive thinking was introduced with claude opus 4 6 on 5 February 2026, recommended on claude sonnet 4 6 from 17 February 2026, and became the only supported thinking mode on claude opus 4 7 when that model shipped on 16 April 2026, with manual budget-based thinking returning a 400 error.[^1][^3][^4][^5] It replaces the older thinking: {type: "enabled", budget_tokens: N} configuration that originated with Claude 3.7 Sonnet in February 2025 and remained the standard through claude opus 4 5.[^6][^7] Adaptive thinking automatically enables interleaved reasoning between tool calls without requiring the previously needed interleaved-thinking-2025-05-14 beta header, which makes it the default mechanism for agentic workflows built on the anthropic api.[^1][^8]
Extended thinking, the underlying capability that adaptive thinking exposes more flexibly, was first shipped to Anthropic customers on 25 February 2025 with Claude 3.7 Sonnet.[^6] In that release the developer could set thinking: {type: "enabled", budget_tokens: N}, where N ranged from a minimum of 1,024 tokens up to the model's 128,000-token output ceiling, and Claude would spend roughly that many tokens reasoning inside <thinking> blocks before producing the user-visible answer.[^6][^7] Pricing was unchanged from non-thinking inference, with thinking tokens billed at the standard output rate.[^6] This same budget-based syntax carried through subsequent Claude 4 releases, including claude opus 4 5 (24 November 2025), claude haiku 4 5 (15 October 2025), and earlier Sonnet 4.x snapshots.[^4][^9][^10]
The fixed-budget design forced two awkward decisions on every API call. First, the developer had to predict in advance whether a given user message warranted any thinking at all, because thinking added latency and cost regardless of whether the question was a one-line factoid or a multi-step proof. Second, when thinking was enabled, the developer had to guess at the right ceiling: too low meant the model truncated reasoning mid-chain and gave a worse answer, while too high meant paying for unused capacity on simple inputs. Anthropic's documentation acknowledges this tradeoff and notes that adaptive thinking "reliably drives better performance than extended thinking with a fixed budget_tokens" on workloads where request complexity varies widely.[^1][^11] Third-party analyses report that production agent pipelines moving from fixed-budget thinking to adaptive mode see roughly 40 to 60 percent total cost reductions on workloads where simple subtasks dominate, because adaptive Claude can skip thinking entirely on the easy cases while reserving deep reasoning for the difficult ones.[^8]
Adaptive thinking was first announced with the Claude Opus 4.6 launch on 5 February 2026, alongside the public introduction of the effort parameter as a replacement for budget_tokens.[^3][^12] Anthropic positioned the change as part of a broader shift in how reasoning budgets should be controlled: instead of developers routing simple queries to cheaper models and complex queries to larger ones, a single model with effort-level controls handles both ends of the spectrum.[^8] claude sonnet 4 6 inherited the feature on 17 February 2026, and claude opus 4 7 removed manual budget-based thinking entirely on 16 April 2026.[^4][^5]
In adaptive mode the decision of whether to emit a thinking block, and how long that block should be, is made by the model itself on a per-request basis rather than being fixed by the developer.[^1] The Anthropic documentation states: "Claude evaluates the complexity of each request and determines whether and how much to use extended thinking. At the default effort level (high), Claude almost always thinks. At lower effort levels, Claude may skip thinking for simpler problems."[^1]
The wire format is minimal. A request enabling adaptive thinking on Opus 4.7 looks like this:
{
"model": "claude-opus-4-7",
"max_tokens": 16000,
"thinking": {"type": "adaptive"},
"messages": [
{"role": "user", "content": "Explain why the sum of two even numbers is always even."}
]
}
There is no budget_tokens field, and no beta header is required.[^1] Output content blocks of type thinking still appear in the response when Claude does choose to reason, and they stream through the same thinking_delta events as before; the only behavioral difference is that the model may emit zero such blocks for sufficiently simple prompts.[^1]
Effort is the primary developer-facing control over how aggressively adaptive thinking activates. It is passed inside an output_config object (not inside thinking, which raises a ValidationException on Bedrock and a similar error on the first-party API) and accepts five values: low, medium, high, xhigh, and max.[^11][^2]
| Effort level | Thinking behavior | Model availability |
|---|---|---|
max | Claude always thinks with no constraint on depth | Opus 4.7, Opus 4.6, Sonnet 4.6 |
xhigh | Claude always thinks deeply with extended exploration | Opus 4.7 |
high (default) | Claude almost always thinks; deep reasoning on complex tasks | Opus 4.7, Opus 4.6, Sonnet 4.6, Opus 4.5 |
medium | Moderate thinking; may skip for very simple queries | Opus 4.7, Opus 4.6, Sonnet 4.6, Opus 4.5 |
low | Minimizes thinking; skips for simple tasks where speed matters | Opus 4.7, Opus 4.6, Sonnet 4.6, Opus 4.5 |
Anthropic stresses that effort is "a behavioral signal, not a strict token budget."[^2] Setting effort to low does not cap thinking tokens at any specific number; instead, the model is steered to think less often and less deeply, and it will still reason on sufficiently hard problems even at the lowest setting.[^2] Effort also applies to non-thinking output, so a low setting reduces tool-call verbosity, code comment density, and overall response length even on requests where thinking is skipped.[^2]
The xhigh level is new in claude opus 4 7 and sits between high and max. Anthropic recommends it as the default starting point for coding and long-horizon agentic work on Opus 4.7, noting that xhigh expects "meaningfully higher token usage than high" but produces stronger results on tasks involving repeated tool calling, detailed web search, and large knowledge-base lookups.[^2] Claude Code raised its default effort to xhigh across all subscription tiers with the Opus 4.7 launch.[^5]
Independently of whether thinking is adaptive or manual, the display field controls how the thinking content is returned in the API response. It accepts two values:[^1]
display: "summarized" returns a model-generated summary of the full chain of thought inside the thinking field. This is the default on claude opus 4 6, claude sonnet 4 6, and earlier Claude 4 models.[^1]display: "omitted" returns thinking blocks with an empty thinking field. The encrypted full chain of thought still travels in the signature field for multi-turn continuity, but no human-readable reasoning text is surfaced.[^1]On claude opus 4 7 the default flipped silently to display: "omitted".[^1] Anthropic documents this explicitly: "On Claude Opus 4.7, thinking.display defaults to omitted. Thinking blocks still appear in the response stream, but their thinking field is empty unless you explicitly opt in. This is a silent change from Claude Opus 4.6, where the default was summarized."[^1] Restoring visible reasoning on Opus 4.7 requires thinking: {"type": "adaptive", "display": "summarized"} in the request body.[^1] The stated rationale is latency: with thinking output suppressed, the server can begin streaming the final text response sooner, even though billed thinking tokens are unchanged.[^1]
The signature field is identical between the two display modes, which means a multi-turn conversation can mix omitted and summarized turns without breaking thinking-block validation, and switching between display values mid-conversation is supported.[^1]
Interleaved thinking lets Claude produce additional thinking blocks between tool calls inside a single assistant turn, rather than only at the very start of the turn. Without it, an agent's tool-use loop has no intermediate reasoning step: the model picks a tool, reads the result, picks another tool, and so on, without ever pausing to revise its plan.
Interleaved thinking was originally gated behind the interleaved-thinking-2025-05-14 beta header, available on Claude Sonnet 4.5, Opus 4.5, and earlier Claude 4 models.[^7][^11] Adaptive thinking changes this: on claude opus 4 7, claude opus 4 6, and claude sonnet 4 6 in adaptive mode, interleaved thinking is automatically enabled and the beta header is no longer required.[^1] On Opus 4.6 in manual (type: "enabled") mode the header is deprecated and silently ignored; interleaved thinking is unavailable in Opus 4.6 manual mode, which is one of the principal reasons Anthropic urges migration to adaptive for agentic workloads.[^1]
This change matters operationally. For agents built on Claude that use tools heavily, adaptive thinking is now the only way to get inter-tool reasoning on the latest Opus model without the beta header dance, and it is the only thinking mode at all on Opus 4.7.[^1][^5]
Adaptive thinking relaxes one constraint that manual extended thinking enforced. In manual mode the API rejected any thinking-enabled assistant turn that did not begin with a thinking block; the first content block of a turn had to be type: "thinking". Adaptive mode removes this requirement: previous assistant turns are not forced to start with thinking blocks, so a conversation history that mixes thinking and non-thinking turns is valid.[^1]
Adaptive mode also preserves prompt caching breakpoints across consecutive requests that all use adaptive thinking. Switching between adaptive and manual or disabled thinking mid-conversation breaks message-level cache breakpoints, although system prompts and tool definitions remain cached regardless of mode changes.[^1][^11]
The differences between adaptive thinking and the prior manual extended thinking configuration can be summarized in a single table.
| Property | Manual (type: "enabled") | Adaptive (type: "adaptive") |
|---|---|---|
| Configuration field | budget_tokens: N | effort: low/medium/high/xhigh/max |
| Who decides whether to think | Developer (always thinks when enabled) | Model (per-request) |
| Who decides depth | Developer (fixed budget) | Model, guided by effort |
| Available on Opus 4.7 | Rejected with 400 error | Required mode |
| Available on Opus 4.6 / Sonnet 4.6 | Deprecated but functional | Recommended |
| Available on Opus 4.5, Sonnet 4.5, Haiku 4.5 | Required mode | Not supported |
| Interleaved thinking | Beta header on Sonnet 4.6 only; unavailable on Opus 4.6 manual mode | Automatic on Opus 4.6, Sonnet 4.6, Opus 4.7 |
| First turn must begin with thinking block | Yes | No |
| Token cost on simple queries | Pays full budget regardless | May pay zero thinking tokens |
The deprecation trajectory is explicit. Anthropic's documentation states that thinking.type: "enabled" and budget_tokens are deprecated on Opus 4.6 and Sonnet 4.6 and will be removed in a future model release.[^1] claude opus 4 7 enforces this removal: requests using type: "enabled" are rejected with a 400 error rather than silently downgraded, which means client libraries and orchestrators built against the older interface require code changes to call Opus 4.7 at all.[^1][^5]
Existing Sonnet 4.5, Opus 4.5, and Haiku 4.5 endpoints continue to require manual extended thinking with budget_tokens and do not accept the adaptive type at all.[^1][^11] These models predate the adaptive mechanism and are not retrofitted with it.
| Model | Released | Adaptive thinking | Manual thinking | Effort levels supported |
|---|---|---|---|---|
| claude haiku 4 5 | 15 October 2025 | Not supported | Required (budget_tokens) | None (legacy) |
| claude opus 4 5 | 24 November 2025 | Not supported | Required (budget_tokens) | low, medium, high |
| claude opus 4 6 | 5 February 2026 | Recommended | Deprecated | low, medium, high, max |
| claude sonnet 4 6 | 17 February 2026 | Recommended | Deprecated | low, medium, high, max |
| claude opus 4 7 | 16 April 2026 | Only mode | Rejected (400 error) | low, medium, high, xhigh, max |
claude opus 4 5 introduced the effort parameter on 24 November 2025, but only with three levels (low, medium, high) and only in conjunction with manual extended thinking; on Opus 4.5 the developer still had to set a budget_tokens value, and effort tuned the model's behavior alongside that budget rather than replacing it.[^9][^2] Anthropic reported that at medium effort Opus 4.5 matched Sonnet 4.5's performance using 76 percent fewer output tokens, an early demonstration of the cost-control benefits that adaptive mode would later automate.[^9]
claude opus 4 6 was the first model where adaptive thinking became the recommended path, and the max effort level was added on Opus 4.6 and Sonnet 4.6.[^3][^11] The Opus 4.6 launch also introduced the public effort parameter as the explicit replacement for budget_tokens going forward.[^3][^12]
claude sonnet 4 6 shipped twelve days after Opus 4.6 and uses the same adaptive thinking implementation. Anthropic recommends explicitly setting effort on Sonnet 4.6 to avoid unexpected latency, since the model defaults to high and "almost always thinks" at that setting; medium is suggested as the practical default for agentic coding and tool-heavy workflows.[^2]
claude opus 4 7 removed manual mode entirely on 16 April 2026 and added the xhigh effort level, sitting between high and max. Opus 4.7 also respects effort levels more strictly than Opus 4.6, particularly at low and medium, where Anthropic warns that the model may produce shallower work than the same prompt would receive on Opus 4.6 at the same effort.[^2][^5] Anthropic's guidance is to raise effort rather than prompt-engineer around shallow output, because the underlying reasoning policy was retuned to honor the effort signal more aggressively.[^2]
Claude Mythos Preview, an experimental research model exposed under the claude-mythos-preview model ID, takes the model-controlled approach further: adaptive thinking is the default whenever the thinking parameter is unset, thinking: {type: "disabled"} is rejected, and the display default is also omitted.[^1] Mythos summarizes thinking from the first token, so the verbose preamble that Claude 4 models normally emit in the first few lines of their thinking output is absent.[^1]
Direct, head-to-head benchmark figures for adaptive versus manual thinking on the same model are not published in Anthropic's release notes, partly because Opus 4.6 and Sonnet 4.6 made adaptive the recommended path on the same day they became available. The publicly reported numbers reflect benchmark runs where the thinking configuration is set implicitly to the standard recommendation for each model.
On SWE-bench Verified, claude opus 4 6 scored 80.8 percent averaged over 25 trials, with a prompt-modified score of 81.42 percent.[^13][^14] claude opus 4 5 reached 80.9 percent on the same benchmark in November 2025, becoming the first model to exceed 80 percent.[^9][^15] claude opus 4 7 reached 87.6 percent in April 2026, a roughly seven-point improvement over Opus 4.6, with Anthropic's recommended evaluation effort at xhigh or max for coding tasks.[^16] claude sonnet 4 6 reached 79.6 percent on SWE-bench Verified at medium effort, only about 1.2 points behind Opus 4.6 at high effort and at roughly 60 percent lower cost per task, an example of how effort routing on a single model is intended to displace the older pattern of routing simple tasks to a smaller model and harder tasks to a larger one.[^8][^4]
Anthropic also reports that Opus 4.6 leads frontier models on Humanity's Last Exam, outperforms GPT-5.2 by approximately 144 Elo points on GDPval-AA, and achieves 76 percent on MRCR v2's 1M-context needle variant compared with 18.5 percent for Sonnet 4.5.[^3][^17] These are reported under the standard launch configuration, which on Opus 4.6 means adaptive thinking at default high effort.
On Opus 4.7, Anthropic emphasizes a 13 percent improvement on internal 93-task coding benchmarks and a CursorBench score of 70 percent versus 58 percent for the previous version, again under the recommended xhigh or max effort configuration that adaptive mode enables.[^5][^16] The shift from Opus 4.6 to Opus 4.7 also coincides with a tokenizer update that introduces a 1.0 to 1.35 multiplier on token counts depending on content, so the cost comparison at fixed effort levels is not strictly apples-to-apples.[^5]
Third-party analyses note that running Sonnet 4.6 at medium effort with adaptive thinking can sometimes match Opus 4.6 at high effort on workloads with a heavy tail of simple subtasks, because the adaptive mechanism on Sonnet 4.6 skips reasoning on easy steps that Opus 4.6 in high mode would still nominally think through.[^8] One reported case shows an 83 percent cost reduction (from approximately $0.053 to $0.009 per code review) when lowering effort from high to low on Sonnet 4.6.[^18]
Adaptive thinking is conceptually parallel to control mechanisms that other model providers have added on top of reasoning models, which scale test time compute by emitting hidden reasoning tokens before the final answer.
OpenAI's o-series (o1, o3, o4-mini) exposes a reasoning.effort parameter with values none, minimal, low, medium, high, and xhigh.[^19] As with Anthropic's effort, lower values trade quality for speed and lower token usage. The o-series reasoning tokens are not visible through the API but are billed as output tokens, similar to Claude's display: "omitted" behavior.[^19] The functional difference is that OpenAI's o-models always reason (reasoning is not optional), whereas Claude in adaptive mode at low or medium effort can decide to skip reasoning entirely on individual requests.
Google's Gemini 2.5 series exposes a thinkingBudget parameter on the generateContent API, with a range from 0 to 24,576 tokens.[^20] Setting thinkingBudget: 0 disables thinking entirely. Setting thinkingBudget: -1 enables dynamic thinking, in which Gemini "intelligently determines how much of this budget to use based on the complexity of the task."[^20] The dynamic-budget mode is the closest analog to Claude's adaptive thinking: in both cases the model decides per-request how much reasoning to allocate, with a developer-set ceiling or effort signal as soft guidance. Gemini 2.5 Flash's thinking-budget controls reportedly produced sixfold cost reductions when turned down in tested scenarios.[^21]
The three approaches differ in three notable ways. First, only Anthropic's adaptive mode and Gemini's dynamic mode let the model choose to skip thinking entirely; OpenAI's o-models always think. Second, only Anthropic exposes effort as a behavioral signal that affects non-thinking output as well, including tool-call verbosity and response length.[^2] Third, only Anthropic ships an explicit display: "omitted" setting that decouples latency optimization from cost: in omitted mode, the developer pays full thinking-token rates but the user-visible response begins streaming sooner.[^1]
Adaptive thinking is bounded by several constraints documented by Anthropic.
Effort is not a token budget. Lowering effort to low does not cap thinking tokens at a known number. The Anthropic documentation is explicit: "At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher effort levels for the same problem."[^2] Applications that require predictable latency or precise cost ceilings cannot use adaptive thinking alone for this purpose; Anthropic recommends using max_tokens as the hard upper bound on total output (thinking plus response text) and observing stop_reason: "max_tokens" to detect truncation.[^1]
Display omitted does not save money. On Opus 4.7 the default display: "omitted" improves time-to-first-text-token but does not reduce billing. Thinking tokens are billed at the full output rate whether the user sees them or not.[^1][^18] Several third-party guides flag this as the most-misunderstood Opus 4.7 default.[^18]
Mode switches break message caches. Switching between adaptive and manual or disabled thinking inside a conversation invalidates message-level prompt cache breakpoints. System prompts and tool definitions are unaffected.[^1][^11]
Older Claude models cannot opt in. claude opus 4 5, Claude Sonnet 4.5, claude haiku 4 5, and earlier Claude 4 snapshots do not support adaptive thinking. Requests with type: "adaptive" against these model IDs are rejected.[^1] Migration to adaptive requires moving to Opus 4.6 or later, or Sonnet 4.6 or later.
Steering via prompt has tradeoffs. Anthropic notes that adaptive thinking's triggering behavior is promptable, and provides a sample system-prompt snippet that tells Claude to use thinking only when it will meaningfully improve answer quality.[^1] However, "steering Claude to think less often may reduce quality on tasks that benefit from reasoning. Measure the impact on your specific workloads before deploying prompt-based tuning to production."[^1]
Effort interpretation tightened on Opus 4.7. Opus 4.7 respects effort more strictly than Opus 4.6, particularly at low and medium. Anthropic warns that the same prompt at the same effort level may receive shallower reasoning on Opus 4.7 than on Opus 4.6, and recommends raising effort rather than prompt-engineering around the change.[^2] This is a behavioral break for applications that migrate from Opus 4.6 to Opus 4.7 without retuning effort.
Manual mode removal on Opus 4.7 breaks legacy clients. Because Opus 4.7 rejects type: "enabled" with a 400 error rather than silently downgrading, every client library or orchestration layer that hard-codes the old budget_tokens syntax must be updated before it can call Opus 4.7 at all. Anthropic flagged this as one of three breaking changes in the Opus 4.7 release notes.[^5][^16]
No partial-budget control. Unlike Gemini's thinkingBudget parameter, which lets the developer set an explicit ceiling such as 8,192 tokens while still letting the model choose how much of that ceiling to use, adaptive thinking offers only the five-step effort enumeration. There is no way to request, for example, "think at most 4,000 tokens but choose whether to think at all," other than relying on the medium or low effort signal and inspecting actual usage afterward.[^20][^2]