Adaptive thinking

24 min read

Updated Jul 23, 2026

Adaptive thinking is an inference-time reasoning mode in the Anthropic Messages API in which a claude model decides, on a per-request basis, whether to use extended thinking at all and how much of it to spend, instead of the developer setting a fixed token budget in advance.^[1] Anthropic describes it directly: "adaptive thinking lets Claude dynamically determine when and how much to use extended thinking based on the complexity of each request."^[1] The feature is enabled by passing thinking: {type: "adaptive"} and is paired with a separate effort parameter (low, medium, high, xhigh, or max) that gives soft guidance on reasoning depth and overall token spend.^[1]^[2] It is the API-level mechanism that scales a model's test time compute up on hard problems and down (even to zero thinking tokens) on easy ones. Adaptive thinking was introduced with claude opus 4 6 on 5 February 2026, recommended on claude sonnet 4 6 from 17 February 2026, and became the only supported thinking mode on claude opus 4 7 (16 April 2026) and claude opus 4 8 (28 May 2026), where manual budget-based thinking is rejected with a 400 error.^[1]^[3]^[4]^[5]^[22] It replaces the older thinking: {type: "enabled", budget_tokens: N} configuration that originated with Claude 3.7 Sonnet in February 2025 and remained standard through claude opus 4 5.^[6]^[7] Adaptive thinking also automatically enables interleaved reasoning between tool calls without the previously required interleaved-thinking-2025-05-14 beta header, which makes it the default mechanism for agentic workflows built on the anthropic api.^[1]^[8]

What is adaptive thinking?

Extended thinking, the underlying capability that adaptive thinking exposes more flexibly, was first shipped to Anthropic customers on 25 February 2025 with Claude 3.7 Sonnet.^[6] In that release the developer could set thinking: {type: "enabled", budget_tokens: N}, where N ranged from a minimum of 1,024 tokens up to the model's output ceiling, and Claude would spend roughly that many tokens reasoning inside thinking blocks before producing the user-visible answer.^[6]^[7] Pricing was unchanged from non-thinking inference, with thinking tokens billed at the standard output rate.^[6] This same budget-based syntax carried through subsequent Claude 4 releases, including claude opus 4 5 (24 November 2025), claude haiku 4 5 (15 October 2025), and earlier Sonnet 4.x snapshots.^[4]^[9]^[10]

The fixed-budget design forced two awkward decisions on every API call. First, the developer had to predict in advance whether a given user message warranted any thinking at all, because thinking added latency and cost regardless of whether the question was a one-line factoid or a multi-step proof. Second, when thinking was enabled, the developer had to guess at the right ceiling: too low meant the model truncated reasoning mid-chain and gave a worse answer, while too high meant paying for unused capacity on simple inputs. Anthropic's documentation acknowledges this tradeoff and notes that adaptive thinking "can drive better performance than extended thinking with a fixed budget_tokens for many workloads, especially bimodal tasks and long-horizon agentic workflows."^[1] Third-party analyses report that production agent pipelines moving from fixed-budget thinking to adaptive mode see roughly 40 to 60 percent total cost reductions on workloads where simple subtasks dominate, because adaptive Claude can skip thinking entirely on the easy cases while reserving deep reasoning for the difficult ones.^[8]

Adaptive thinking was first announced with the Claude Opus 4.6 launch on 5 February 2026, alongside the public introduction of the effort parameter as a replacement for budget_tokens.^[3]^[12] Anthropic positioned the change as part of a broader shift in how reasoning budgets should be controlled: instead of developers routing simple queries to cheaper models and complex queries to larger ones, a single model with effort-level controls handles both ends of the spectrum.^[8] claude sonnet 4 6 inherited the feature on 17 February 2026, claude opus 4 7 removed manual budget-based thinking entirely on 16 April 2026, and claude opus 4 8 (28 May 2026) kept adaptive thinking as the only supported mode.^[4]^[5]^[22]

How does adaptive thinking work?

In adaptive mode the decision of whether to emit a thinking block, and how long that block should be, is made by the model itself on a per-request basis rather than being fixed by the developer.^[1] The Anthropic documentation states: "Claude evaluates the complexity of each request and determines whether and how much to use extended thinking. At the default effort level (high), Claude almost always thinks. At lower effort levels, Claude may skip thinking for simpler problems."^[1]

The wire format is minimal. A request enabling adaptive thinking on Opus 4.8 looks like this:

{
  "model": "claude-opus-4-8",
  "max_tokens": 16000,
  "thinking": {"type": "adaptive"},
  "messages": [
    {"role": "user", "content": "Explain why the sum of two even numbers is always even."}
  ]
}

There is no budget_tokens field, and no beta header is required.^[1] Output content blocks of type thinking still appear in the response when Claude does choose to reason, and they stream through the same thinking_delta events as before; the only behavioral difference is that the model may emit zero such blocks for sufficiently simple prompts.^[1]

What does the effort parameter do?

Effort is the primary developer-facing control over how aggressively adaptive thinking activates. It is passed inside an output_config object (not inside thinking) and accepts five values: low, medium, high, xhigh, and max.^[1]^[2]

Effort level	Thinking behavior	Model availability
`max`	Claude always thinks with no constraint on depth	Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 4.6, Fable 5, Mythos 5
`xhigh`	Claude always thinks deeply with extended exploration	Opus 4.8, Opus 4.7, Fable 5, Mythos 5
`high` (default)	Claude almost always thinks; deep reasoning on complex tasks	Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 4.6, Opus 4.5
`medium`	Moderate thinking; may skip for very simple queries	Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 4.6, Opus 4.5
`low`	Minimizes thinking; skips for simple tasks where speed matters	Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 4.6, Opus 4.5

Anthropic stresses that effort is "a behavioral signal, not a strict token budget."^[2] Setting effort to low does not cap thinking tokens at any specific number; instead, the model is steered to think less often and less deeply, and it will still reason on sufficiently hard problems even at the lowest setting.^[2] Effort also applies to non-thinking output, so a low setting reduces tool-call verbosity, code comment density, and overall response length even on requests where thinking is skipped.^[2]

The xhigh level was introduced with claude opus 4 7 and sits between high and max. Anthropic recommends it as the default starting point for coding and long-horizon agentic work on Opus 4.7, noting that xhigh expects meaningfully higher token usage than high but produces stronger results on tasks involving repeated tool calling, detailed web search, and large knowledge-base lookups.^[2] Claude Code raised its default effort to xhigh across all subscription tiers with the Opus 4.7 launch.^[5]

How is thinking output displayed: summarized versus omitted?

Independently of whether thinking is adaptive or manual, the display field controls how the thinking content is returned in the API response. It accepts two values:^[1]

display: "summarized" returns a model-generated summary of the full chain of thought inside the thinking field. This is the default on claude opus 4 6, claude sonnet 4 6, and earlier Claude 4 models.^[1]
display: "omitted" returns thinking blocks with an empty thinking field. The encrypted full chain of thought still travels in the signature field for multi-turn continuity, but no human-readable reasoning text is surfaced.^[1]

On claude opus 4 7 and claude opus 4 8 the default flipped to display: "omitted".^[1] Anthropic documents this explicitly: "On Claude Fable 5, Claude Mythos 5, Claude Opus 4.8, Claude Opus 4.7, and Claude Mythos Preview, the default is omitted. Thinking blocks still appear in the response stream, but their thinking field is empty unless you explicitly opt in. This is a silent change from Claude Opus 4.6, where the default was summarized."^[1] Restoring visible reasoning requires thinking: {"type": "adaptive", "display": "summarized"} in the request body.^[1] The stated rationale is latency: with thinking output suppressed, the server can begin streaming the final text response sooner, even though billed thinking tokens are unchanged.^[1]

The signature field is identical between the two display modes, which means a multi-turn conversation can mix omitted and summarized turns without breaking thinking-block validation, and switching between display values mid-conversation is supported.^[1]

How does adaptive thinking handle interleaved thinking?

Interleaved thinking lets Claude produce additional thinking blocks between tool calls inside a single assistant turn, rather than only at the very start of the turn. Without it, an agent's tool-use loop has no intermediate reasoning step: the model picks a tool, reads the result, picks another tool, and so on, without ever pausing to revise its plan.

Interleaved thinking was originally gated behind the interleaved-thinking-2025-05-14 beta header, available on Claude Sonnet 4.5, Opus 4.5, and earlier Claude 4 models.^[7]^[11] Adaptive thinking changes this: on claude opus 4 8, claude opus 4 7, claude opus 4 6, and claude sonnet 4 6 in adaptive mode, interleaved thinking is automatically enabled and the beta header is no longer required.^[1] On Opus 4.6 in manual (type: "enabled") mode, interleaved thinking is unavailable, which Anthropic flags directly: "If your agentic workflow requires thinking between tool calls on Opus 4.6, use adaptive mode."^[1] On Sonnet 4.6, manual mode still supports interleaved thinking through the beta header.^[1]

This change matters operationally. For agents built on Claude that use tools heavily, adaptive thinking is now the only way to get inter-tool reasoning on the latest Opus models without the beta header dance, and it is the only thinking mode at all on Opus 4.7 and Opus 4.8.^[1]^[5]^[22]

How does validation differ from manual mode?

Adaptive thinking relaxes one constraint that manual extended thinking enforced. In manual mode the API rejected any thinking-enabled assistant turn that did not begin with a thinking block; the first content block of a turn had to be type: "thinking". Adaptive mode removes this requirement: previous assistant turns are not forced to start with thinking blocks, so a conversation history that mixes thinking and non-thinking turns is valid.^[1]

Adaptive mode also preserves prompt caching breakpoints across consecutive requests that all use adaptive thinking. Switching between adaptive and manual or disabled thinking mid-conversation breaks message-level cache breakpoints, although system prompts and tool definitions remain cached regardless of mode changes.^[1]^[11]

How does adaptive thinking compare to manual extended thinking?

The differences between adaptive thinking and the prior manual extended thinking configuration can be summarized in a single table.

Property	Manual (`type: "enabled"`)	Adaptive (`type: "adaptive"`)
Configuration field	`budget_tokens: N`	`effort: low/medium/high/xhigh/max`
Who decides whether to think	Developer (always thinks when enabled)	Model (per-request)
Who decides depth	Developer (fixed budget)	Model, guided by `effort`
Available on Opus 4.7 / Opus 4.8	Rejected with 400 error	Required mode
Available on Opus 4.6 / Sonnet 4.6	Deprecated but functional	Recommended
Available on Opus 4.5, Sonnet 4.5, Haiku 4.5	Required mode	Not supported
Interleaved thinking	Beta header on Sonnet 4.6 only; unavailable on Opus 4.6 manual mode	Automatic on Opus 4.6, Sonnet 4.6, Opus 4.7, Opus 4.8
First turn must begin with thinking block	Yes	No
Token cost on simple queries	Pays full budget regardless	May pay zero thinking tokens

The deprecation trajectory is explicit. Anthropic's documentation states that thinking.type: "enabled" and budget_tokens are "deprecated on Opus 4.6 and Sonnet 4.6 and will be removed in a future model release."^[1] claude opus 4 7 and claude opus 4 8 enforce this removal: requests using type: "enabled" are rejected with a 400 error rather than silently downgraded, which means client libraries and orchestrators built against the older interface require code changes to call those models at all.^[1]^[5]^[22]

Existing Sonnet 4.5, Opus 4.5, and Haiku 4.5 endpoints continue to require manual extended thinking with budget_tokens and do not accept the adaptive type at all.^[1]^[11] These models predate the adaptive mechanism and are not retrofitted with it.

Which Claude models use adaptive thinking?

Model	Released	Adaptive thinking	Manual thinking	Effort levels supported
claude haiku 4 5	15 October 2025	Not supported	Required (`budget_tokens`)	None (legacy)
claude opus 4 5	24 November 2025	Not supported	Required (`budget_tokens`)	low, medium, high
claude opus 4 6	5 February 2026	Recommended	Deprecated	low, medium, high, max
claude sonnet 4 6	17 February 2026	Recommended	Deprecated	low, medium, high, max
claude opus 4 7	16 April 2026	Only mode	Rejected (400 error)	low, medium, high, xhigh, max
claude opus 4 8	28 May 2026	Only mode	Rejected (400 error)	low, medium, high, xhigh, max
Claude Fable 5	2026	Always on	Not supported	low, medium, high, xhigh, max
Claude Mythos 5	2026	Always on	Not supported	low, medium, high, xhigh, max
Claude Mythos Preview	2026	Default mode	Not supported	low, medium, high, max

claude opus 4 5 introduced the effort parameter on 24 November 2025, but only with three levels (low, medium, high) and only in conjunction with manual extended thinking; on Opus 4.5 the developer still had to set a budget_tokens value, and effort tuned the model's behavior alongside that budget rather than replacing it.^[9]^[2] Anthropic reported that at medium effort Opus 4.5 matched Sonnet 4.5's performance using 76 percent fewer output tokens, an early demonstration of the cost-control benefits that adaptive mode would later automate.^[9]

claude opus 4 6 was the first model where adaptive thinking became the recommended path, and the max effort level was added on Opus 4.6 and Sonnet 4.6.^[3]^[11] The Opus 4.6 launch also introduced the public effort parameter as the explicit replacement for budget_tokens going forward.^[3]^[12]

claude sonnet 4 6 shipped twelve days after Opus 4.6 and uses the same adaptive thinking implementation. Anthropic recommends explicitly setting effort on Sonnet 4.6 to avoid unexpected latency, since the model defaults to high and almost always thinks at that setting; medium is suggested as the practical default for agentic coding and tool-heavy workflows.^[2]

claude opus 4 7 removed manual mode entirely on 16 April 2026 and added the xhigh effort level, sitting between high and max. Opus 4.7 also respects effort levels more strictly than Opus 4.6, particularly at low and medium, where Anthropic warns that the model may produce shallower work than the same prompt would receive on Opus 4.6 at the same effort.^[2]^[5] Anthropic's guidance is to raise effort rather than prompt-engineer around shallow output, because the underlying reasoning policy was retuned to honor the effort signal more aggressively.^[2]

What changed for adaptive thinking in Claude Opus 4.8?

claude opus 4 8 shipped on 28 May 2026, 41 days after Opus 4.7 and the shortest gap between Opus point releases to date.^[22]^[23] Like Opus 4.7, it supports adaptive thinking as the only thinking mode: thinking is off unless the request explicitly sets thinking: {type: "adaptive"}, and manual type: "enabled" with budget_tokens is rejected with a 400 error.^[1] The xhigh effort level and the display: "omitted" default both carry over from Opus 4.7.^[1] Anthropic positioned the release around Opus 4.8 being "less prone to overthinking," with adaptive thinking triggering reasoning only when the turn needs it: the model responds directly on simple lookups and short agentic steps but reasons before answering on complex multi-step problems.^[5]^[22] Opus 4.8 ships with a 1,000,000-token context window by default, 128,000 max output tokens, and pricing of $5 per million input tokens and $25 per million output tokens at standard rates, with thinking tokens billed at the output rate as on prior models.^[22]^[24]

Which Claude models think on every turn?

Three later models take the model-controlled approach further by making thinking mandatory. On Claude Fable 5 (claude-fable-5) and Claude Mythos 5 (claude-mythos-5), adaptive thinking is always on: it applies whenever the thinking parameter is unset and thinking: {type: "disabled"} is not supported.^[1] On both models the raw chain of thought is never returned; the thinking blocks are regular thinking blocks (not redacted_thinking), and the display default is omitted.^[1] Claude Mythos Preview (claude-mythos-preview) makes adaptive thinking the default mode that auto-applies whenever thinking is unset, also defaults display to omitted, and summarizes thinking from the first token, so the verbose preamble that Claude 4 models normally emit in the first few lines of their thinking output is absent.^[1]

How does adaptive thinking perform on benchmarks?

Direct, head-to-head benchmark figures for adaptive versus manual thinking on the same model are not published in Anthropic's release notes, partly because Opus 4.6 and Sonnet 4.6 made adaptive the recommended path on the same day they became available. The publicly reported numbers reflect benchmark runs where the thinking configuration is set implicitly to the standard recommendation for each model.

On SWE-bench Verified, claude opus 4 6 scored 80.8 percent averaged over 25 trials, with a prompt-modified score of 81.42 percent.^[13]^[14] claude opus 4 5 reached 80.9 percent on the same benchmark in November 2025, becoming the first model to exceed 80 percent.^[9]^[15] claude opus 4 7 reached 87.6 percent in April 2026, a roughly seven-point improvement over Opus 4.6, with Anthropic's recommended evaluation effort at xhigh or max for coding tasks.^[16] claude sonnet 4 6 reached 79.6 percent on SWE-bench Verified at medium effort, only about 1.2 points behind Opus 4.6 at high effort and at roughly 60 percent lower cost per task, an example of how effort routing on a single model is intended to displace the older pattern of routing simple tasks to a smaller model and harder tasks to a larger one.^[8]^[4]

Anthropic also reports that Opus 4.6 leads frontier models on Humanity's Last Exam, outperforms GPT-5.2 by approximately 144 Elo points on GDPval-AA, and achieves 76 percent on MRCR v2's 1M-context needle variant compared with 18.5 percent for Sonnet 4.5.^[3]^[17] These are reported under the standard launch configuration, which on Opus 4.6 means adaptive thinking at default high effort. claude opus 4 8 was reported as the strongest computer-use and browser-agent model Anthropic had tested at launch, scoring 84 percent on Online-Mind2Web and roughly four times less likely than Opus 4.7 to let flaws in its own code pass without flagging them.^[22]^[23]

On Opus 4.7, Anthropic emphasizes a 13 percent improvement on internal 93-task coding benchmarks and a CursorBench score of 70 percent versus 58 percent for the previous version, again under the recommended xhigh or max effort configuration that adaptive mode enables.^[5]^[16] The shift from Opus 4.6 to Opus 4.7 also coincides with a tokenizer update that introduces a 1.0 to 1.35 multiplier on token counts depending on content, so the cost comparison at fixed effort levels is not strictly apples-to-apples.^[5]

Third-party analyses note that running Sonnet 4.6 at medium effort with adaptive thinking can sometimes match Opus 4.6 at high effort on workloads with a heavy tail of simple subtasks, because the adaptive mechanism on Sonnet 4.6 skips reasoning on easy steps that Opus 4.6 in high mode would still nominally think through.^[8] One reported case shows an 83 percent cost reduction (from approximately $0.053 to $0.009 per code review) when lowering effort from high to low on Sonnet 4.6.^[18]

How does adaptive thinking compare to other reasoning model controls?

Adaptive thinking is conceptually parallel to control mechanisms that other model providers have added on top of reasoning models, which scale test time compute by emitting hidden reasoning tokens before the final answer. Both reasoning effort and dynamic thinking budgets are forms of test-time compute allocation governed at request time.

OpenAI's reasoning models expose a reasoning_effort parameter. On the o-series (o1, o3, o4-mini) the accepted values are low, medium, and high, with medium the default; the broader enumeration that adds none, minimal, and xhigh belongs to the GPT-5.x reasoning models rather than the older o-series.^[19]^[25] As with Anthropic's effort, lower values trade quality for speed and lower token usage. OpenAI's reasoning tokens are not visible through the API but are billed as output tokens, similar to Claude's display: "omitted" behavior.^[19] The functional difference is that most OpenAI reasoning models always reason (reasoning is not optional, except at the none/minimal end of the GPT-5.x scale), whereas Claude in adaptive mode at low or medium effort can decide to skip reasoning entirely on individual requests.

Google's Gemini 2.5 series exposes a thinkingBudget parameter on the generateContent API, with a per-model range (0 to 24,576 tokens on Gemini 2.5 Flash).^[20] Setting thinkingBudget: 0 disables thinking entirely. Setting thinkingBudget: -1 enables dynamic thinking, in which Gemini, in Google's words, "will adjust the budget based on the complexity of the request."^[20] The dynamic-budget mode is the closest analog to Claude's adaptive thinking: in both cases the model decides per-request how much reasoning to allocate, with a developer-set ceiling or effort signal as soft guidance. Gemini 2.5 Flash's thinking-budget controls reportedly produced sixfold cost reductions when turned down in tested scenarios.^[21]

The three approaches differ in three notable ways. First, only Anthropic's adaptive mode and Gemini's dynamic mode let the model choose to skip thinking entirely; most OpenAI reasoning models always think. Second, only Anthropic exposes effort as a behavioral signal that affects non-thinking output as well, including tool-call verbosity and response length.^[2] Third, only Anthropic ships an explicit display: "omitted" setting that decouples latency optimization from cost: in omitted mode, the developer pays full thinking-token rates but the user-visible response begins streaming sooner.^[1]

What are the limitations and known issues?

Adaptive thinking is bounded by several constraints documented by Anthropic.

Effort is not a token budget. Lowering effort to low does not cap thinking tokens at a known number. The Anthropic documentation is explicit: "At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher effort levels for the same problem."^[2] Applications that require predictable latency or precise cost ceilings cannot use adaptive thinking alone for this purpose; Anthropic recommends using max_tokens as the hard upper bound on total output (thinking plus response text) and observing stop_reason: "max_tokens" to detect truncation.^[1]

Display omitted does not save money. On Opus 4.7 and Opus 4.8 the default display: "omitted" improves time-to-first-text-token but does not reduce billing. Anthropic states plainly: "You're still charged for the full thinking tokens. Omitting reduces latency, not cost."^[1] Several third-party guides flag this as the most-misunderstood Opus 4.7 default.^[18]

Mode switches break message caches. Switching between adaptive and manual or disabled thinking inside a conversation invalidates message-level prompt cache breakpoints. System prompts and tool definitions are unaffected.^[1]^[11]

Older Claude models cannot opt in. claude opus 4 5, Claude Sonnet 4.5, claude haiku 4 5, and earlier Claude 4 snapshots do not support adaptive thinking. Requests with type: "adaptive" against these model IDs are rejected.^[1] Migration to adaptive requires moving to Opus 4.6 or later, or Sonnet 4.6 or later.

Steering via prompt has tradeoffs. Anthropic notes that adaptive thinking's triggering behavior is promptable, and provides a sample system-prompt snippet that tells Claude to use thinking only when it will meaningfully improve answer quality.^[1] However, the documentation warns: "Steering Claude to think less often may reduce quality on tasks that benefit from reasoning. Measure the impact on your specific workloads before deploying prompt-based tuning to production."^[1]

Effort interpretation tightened on Opus 4.7. Opus 4.7 respects effort more strictly than Opus 4.6, particularly at low and medium. Anthropic warns that the same prompt at the same effort level may receive shallower reasoning on Opus 4.7 than on Opus 4.6, and recommends raising effort rather than prompt-engineering around the change.^[2] This is a behavioral break for applications that migrate from Opus 4.6 to Opus 4.7 without retuning effort.

Manual mode removal on the latest Opus breaks legacy clients. Because Opus 4.7 and Opus 4.8 reject type: "enabled" with a 400 error rather than silently downgrading, every client library or orchestration layer that hard-codes the old budget_tokens syntax must be updated before it can call those models at all. Anthropic flagged this as one of the breaking changes in the Opus 4.7 release notes.^[5]^[16]

No partial-budget control. Unlike Gemini's thinkingBudget parameter, which lets the developer set an explicit ceiling such as 8,192 tokens while still letting the model choose how much of that ceiling to use, adaptive thinking offers only the five-step effort enumeration. There is no way to request, for example, "think at most 4,000 tokens but choose whether to think at all," other than relying on the medium or low effort signal and inspecting actual usage afterward.^[20]^[2]

ELI5: adaptive thinking in plain terms

Imagine asking a very smart friend questions all day. For "what is 2 plus 2?" you do not want them to sit and think for a minute first; you want the answer right away. For "plan my whole week around three jobs and a flight," you do want them to stop and work it out carefully. Older AI settings made you choose one speed for every question: either always-thinking (slow and expensive on easy questions) or never-thinking (fast but worse on hard ones). Adaptive thinking lets the AI itself decide, question by question, whether to stop and think and for how long. You can nudge it with an effort dial from low (think rarely, answer fast) up to max (always think hard), but the model makes the final call each time. The benefit is that you spend more computing power, time, and money only on the hard problems and almost none on the easy ones.

References

^Anthropic, "Adaptive thinking", Claude API Docs, 2026. platform.claude.com/...adaptive-thinking. Accessed 2026-06-28.
^Anthropic, "Effort", Claude API Docs, 2026. platform.claude.com/...effort. Accessed 2026-06-28.
^Anthropic, "Introducing Claude Opus 4.6", Anthropic News, 2026-02-05. anthropic.com/...claude-opus-4-6. Accessed 2026-06-28.
^Anthropic, "Introducing Claude Sonnet 4.6", Anthropic News, 2026-02-17. anthropic.com/...claude-sonnet-4-6. Accessed 2026-06-28.
^Anthropic, "Introducing Claude Opus 4.7", Anthropic News, 2026-04-16. anthropic.com/...claude-opus-4-7. Accessed 2026-06-28.
^Simon Willison, "Claude 3.7 Sonnet, extended thinking and long output, llm-anthropic 0.14", simonwillison.net, 2025-02-25. simonwillison.net/...llm-anthropic-014 Accessed 2026-06-28.
^Anthropic, "Building with extended thinking", Claude API Docs, 2026. platform.claude.com/...extended-thinking. Accessed 2026-06-28.
^AgentMarketCap, "Claude 4.6 Adaptive Thinking Rewrites the Agent Cost Playbook: Effort Routing Replaces Model Routing", AgentMarketCap blog, 2026-04-14. agentmarketcap.ai/...t-pipeline-cost-quality-2026. Accessed 2026-06-28.
^Anthropic, "Introducing Claude Opus 4.5", Anthropic News, 2025-11-24. anthropic.com/...claude-opus-4-5. Accessed 2026-06-28.
^Amazon Web Services, "Claude Haiku 4.5", Amazon Bedrock User Guide, 2025-10-15. docs.aws.amazon.com/...opic-claude-haiku-4-5.html. Accessed 2026-06-28.
^Amazon Web Services, "Adaptive thinking", Amazon Bedrock User Guide, 2026. docs.aws.amazon.com/...ges-adaptive-thinking.html. Accessed 2026-06-28.
^MarkTechPost, "Anthropic Releases Claude Opus 4.6 With 1M Context, Agentic Coding, Adaptive Reasoning Controls, and Expanded Safety Tooling Capabilities", MarkTechPost, 2026-02-05. marktechpost.com/...ed-safety-tooling-capabilities Accessed 2026-06-28.
^Vellum, "Claude Opus 4.6 vs 4.5 Benchmarks (Explained)", Vellum blog, 2026. vellum.ai/...claude-opus-4-6-benchmarks. Accessed 2026-06-28.
^Morph LLM, "Claude Benchmarks (2026): Every Score for Opus 4.6, Sonnet 4.6 & Haiku", morphllm.com, 2026. morphllm.com/claude-benchmarks. Accessed 2026-06-28.
^Vellum, "Claude Opus 4.5 Benchmarks (Explained)", Vellum blog, 2025-11-24. vellum.ai/...claude-opus-4-5-benchmarks. Accessed 2026-06-28.
^Vellum, "Claude Opus 4.7 Benchmarks Explained", Vellum blog, 2026-04-16. vellum.ai/...claude-opus-4-7-benchmarks-explained. Accessed 2026-06-28.
^DigitalApplied, "Claude Opus 4.6: Features, Benchmarks, and Pricing Guide", digitalapplied.com, 2026. digitalapplied.com/...e-features-benchmarks-guide. Accessed 2026-06-28.
^Apiyi, "Claude Adaptive Thinking Mode: 4 Major Upgrades Replacing Extended Thinking", apiyi.com, 2026. help.apiyi.com/...place-extended-thinking-en.html. Accessed 2026-06-28.
^OpenAI, "Reasoning models", OpenAI API documentation, 2026. developers.openai.com/...reasoning. Accessed 2026-06-28.
^Google, "Gemini thinking", Google AI for Developers, 2026. ai.google.dev/...thinking. Accessed 2026-06-28.
^VentureBeat, "Google's Gemini 2.5 Flash introduces 'thinking budgets' that cut AI costs by 600% when turned down", VentureBeat, 2025. venturebeat.com/...-costs-by-600-when-turned-down. Accessed 2026-06-28.
^Anthropic, "Introducing Claude Opus 4.8", Anthropic News, 2026-05-28. anthropic.com/...claude-opus-4-8. Accessed 2026-06-28.
^TechCrunch, "Anthropic releases Opus 4.8 with new 'dynamic workflow' tool", TechCrunch, 2026-05-28. techcrunch.com/...8-with-new-dynamic-workflow-tool Accessed 2026-06-28.
^Anthropic, "Claude Opus 4.8", claude.com pricing, 2026. anthropic.com/...opus. Accessed 2026-06-28.
^Microsoft, "Azure OpenAI reasoning models", Microsoft Learn, 2026. learn.microsoft.com/...reasoning. Accessed 2026-06-28.

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

3 revisions by 1 contributors · v4 · 4,757 words · full history

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Suggest edit

What links here

Anthropic API Claude 3.7 Sonnet Claude Opus 4.6 Claude Opus 4.7 Claude Opus 4.8 Claude Sonnet 4.6 Claude Sonnet 5 Extended thinking Interleaved thinking Task budgets

What is adaptive thinking?

How does adaptive thinking work?

What does the effort parameter do?

How is thinking output displayed: summarized versus omitted?

How does adaptive thinking handle interleaved thinking?

How does validation differ from manual mode?

How does adaptive thinking compare to manual extended thinking?

Which Claude models use adaptive thinking?

What changed for adaptive thinking in Claude Opus 4.8?

Which Claude models think on every turn?

How does adaptive thinking perform on benchmarks?

How does adaptive thinking compare to other reasoning model controls?

What are the limitations and known issues?

ELI5: adaptive thinking in plain terms

See also

References

Improve this article

Related Articles

GRPO

RLVR

Inference-time scaling

Extended thinking

Interleaved thinking

Claude Opus 5

What links here

Related Articles

GRPO

RLVR

Inference-time scaling

Extended thinking

Interleaved thinking

Claude Opus 5

What links here