Command A Reasoning
Last reviewed
May 31, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 2,039 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 31, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 2,039 words
Add missing citations, update stale details, or suggest a clearer explanation.
Command A Reasoning is an enterprise reasoning model released by Cohere on August 21, 2025, as part of the Command A family of large language models. It pairs an explicit chain of thought process with the deployment efficiency and security focus that Cohere builds its products around, and it ships with a controllable thinking budget so a team can decide how much the model deliberates before it answers. The weights carry the model identifier command-a-reasoning-08-2025, and Cohere positions the system for agentic and tool using work, multilingual enterprise tasks, and on premises deployment where data cannot leave a customer's own infrastructure. [1][2]
The model is a fine tune of the earlier Command A instruct model rather than a replacement for it. The model card lists it as fine tuned from c4ai-command-a-03-2025, so it shares the same 111 billion parameter footprint and 256K token context window, and it adds a reasoning mode that Cohere reports lifts performance on agentic benchmarks above its non reasoning sibling and above several competing enterprise reasoning systems. [1][3]
Command A Reasoning is an auto regressive transformer that generates intermediate reasoning tokens before it produces a final response. This is the pattern shared by the broader class of reasoning models, where the model works through a problem step by step in a scratchpad style trace and then commits to an answer. Cohere's contribution here is less about the raw idea and more about packaging it for buyers who care about cost, latency, and control, the practical constraints that govern whether a reasoning model can actually run inside a bank, a hospital, or a government department. [1][2]
The company describes the target as enterprise tasks done with full control. In practice that means three things. The model can act as the planner and tool caller inside AI agents that retrieve documents, call APIs, and chain multiple steps. It can run in private deployments rather than only through a hosted API. And it gives administrators a dial over how much compute each query is allowed to spend on thinking, which matters a great deal once a reasoning model is answering thousands of queries a day. Cohere frames customer service, research assistance, and internal knowledge work as primary use cases, the kinds of jobs where a wrong answer carries real cost and a record of the model's reasoning is useful. [1][7]
Command A Reasoning was built by Cohere together with Cohere Labs, the company's research division. Cohere was founded in 2019 by Aidan Gomez and colleagues, and it has concentrated on selling language models to businesses rather than chasing a consumer chatbot audience. The reasoning model fits that strategy. It is meant to be one of the engines behind Cohere's North platform, an agentic workspace that connects models to enterprise data and tools. [1][2]
The headline feature is a token budget for reasoning. A developer can set how many tokens the model is permitted to spend on its internal thinking before it has to answer. A larger budget lets the model reason longer on hard problems, which tends to raise accuracy on math, planning, and multi step tool use. A smaller budget caps the spend, which lowers cost and latency for routine work where deep deliberation buys little. [1][4]
This turns the usual tradeoff into something a team can manage per workload instead of accepting a single fixed behavior. A nightly batch job that analyzes contracts can be given a generous budget, while an interactive support assistant can be kept lean so replies stay fast. Cohere frames this as a way to balance quality against compute, and it is the main reason the company describes the model as offering full control. [1][4]
Reasoning can also be switched off entirely. In Cohere's chat template the behavior is governed by a reasoning flag that defaults to on, and setting it off makes the model skip the thinking stage. When thinking runs, the model emits its trace between <START_THINKING> and <END_THINKING> markers before the final answer. With thinking disabled, Command A Reasoning behaves like a standard instruct model and responds directly, which lets one deployment serve both reasoning heavy and latency sensitive traffic without swapping models. [1][3]
Command A Reasoning has 111 billion parameters and uses an optimized transformer architecture, with weights released in BF16 precision. Its maximum context length is 256K tokens, long enough to hold large document sets, codebases, or extended agent transcripts in a single prompt, and it can generate up to 32K output tokens. [3]
Deployment efficiency is a deliberate design goal. The model can run on a single H100 or A100 GPU at a reduced context length of 128K tokens, and it reaches the full 256K context when spread across multiple GPUs. Cohere recommends 4 H100 GPUs for production serving and notes that 4 A100 GPUs work for evaluation and testing. The single GPU option lowers the hardware bar for private deployment, which is the setting Cohere cares most about, since many of its customers in regulated industries want the model running inside their own environment. [1][3]
| Specification | Detail |
|---|---|
| Developer | Cohere and Cohere Labs |
| Model identifier | command-a-reasoning-08-2025 |
| Release date | August 21, 2025 |
| Parameters | 111 billion |
| Architecture | Optimized auto regressive transformer, BF16 weights |
| Base model | c4ai-command-a-03-2025 |
| Maximum context | 256K tokens |
| Maximum output | 32K tokens |
| Single GPU context | 128K tokens on one H100 or A100 |
| Full context deployment | 256K tokens across multiple GPUs |
| Production hardware | 4 H100 GPUs recommended, 4 A100 for testing |
| Modality | Text in, text out |
| Reasoning control | Adjustable thinking token budget, can be disabled |
| Languages | 23 |
| Weights license | CC-BY-NC 4.0, research use |
| Availability | Cohere platform, North, Hugging Face |
Cohere built Command A Reasoning for agentic workloads, the kind where a model has to decide which tool to call, pass the right arguments, read the result, and then plan the next step. The thinking budget feeds directly into this, because a model that can reason before it acts tends to choose tools more reliably and recover from errors better than one that responds in a single pass. Function calling and multi step tool use are exactly what the BFCL and Tau-bench evaluations measure, and they are also the operations a deployed agent performs all day, so progress on those benchmarks maps onto the work the model is meant to do. [1][5]
The model supports 23 languages, including English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, Chinese, Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. Multilingual coverage has been a consistent priority across the Command line, and it matters for the global enterprises that are Cohere's main customers. [3]
Like other Cohere models, Command A Reasoning includes configurable safety behavior so that organizations can tune how the model handles sensitive content for their own context rather than accepting one fixed policy. Cohere also reports that the model reaches a strong balance between refusing genuinely harmful requests and staying useful, which it frames as the best safety and usefulness tradeoff among the systems it compared. [1][5]
Cohere reports that Command A Reasoning leads gpt-oss-120b, DeepSeek-R1 0528, and Mistral Magistral Medium on a set of agentic evaluations, and that it improves on the earlier Command A instruct model. The evaluations Cohere highlights cover function calling on BFCL-v3, multi turn tool use in simulated environments on Tau-bench, and open ended research style tasks on DeepResearch Bench. The published comparisons appear as charts in Cohere's announcement rather than as a numeric table, so the summary below records each benchmark, what it measures, and the verified directional result rather than inventing exact figures that were presented only in graphical form. [1][5]
| Benchmark | What it measures | Reported result |
|---|---|---|
| BFCL-v3 (Berkeley Function Calling Leaderboard) | Accuracy of selecting and calling functions or tools | Leads gpt-oss-120b, DeepSeek-R1 0528, and Mistral Magistral Medium |
| Tau-bench | Multi turn tool use in simulated task environments | Leads the same comparison set |
| DeepResearch Bench | Open ended, multi step research style tasks | Leads the same comparison set |
| Safety versus usefulness | Tradeoff between refusing harmful requests and staying helpful | Cohere reports the best tradeoff among the models compared |
Readers who need exact numeric scores should consult Cohere's announcement post and the model card directly, since the figures there are the authoritative source and are subject to change as evaluation suites are updated. [1][3]
The model weights are published on Hugging Face under a CC-BY-NC 4.0 license, which permits non commercial and research use and requires attribution. Commercial use runs through Cohere, and the company also asks users to follow the Cohere Labs Acceptable Use Policy. This split, open weights for research with a commercial path through the vendor, matches how Cohere has released earlier Command models. [3][10]
For production use, Command A Reasoning is served on the Cohere platform and through the North agentic workspace, and it can be deployed privately on a customer's own infrastructure for organizations that cannot send data to a third party. Cohere lists the model in its API catalog under the command-a-reasoning-08-2025 identifier alongside the rest of the Command line. [1][2][8]
Command A Reasoning is the reasoning oriented member of the Command A generation. The base Command A model, released in March 2025, introduced the 111 billion parameter size and the 256K context window, and it was notable for running on as few as two GPUs while matching larger rivals on enterprise tasks. Command A Reasoning keeps that profile and layers an explicit thinking stage on top, so it can be read as Command A taught to deliberate, with the option to turn the deliberation off and behave like the original. [3][9]
Both models descend from the earlier Command R and Command R Plus models, which established Cohere's focus on retrieval augmented generation and tool use for business users. Command A and its reasoning variant succeed that R series and push further on efficiency, context length, and now controllable reasoning. [6]
The non commercial weights license means the open release is for research only, so teams that want to run the weights in a commercial product cannot do so from Hugging Face alone and must go through Cohere. The single GPU configuration trades context for hardware savings, dropping to 128K tokens, so workloads that genuinely need the full 256K window require more than one GPU. As with reasoning models in general, a generous thinking budget raises both cost and latency, which is exactly why the budget control exists, and getting good results means tuning that budget per task rather than assuming more thinking is always better. Finally, the headline benchmark comparisons come from Cohere's own evaluation and are best read alongside independent testing on a buyer's specific use case. [1][3]