Command R
Last reviewed
May 17, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v5 · 4,401 words
Command R is a family of large language models developed by Cohere, a Canadian artificial intelligence company focused on enterprise applications. The Command R series was designed from the ground up for retrieval-augmented generation (RAG), multi-step tool use, and grounded generation with inline citations. The lineup released between March 2024 and February 2025 includes Command R (35 billion parameters), [[command_r_plus]] (104 billion parameters), and Command R7B (7 billion parameters), together with refreshes in August 2024 and an Arabic-optimized R7B variant in February 2025.[^1][^2][^3] In March 2025 Cohere positioned [[command_a]] as the successor product line, marking Command R7B as the "final" model in the R series.[^4][^5]
All open-weight releases in the family are distributed via [[hugging_face]] under the CC-BY-NC 4.0 license for research and non-commercial use, while commercial access is offered through Cohere's hosted API and cloud platform partners.
This article covers the Command R family of large language models. For the broader successor lineup including Command A, Command A Vision, Command A Reasoning, and Command A Translate, see [[command_a]].
Cohere was founded in 2019 by Aidan Gomez, Ivan Zhang, and Nick Frosst, all University of Toronto alumni. While an intern at Google Brain, Gomez was one of the eight co-authors of the 2017 paper "Attention Is All You Need," which introduced the Transformer architecture.[^6] Frosst also worked as a researcher at Google Brain before co-founding Cohere, and Zhang had previously collaborated with Gomez on research at FOR.ai. The company is headquartered in Toronto and San Francisco, with offices in Montreal, London, New York, Paris, and Seoul.[^7]
Unlike AI labs that target consumer chatbots, Cohere has pursued an enterprise-first strategy since its earliest days. Its models are available through Cohere's own API as well as through Amazon Web Services (Bedrock, SageMaker), Microsoft Azure (Azure AI Foundry), Google Cloud (Vertex AI), and Oracle Cloud Infrastructure (OCI Generative AI).[^8] This cloud-agnostic deployment, combined with support for virtual private cloud (VPC) and on-premises deployments, distinguishes Cohere in the enterprise AI market. By September 2025 Cohere had raised approximately $1.6 billion in total funding, with the latest extension bringing its valuation to $7 billion alongside a deepened partnership with AMD.[^9]
Before the R series, Cohere offered general-purpose "Command" and "Command Light" models that lacked the specialized RAG, tool use, and citation capabilities of the R family. The Command R series represented a deliberate pivot toward optimizing models for specific enterprise workflows rather than competing on general chat benchmarks alone.
Command R was announced on March 11, 2024, as Cohere's first model purpose-built for enterprise RAG workflows and tool use at scale.[^1][^10] With 35 billion parameters and a 128,000-token context window, Command R represented a significant shift in Cohere's model strategy: rather than chasing general-purpose chat benchmarks, the model was optimized for high-precision retrieval, grounded generation, and low-latency production deployments.
Command R uses an optimized autoregressive, decoder-only transformer architecture. The model was pretrained on a large multilingual corpus and then refined through supervised fine-tuning (SFT) and preference training to align its behavior with human expectations for helpfulness and safety.[^2] It accepts text input and produces text output, using a proprietary chat format with special tokens delineating turns, system prompts, and tool interactions. The model implements [[grouped_query_attention]] (GQA) to reduce key-value cache memory at long context lengths, which is particularly important for RAG workloads that fill the 128K window with retrieved passages.[^11]
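The memory effect of grouped query attention can be illustrated with a short calculation. The sketch below is not taken from Cohere's documentation: the layer count, head counts, and head dimension are hypothetical values chosen only to show how the key-value cache scales with the number of key-value heads at a 128,000-token context.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for a single sequence."""
    # The leading factor of 2 accounts for the separate key and value tensors.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration, for illustration only (not Command R's real shape).
LAYERS, QUERY_HEADS, KV_HEADS, HEAD_DIM = 40, 64, 8, 128
CONTEXT = 128_000

# Multi-head attention keeps one key-value head per query head.
mha_bytes = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, CONTEXT)
# Grouped query attention shares each key-value head across a group of query heads.
gqa_bytes = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)

print(f"MHA cache: {mha_bytes / 1e9:.1f} GB")
print(f"GQA cache: {gqa_bytes / 1e9:.1f} GB")
print(f"Reduction: {mha_bytes // gqa_bytes}x")
```

Under these assumed values the cache shrinks by the ratio of query heads to key-value heads, which is why the technique matters most when the context window is full.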
Command R was optimized for strong performance across 10 key languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. An additional 13 languages were included in the pretraining data with lower optimization priority: Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.[^2]
Cohere's tokenizer plays an important role in multilingual performance. Unlike many tokenizers that are heavily English-centric, Cohere designed its tokenizer for cross-lingual efficiency. In a comparison published by Sebastian Ruder, OpenAI's tokenizer produced roughly 1.67 times as many tokens as the Cohere tokenizer for equivalent Japanese text, meaning the Cohere tokenizer needed about 40% fewer tokens.[^12] This efficiency directly reduces costs (since API pricing is per token) and allows more content to fit within the context window for non-English users.
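As a rough illustration of what that ratio means in practice, the following sketch converts token counts between a baseline tokenizer and a more efficient one. It uses only the 1.67 figure and the 128,000-token window quoted above; the function names are hypothetical, and real token counts vary by text.

```python
RATIO = 1.67             # baseline tokens divided by efficient-tokenizer tokens
CONTEXT_WINDOW = 128_000


def efficient_token_count(baseline_tokens, ratio=RATIO):
    """Approximate token count after switching to the more efficient tokenizer."""
    return round(baseline_tokens / ratio)


def baseline_equivalent(window_tokens, ratio=RATIO):
    """Approximate baseline tokens that the same text would have required."""
    return round(window_tokens * ratio)


# Text that costs one million baseline tokens needs far fewer efficient tokens.
print(efficient_token_count(1_000_000))
# A full window holds text that would overflow a same-sized baseline window.
print(baseline_equivalent(CONTEXT_WINDOW))
```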
Command R introduced several capabilities that became hallmarks of the entire model family:[^2][^13]
Among them is a directly_answer tool that allows the model to abstain from calling external tools when the query can be answered from its own knowledge.
To support deployment on consumer-grade hardware, Cohere released quantized versions of Command R on Hugging Face. An 8-bit quantized version using bitsandbytes is available in the main repository, and a separate 4-bit quantized version is hosted at CohereLabs/c4ai-command-r-v01-4bit.[^2] These quantized versions make it practical to run the 35-billion-parameter model on hardware with limited GPU memory at the cost of some precision.
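The memory savings from quantization follow from simple arithmetic on the parameter count. The sketch below estimates weight storage only; it assumes every parameter is stored at the stated bit width and ignores activations, the key-value cache, and quantization metadata, so real footprints will be somewhat higher.

```python
PARAMS = 35e9  # Command R parameter count


def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: about {weight_memory_gb(PARAMS, bits):.1f} GB of weights")
```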
At release, Command R was priced at $0.50 per million input tokens and $1.50 per million output tokens via Cohere's hosted API, with the August 2024 refresh reducing prices to $0.15/$0.60.[^14]
Command R+ was released on April 4, 2024 (first on Microsoft Azure, then via Cohere's API and other clouds) as Cohere's flagship large language model.[^15][^16] At 104 billion parameters with a 128,000-token context window, it offered stronger reasoning, improved multilingual performance, and better results on tool use benchmarks compared to the smaller Command R.[^3]
On the Hugging Face Open LLM Leaderboard, Command R+ reported the following scores:[^3]
| Benchmark | Command R+ | DBRX Instruct | Mixtral 8x7B |
|---|---|---|---|
| ARC-Challenge | 70.99 | 68.9 | 70.1 |
| HellaSwag | 88.6 | 89.0 | 87.6 |
| MMLU | 75.7 | 73.7 | 71.4 |
| TruthfulQA | 56.3 | 66.9 | 65.0 |
| Winogrande | 85.4 | 81.8 | 81.1 |
| GSM8K | 70.7 | 66.9 | 61.1 |
| Average | 74.6 | 74.5 | 72.7 |
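The Average row is the unweighted mean of the six benchmark scores in each column. A short check, using only the figures from the table above, reproduces the reported averages to one decimal place:

```python
# Scores in table order: ARC-Challenge, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K.
scores = {
    "Command R+": [70.99, 88.6, 75.7, 56.3, 85.4, 70.7],
    "DBRX Instruct": [68.9, 89.0, 73.7, 66.9, 81.8, 66.9],
    "Mixtral 8x7B": [70.1, 87.6, 71.4, 65.0, 81.1, 61.1],
}
reported_average = {"Command R+": 74.6, "DBRX Instruct": 74.5, "Mixtral 8x7B": 72.7}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


for model, values in scores.items():
    computed = round(mean(values), 1)
    status = "matches" if computed == reported_average[model] else "differs from"
    print(f"{model}: computed {computed}, {status} reported {reported_average[model]}")
```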
Beyond standard academic benchmarks, Cohere highlighted Command R+'s performance on enterprise-relevant tasks. According to Cohere's internal evaluations, Command R+ outperformed GPT-4 Turbo on the ToolTalk (Hard) benchmark for conversational tool use and on the Berkeley Function Calling Leaderboard (BFCL) for single-turn function calling.[^15] In RAG citation fidelity, Command R+ surpassed GPT-4 Turbo in human evaluation. On multi-hop question answering benchmarks like HotpotQA, Bamboogle, and StrategyQA, it outperformed Claude 3 Sonnet and Mistral Large.
On the Chatbot Arena leaderboard captured on April 9, 2024, Command R+ ranked sixth overall and was the highest-ranked non-proprietary model at the time, outperforming some earlier versions of GPT-4.[^17] For translation tasks evaluated on FLoRES and WMT23, Command R+ was competitive with GPT-4 Turbo across French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese. In the Chinese Chatbot Arena, Command R+ ranked third behind GPT-4 and Claude 3 Opus, both of which cost two to three times more.[^12]
Command R+ was named one of TIME Magazine's Best Inventions of 2024.[^18]
Like Command R, Command R+ uses an optimized autoregressive decoder-only transformer trained with supervised fine-tuning and preference training, and it also relies on grouped query attention.[^11] It supports the same 10 primary languages and 13 additional pretraining languages as Command R. Its training data extends through February 2023, so Cohere recommends pairing the model with RAG to handle later facts.[^19] A 4-bit quantized version was released at CohereLabs/c4ai-command-r-plus-4bit for deployment on more constrained hardware.[^3]
At release, Command R+ was priced at $3.00 per million input tokens and $15.00 per million output tokens via Cohere's hosted API.[^15] The August 2024 refresh later lowered these to $2.50/$10.00, matching GPT-4o.[^14]
On August 30, 2024, Cohere released updated versions of both models: command-r-08-2024 and command-r-plus-08-2024.[^20][^21] The refreshed Command R delivered roughly 50% higher throughput and 20% lower latencies while cutting the required hardware footprint by half compared to the previous version. The refreshed Command R+ delivered roughly 50% higher throughput and 25% lower latencies on the same hardware footprint. Key improvements applied to both refreshed models included:[^21]
API pricing for Command R+ dropped from $3.00 to $2.50 per million input tokens and from $15.00 to $10.00 per million output tokens with the refresh.[^14]
Command R7B was announced on December 13, 2024, as the smallest and fastest model in the Command R family.[^22][^23] At 7 billion parameters (8 billion in BF16 weights on disk) with a 128,000-token context window and a maximum output of 4,000 tokens, it was designed for high-throughput, latency-sensitive applications like chatbots and code assistants, and for on-device inference scenarios where larger models are impractical.[^24] The model's knowledge cutoff date is June 1, 2024.
Command R7B introduced an architectural refinement shared with the later Command A model. The transformer interleaves three layers of sliding window attention (window size of 4,096 tokens) with Rotary Position Embedding (RoPE) and one layer of global attention without positional embeddings. This hybrid attention design balances the efficiency of local attention with the ability to attend to distant context when needed.[^24]
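The interleaving can be sketched as a layer schedule plus a per-layer attention rule. This is a minimal illustration, not Cohere's implementation: the function names are hypothetical, the placement of the global layer at the end of each group of four is an assumption, and so is the convention that a 4,096-token window counts the current token.

```python
def layer_kinds(num_layers, local_per_global=3):
    """Return the attention kind for each layer: three sliding, then one global."""
    kinds = []
    for index in range(num_layers):
        is_global = (index + 1) % (local_per_global + 1) == 0
        kinds.append("global" if is_global else "sliding")
    return kinds


def can_attend(kind, query_pos, key_pos, window=4096):
    """Causal attention rule for one query/key pair in a layer of the given kind."""
    if key_pos > query_pos:
        return False  # causal mask: never attend to future tokens
    if kind == "global":
        return True   # global layers see the entire prefix
    return query_pos - key_pos < window  # sliding layers see only recent tokens


print(layer_kinds(8))
print(can_attend("sliding", 5000, 0), can_attend("global", 5000, 0))
```

In this sketch a token 5,000 positions back is invisible to a sliding layer but still reachable through the next global layer, which is the trade-off the paragraph describes.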
Command R7B expanded fully supported languages to 23: the original 10 plus Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.[^24]
On the Hugging Face Open LLM Leaderboard (v2), Command R7B ranked first on average among similarly sized open-weight models:[^24]
| Benchmark | Command R7B | Gemma 2 IT 9B | Llama 3.1 8B | Qwen 2.5 7B |
|---|---|---|---|---|
| IFEval | 77.9 | 74.4 | 78.6 | 75.85 |
| BBH | 36.1 | 42.1 | 29.9 | 34.89 |
| MATH (Hard) | 26.4 | 0.2 | 19.3 | 0.0 |
| GPQA | 7.7 | 14.8 | 2.4 | 5.48 |
| MUSR | 11.6 | 9.74 | 8.41 | 8.45 |
| MMLU-Pro | 28.5 | 32.0 | 30.7 | 36.52 |
| Average | 31.4 | 28.9 | 28.2 | 26.87 |
Command R7B scored notably well on MATH (Hard), achieving 26.4 compared to near-zero scores from Gemma 2 IT 9B and Qwen 2.5 7B. Cohere described Command R7B as "the smallest, fastest, and final model in our R family," marking the conclusion of the R product line.[^23] API pricing was set at $0.0375 per million input tokens and $0.15 per million output tokens.[^14]
On February 27, 2025, Cohere released Command R7B Arabic (c4ai-command-r7b-arabic-02-2025), an 8-billion-parameter variant optimized for Modern Standard Arabic in addition to English, with the same 128,000-token context window as the base R7B model.[^25][^26] The model targets enterprises in the MENA region and emphasizes instruction following, length control, RAG, and minimized code-switching between Arabic and English. Like other R-series releases, the weights were published on Hugging Face under CC-BY-NC 4.0.
One of the most distinctive features of the Command R family is its built-in support for grounded generation with citations. Unlike many competing models that generate text without indicating where specific claims originate, Command R models are trained to produce fine-grained citations alongside their output. This capability is not a post-processing step or a plugin; it is trained directly into the model weights.[^2][^3]
The grounded generation pipeline operates in conjunction with RAG. When a user submits a query:
Cohere offers two citation modes:[^2]
| Mode | Description | Use Case |
|---|---|---|
| Accurate | The model first generates the complete response, then produces citations mapped to specific segments of the text | Applications where citation precision matters most |
| Fast | Citations are emitted inline as the response is produced, injected at the moment the model references a source | Streaming applications where low latency is important |
The citation system enables users and automated systems to verify claims, trace information back to source documents, and identify when the model may be generating content not grounded in the provided sources. This is especially valuable in regulated industries like finance, healthcare, and legal services where factual accuracy is non-negotiable. For enterprises, inline citations make model outputs straightforward to audit, so hallucinated or unsupported claims are easier to catch.
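One way to picture span-level citations is as character offsets into the response, each linked to one or more source documents. The sketch below uses a hypothetical `Citation` structure and document identifiers; it is not Cohere's response schema. It shows how such spans can be rendered inline and how uncited text can be isolated for review.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    start: int               # offset of the first cited character
    end: int                 # offset just past the last cited character
    document_ids: list[str]  # hypothetical identifiers of supporting documents


def cite(text, phrase, document_ids):
    """Build a Citation for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Citation(start, start + len(phrase), document_ids)


def annotate(text, citations):
    """Insert [doc] markers after each cited span, working right to left."""
    for c in sorted(citations, key=lambda c: c.end, reverse=True):
        text = text[:c.end] + "[" + ", ".join(c.document_ids) + "]" + text[c.end:]
    return text


def uncited_spans(text, citations):
    """Return the pieces of text not covered by any citation."""
    spans, cursor = [], 0
    for c in sorted(citations, key=lambda c: c.start):
        if c.start > cursor:
            spans.append(text[cursor:c.start])
        cursor = max(cursor, c.end)
    if cursor < len(text):
        spans.append(text[cursor:])
    return spans


response = "Command R has 35 billion parameters and a 128,000-token context window."
citations = [
    cite(response, "35 billion parameters", ["doc_0"]),
    cite(response, "128,000-token context window", ["doc_1"]),
]
print(annotate(response, citations))
print(uncited_spans(response, citations))
```

An auditing tool could flag any long uncited span that makes a factual claim, which is the kind of check the citation system is meant to support.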
Command R models support multi-step tool use, allowing them to function as autonomous agents that plan and execute sequences of actions using multiple external tools. These capabilities were trained into the models through a mixture of supervised fine-tuning and preference fine-tuning using a specific prompt template.[^2]
In single-step tool use, the model receives a user query along with a list of available tools (defined by their names, descriptions, and parameter schemas). The model selects the appropriate tool and generates the required parameters as JSON; the application then executes the call and passes the result back so the model can answer. This covers straightforward function-calling scenarios such as looking up information from a database or calling a calculator.
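On the application side, a single-step call reduces to parsing the model's JSON and dispatching it to a registered function. The sketch below assumes a hypothetical tool registry and a hypothetical JSON shape with `tool_name` and `parameters` keys; the actual wire format depends on the API in use.

```python
import json

# Hypothetical registry: each tool has a description, a parameter schema, and a handler.
TOOLS = {
    "calculator_add": {
        "description": "Add two numbers.",
        "parameters": {"a": float, "b": float},
        "handler": lambda a, b: a + b,
    },
}


def execute_tool_call(raw_call):
    """Parse a JSON tool call emitted by the model and run the matching handler."""
    call = json.loads(raw_call)
    tool = TOOLS[call["tool_name"]]  # raises KeyError for an unknown tool
    supplied = call["parameters"]
    missing = set(tool["parameters"]) - set(supplied)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    # Coerce each argument to the type declared in the schema.
    args = {name: kind(supplied[name]) for name, kind in tool["parameters"].items()}
    return tool["handler"](**args)


model_output = '{"tool_name": "calculator_add", "parameters": {"a": 2, "b": 3}}'
print(execute_tool_call(model_output))
```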
Multi-step tool use extends this by allowing the model to perform several inference cycles in a loop:
This cycle repeats until the model determines it has enough information to answer the user's question. The model can call multiple tools in parallel when the calls are independent, and it can self-correct when a tool call fails, making multiple attempts to accomplish the task. The ability to recover from errors and retry with different parameters increases the overall success rate of agentic workflows.
For example, if a user asks "What is the current temperature in the capital of Brazil?", the model first calls a geographic lookup tool to determine that the capital of Brazil is Brasília, then calls a weather API to retrieve the temperature in Brasília, and finally combines both results into a coherent answer.
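The Brazil example can be traced with a small agent loop. In this sketch a scripted function stands in for the language model, and both tools return canned values rather than real data; every name here is hypothetical. Only the control flow is the point: call a tool, record the observation, and repeat until the model produces an answer or the step budget runs out.

```python
def lookup_capital(country):
    return {"Brazil": "Brasília"}[country]


def get_temperature(city):
    return {"Brasília": 27.0}[city]  # canned value, not a real reading


TOOLS = {"lookup_capital": lookup_capital, "get_temperature": get_temperature}


def scripted_model(question, observations):
    """Stand-in for the LLM: choose the next action from what is known so far."""
    if "lookup_capital" not in observations:
        return {"tool": "lookup_capital", "args": {"country": "Brazil"}}
    if "get_temperature" not in observations:
        city = observations["lookup_capital"]
        return {"tool": "get_temperature", "args": {"city": city}}
    city, temp = observations["lookup_capital"], observations["get_temperature"]
    return {"answer": f"It is {temp} degrees Celsius in {city}."}


def run_agent(question, model, tools, max_steps=5):
    """Loop: ask the model for an action, execute it, feed the result back."""
    observations = {}
    for _ in range(max_steps):
        action = model(question, observations)
        if "answer" in action:
            return action["answer"]
        try:
            observations[action["tool"]] = tools[action["tool"]](**action["args"])
        except Exception as exc:
            # Record the failure so the model can retry with different arguments.
            observations[action["tool"] + "_error"] = str(exc)
    raise RuntimeError("step budget exhausted without an answer")


print(run_agent("What is the current temperature in the capital of Brazil?",
                scripted_model, TOOLS))
```

A step budget such as `max_steps` is a common safeguard in loops like this, since a model that keeps retrying a failing tool would otherwise never terminate.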
The following table summarizes the key specifications and pricing across the Command R family and the immediate successor:
| Model | Release Date | Parameters | Context Window | Input Price ($/1M tokens) | Output Price ($/1M tokens) | Open Weights |
|---|---|---|---|---|---|---|
| Command R | March 11, 2024 | 35B | 128K | $0.15 (08-2024 refresh) | $0.60 (08-2024 refresh) | Yes (CC-BY-NC) |
| Command R+ | April 4, 2024 | 104B | 128K | $2.50 (08-2024 refresh) | $10.00 (08-2024 refresh) | Yes (CC-BY-NC) |
| Command R7B | December 13, 2024 | 7B | 128K | $0.0375 | $0.15 | Yes (CC-BY-NC) |
| Command R7B Arabic | February 27, 2025 | 8B | 128K | n/a (research) | n/a (research) | Yes (CC-BY-NC) |
| Command A (successor) | March 13, 2025 | 111B | 256K | $2.50 | $10.00 | Yes (CC-BY-NC) |
Sources: Cohere model cards and pricing pages.[^2][^3][^14][^24][^25]
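To compare models on a concrete workload, the per-million-token prices in the table can be turned into a per-request estimate. The sketch below uses only the prices listed above; the request size (100,000 input tokens, 1,000 output tokens) is an arbitrary example of a retrieval-heavy call, and current prices may differ.

```python
# (input price, output price) in US dollars per one million tokens, from the table.
PRICES = {
    "Command R": (0.15, 0.60),
    "Command R+": (2.50, 10.00),
    "Command R7B": (0.0375, 0.15),
    "Command A": (2.50, 10.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Estimated cost in dollars for one request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for name in PRICES:
    print(f"{name}: ${request_cost(name, 100_000, 1_000):.4f} per request")
```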
All Command R family models are released as open-weight research releases under the Creative Commons Attribution-NonCommercial 4.0 (CC-BY-NC 4.0) license, along with Cohere's Acceptable Use Policy.[^2][^3][^24] This means:
This licensing strategy allows Cohere to benefit from community feedback and academic research while maintaining control over commercial revenue. It contrasts with broadly permissive open-weight releases like Meta's Llama family (under the Llama Community License) and with fully proprietary models such as OpenAI's GPT-4 and Anthropic's Claude. The distinction between "open weights" and "open source" is important: while the model weights are publicly available, the training data, training code, and full reproduction recipe are not released.
In March 2025, Cohere released [[command_a]] (c4ai-command-a-03-2025) as the successor to the Command R series and the company's most capable model at the time of release.[^4][^5] With 111 billion parameters and a 256,000-token context window (double the context length of Command R+), it represented a generational leap in both performance and efficiency. A 55-page technical report co-authored by 228 Cohere contributors was published on arXiv as 2504.00698 in April 2025.[^27]
Key characteristics of Command A:
Cohere subsequently extended the Command A family with specialized variants:
| Model | Release | Context | Description |
|---|---|---|---|
| Command A Vision | July 31, 2025 | 128K | 112B-parameter multimodal model combining a SigLIP2 vision encoder with the Command A text tower; supports up to 20 images per request[^30] |
| Command A Reasoning | August 21, 2025 | 256K | 111B-parameter reasoning model designed to "think" before responding for customer service and complex enterprise tasks[^31] |
| Command A Translate | August 28, 2025 | 16K (8K in + 8K out) | 111B-parameter translation model covering 23 languages, with an agentic "Deep Translation" multi-step refinement workflow[^32] |
These variants share architectural lineage with Command A and are out of scope for this article. For details, see [[command_a]].
Cohere offers several deployment tiers for the Command R (and successor) model families:[^8]
| Deployment Option | Description |
|---|---|
| Cohere API (SaaS) | Managed API with per-token pricing; simplest integration path |
| Cloud AI Platforms | Access through AWS Bedrock/SageMaker, Azure AI Foundry, Google Vertex AI, Oracle OCI |
| Virtual Private Cloud (VPC) | Models deployed within the customer's own cloud environment for data isolation |
| On-Premises | Full deployment on customer-owned hardware for maximum control |
This flexibility is a core part of Cohere's enterprise pitch. Organizations in highly regulated industries (banking, healthcare, government) often require that data never leaves their controlled environment, and Cohere's deployment model accommodates this requirement. In July 2025, Cohere announced a partnership with Bell Canada to provide AI services to government and enterprise customers, with Bell Canada deploying Cohere's technology on its own data center infrastructure.
The Command R models work in conjunction with other Cohere products to form a complete enterprise AI stack:
As of 2026, Command A and its specialized variants are Cohere's flagship enterprise offerings, with the Command R family officially in maintenance and superseded for most new deployments. The Cohere blog and changelog explicitly describe Command R7B as the final model in the R series, and Command A as the next generation.[^23][^4]
Despite the supersession, Command R and Command R+ retain significance for several reasons:
For practical deployments today, Cohere generally recommends Command A (or Command A Reasoning for advanced agentic workflows, Command A Vision for multimodal use cases, and Command A Translate for translation), but Command R family models continue to be available via API and Hugging Face for projects that prefer their smaller footprint or that have already standardized on them.