Command R is a family of large language models developed by Cohere, a Canadian artificial intelligence company focused on enterprise applications. The Command R series was designed from the ground up for retrieval-augmented generation (RAG), multi-step tool use, and grounded generation with inline citations. The lineup includes Command R (35 billion parameters), Command R+ (104 billion parameters), Command R7B (7 billion parameters), and the successor model Command A (111 billion parameters). All models in the series have been released as open-weight research releases under the CC-BY-NC 4.0 license, while commercial access is available through Cohere's API and cloud deployment partners.
Cohere was founded in 2019 by Aidan Gomez, Ivan Zhang, and Nick Frosst. All three co-founders attended the University of Toronto. Gomez was notably one of the eight co-authors of the seminal 2017 paper "Attention Is All You Need," which introduced the Transformer architecture while he was a 20-year-old intern at Google Brain. Frosst also worked as a researcher at Google Brain before co-founding Cohere. Ivan Zhang had previously collaborated with Gomez on research at FOR.ai, a research collective. The company is headquartered in Toronto and San Francisco, with additional offices in Montreal, London, New York City, Paris, and Seoul.
Unlike many AI labs that target consumer-facing chatbots, Cohere has pursued an enterprise-first strategy from its earliest days. The company builds models and platform tools that help businesses integrate AI into search engines, chatbots, document processing, and workflow automation. Cohere's models are available through its own API, as well as through major cloud platforms including Amazon Web Services (via Bedrock and SageMaker), Microsoft Azure (via Azure AI Foundry), Google Cloud (via Vertex AI), and Oracle Cloud Infrastructure (via OCI Generative AI). In November 2021, Google Cloud announced it would help power Cohere's platform using its infrastructure, including Cloud TPUs for model development and deployment. This cloud-agnostic deployment philosophy, combined with support for virtual private cloud (VPC) and on-premises deployments, distinguishes Cohere in the enterprise AI market.
As of September 2025, Cohere had raised approximately $1.6 billion in total funding. A $500 million Series D round in June 2024 valued the company at $5.5 billion. A subsequent $500 million raise in August 2025 pushed the valuation to $6.8 billion, and a $100 million extension the following month brought it to $7 billion. Investors include Radical Ventures, Inovia Capital, AMD Ventures, NVIDIA, PSP Investments, and Salesforce Ventures.
In January 2025, Cohere launched North, a turnkey AI workspace platform for enterprise productivity. North allows workers to build automations, query their company data, and collaborate with AI from a secure environment. It connects to existing workplace tools like Gmail, Slack, Salesforce, and Outlook, and can be deployed on private infrastructure so that Cohere never accesses customer data. Early access customers include RBC, Dell, LG, Ensemble Health Partners, and Palantir.
Before the Command R series, Cohere offered earlier generation Command models (simply called "Command" and "Command Light") that served as general-purpose text generation models. These earlier models lacked the specialized RAG, tool use, and citation capabilities that define the R series. The Command R family represented a deliberate pivot toward optimizing models for specific enterprise workflows rather than competing on general chat benchmarks alone.
Command R was announced on March 11, 2024, as Cohere's first model purpose-built for enterprise RAG workflows and tool use at scale. With 35 billion parameters and a 128,000-token context window, Command R represented a significant shift in Cohere's model strategy: rather than chasing general-purpose chat benchmarks, the model was optimized for high-precision retrieval, grounded generation, and low-latency production deployments.
Command R uses an optimized autoregressive transformer architecture. After pretraining on a large multilingual corpus, the model was refined through supervised fine-tuning (SFT) and preference training to align its behavior with human expectations for helpfulness and safety. The model accepts text input and generates text output. It uses a proprietary chat format with special tokens for delineating turns, system prompts, and tool interactions.
Command R was optimized for strong performance across 10 key languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. An additional 13 languages were included in the pretraining data with lower optimization priority: Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
Cohere's tokenizer plays an important role in multilingual performance. Unlike many tokenizers that are heavily English-centric, Cohere designed its tokenizer for cross-lingual efficiency, and it produces significantly fewer tokens than OpenAI's tokenizer for equivalent text in non-English languages. For Japanese, OpenAI's tokenizer produces roughly 1.67 times as many tokens as Cohere's for the same text. This efficiency directly reduces costs (since API pricing is per token) and allows more content to fit within the context window for non-English users.
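As a rough illustration of how the token savings become cost savings: the 1.67x ratio is the figure cited above, while the per-million-token price here is purely illustrative.

```python
# Back-of-the-envelope sketch: the same Japanese text, tokenized by an
# English-centric tokenizer vs. a cross-lingual one that emits 1.67x fewer tokens.

def api_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of processing `tokens` at a per-million-token rate."""
    return tokens / 1_000_000 * price_per_million

english_centric = api_cost(1_670_000, 2.50)  # English-centric tokenizer
cross_lingual = api_cost(1_000_000, 2.50)    # same text, 1.67x fewer tokens

savings = 1 - cross_lingual / english_centric  # roughly 40% cheaper per request
```

The same ratio also means roughly 1.67x more non-English content fits in a fixed 128K-token context window.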
Command R introduced several capabilities that became hallmarks of the entire model family, including grounded generation with inline citations, multi-step tool use, and a `directly_answer` tool that allows the model to abstain from calling external tools when the query can be answered from its own knowledge.

To support deployment on consumer-grade hardware, Cohere released quantized versions of Command R on HuggingFace. An 8-bit quantized version using bitsandbytes is available in the main repository, and a separate 4-bit quantized version is available at CohereLabs/c4ai-command-r-v01-4bit. These quantized versions make it practical to run the 35-billion-parameter model on hardware with limited GPU memory, at the cost of some precision.
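The motivation for quantization can be sketched with simple memory arithmetic. The figures below count weight storage only and ignore activation and KV-cache overhead, so real deployments need somewhat more headroom.

```python
# Rough weight-memory math behind the quantized releases of a 35B-parameter model.

def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate GPU memory needed just to hold the model weights."""
    return n_params * bits_per_param / 8 / 1e9

fp16 = weight_memory_gb(35e9, 16)  # 70.0 GB: beyond any single consumer GPU
int8 = weight_memory_gb(35e9, 8)   # 35.0 GB: fits a 40-48 GB datacenter card
int4 = weight_memory_gb(35e9, 4)   # 17.5 GB: fits a 24 GB consumer GPU
```

This is why the 4-bit release brings the model within reach of a single high-end consumer card, while half precision would require multiple GPUs.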
Command R+ was released on April 4, 2024, as Cohere's flagship large language model. At 104 billion parameters with a 128,000-token context window, it offered stronger reasoning, improved multilingual performance, and better results on tool use benchmarks compared to the smaller Command R.
On the HuggingFace Open LLM Leaderboard, Command R+ reported the following scores:
| Benchmark | Command R+ | DBRX Instruct | Mixtral 8x7B |
|---|---|---|---|
| ARC-Challenge | 70.99 | 68.9 | 70.1 |
| HellaSwag | 88.6 | 89.0 | 87.6 |
| MMLU | 75.7 | 73.7 | 71.4 |
| TruthfulQA | 56.3 | 66.9 | 65.0 |
| Winogrande | 85.4 | 81.8 | 81.1 |
| GSM8K | 70.7 | 66.9 | 61.1 |
| Average | 74.6 | 74.5 | 72.7 |
Beyond standard academic benchmarks, Cohere highlighted Command R+'s performance on enterprise-relevant tasks. According to Cohere's internal evaluations, Command R+ outperformed GPT-4 Turbo on the ToolTalk (Hard) benchmark for conversational tool use and on the Berkeley Function Calling Leaderboard (BFCL) for single-turn function calling. In RAG citation fidelity, Command R+ surpassed GPT-4 Turbo in human evaluation. On multi-hop question answering benchmarks like HotpotQA, Bamboogle, and StrategyQA, it outperformed Claude 3 Sonnet and Mistral Large.
On the Chatbot Arena leaderboard, Command R+ ranked among the top open-weight models as of April 2024, outperforming some versions of GPT-4 in that period. For translation tasks evaluated on FLoRES and WMT23, Command R+ was competitive with GPT-4 Turbo across French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese. In the Chinese Chatbot Arena, Command R+ ranked third behind GPT-4 and Claude 3 Opus (both of which cost two to three times more).
Command R+ was named one of TIME Magazine's Best Inventions of 2024.
Like Command R, Command R+ uses an optimized autoregressive transformer architecture trained with supervised fine-tuning and preference training. It supports the same 10 primary languages and 13 additional pretraining languages as Command R. A 4-bit quantized version was also released at CohereLabs/c4ai-command-r-plus-4bit for deployment on more constrained hardware.
In August 2024, Cohere released updated versions of both models: command-r-08-2024 and command-r-plus-08-2024. The refreshed Command R+ delivered roughly 50% higher throughput and 25% lower latencies compared to the April version, while maintaining the same hardware footprint.
The pricing for Command R+ was also reduced with this update: input tokens dropped from $3.00 to $2.50 per million, and output tokens dropped from $15.00 to $10.00 per million.
Command R7B was released on December 14, 2024, as the smallest and fastest model in the Command R family. At 7 billion parameters (roughly 8 billion in the released BF16 checkpoint) with a 128,000-token context window and a maximum output of 4,000 tokens, it was designed for high-throughput, latency-sensitive applications like chatbots and code assistants, and for on-device inference scenarios where larger models are impractical. The model's knowledge cutoff date is June 1, 2024.
Command R7B introduced an architectural refinement shared with the later Command A model. The transformer includes three layers of sliding window attention (with a window size of 4,096 tokens) and one layer of global attention without positional embeddings. The model uses Rotary Position Embedding (RoPE) for positional encoding. This hybrid attention design balances the efficiency of local attention with the ability to attend to distant context when needed.
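The hybrid design can be illustrated with a toy attention mask, using a window of 4 instead of 4,096. This is a minimal sketch of the two attention patterns, not Cohere's implementation.

```python
# True entries mark the key positions a given query position may attend to.

def causal_mask(seq_len, window=None):
    """Causal attention mask; if `window` is set, each position is further
    restricted to the `window` most recent positions (sliding window attention)."""
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

local = causal_mask(8, window=4)  # sliding-window layer (window of 4 for the toy example)
global_ = causal_mask(8)          # global attention layer

# In the local layer, position 7 attends only to positions 4-7;
# in the global layer it attends to all of positions 0-7.
```

In the full model, most layers use the cheap local pattern, and the occasional global layer lets information propagate across the entire 128K context.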
Command R7B expanded language coverage to 23 languages, adding Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian as fully supported languages alongside the original 10.
On the HuggingFace Open LLM Leaderboard (v2), Command R7B ranked first among similarly sized open-weight models:
| Benchmark | Command R7B | Gemma 2 IT 9B | Llama 3.1 8B | Qwen 2.5 7B |
|---|---|---|---|---|
| IFEval | 77.9 | 74.4 | 78.6 | 75.85 |
| BBH | 36.1 | 42.1 | 29.9 | 34.89 |
| MATH (Hard) | 26.4 | 0.2 | 19.3 | 0.0 |
| GPQA | 7.7 | 14.8 | 2.4 | 5.48 |
| MUSR | 11.6 | 9.74 | 8.41 | 8.45 |
| MMLU-Pro | 28.5 | 32.0 | 30.7 | 36.52 |
| Average | 31.4 | 28.9 | 28.2 | 26.87 |
Command R7B scored notably well on MATH (Hard), achieving 26.4 compared to near-zero scores from Gemma 2 IT 9B and Qwen 2.5 7B. It also led on MUSR with 11.6. Cohere described Command R7B as the final model in the R series.
Command A was announced on March 11, 2025, as the successor to the Command R series and Cohere's most capable model at the time of release. With 111 billion parameters and a 256,000-token context window (double the context length of Command R+), it represented a generational leap in both performance and efficiency. The accompanying technical report was published on arXiv (2504.00698) in April 2025, authored by 228 contributors from the Cohere team.
Command A uses a hybrid attention architecture that combines sliding window attention (window size of 4,096 tokens) with global attention layers, building on the design introduced in Command R7B but scaled up significantly. The model employs a decentralized training approach that incorporates self-refinement algorithms and model merging techniques.
A defining feature of Command A is its hardware efficiency: despite having 111 billion parameters, the model requires only two GPUs (A100 or H100) to run. Cohere reported 150% higher throughput compared to Command R+ 08-2024. Token streaming speed for 100K-context requests reached 73 tokens per second, compared to 38 tokens per second for GPT-4o and 32 tokens per second for DeepSeek-V3, according to Cohere's benchmarks. This puts Command A's throughput at roughly 1.9 times that of GPT-4o and 2.3 times that of DeepSeek-V3.
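The ratios follow directly from the reported streaming speeds:

```python
# Tokens per second on 100K-context requests, as reported by Cohere.
command_a_tps, gpt_4o_tps, deepseek_v3_tps = 73, 38, 32

vs_gpt_4o = command_a_tps / gpt_4o_tps        # ~1.9x
vs_deepseek = command_a_tps / deepseek_v3_tps  # ~2.3x
```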
Command A supports 23 languages, matching Command R7B's expanded language set, with improved handling of Arabic dialects.
Cohere's technical report included extensive benchmarking across enterprise-relevant tasks and public benchmarks. According to the report, Command A performs on par with GPT-4o on MMLU, MBPPPlus, and SQL benchmarks. It leads on BFCL-v3 and Taubench for tool-using agents, and on RepoQA for long-context code understanding. In human evaluations for business, coding, and agentic tasks, Command A outperformed or matched GPT-4o and DeepSeek-V3. The 256K context window (twice the size of GPT-4o's 128K window) provides additional headroom for processing lengthy documents, extensive conversation histories, and large retrieval sets.
Command A's weights were released under the CC-BY-NC 4.0 license on HuggingFace as CohereLabs/c4ai-command-a-03-2025, consistent with Cohere's open-weight research release strategy.
Following Command A, Cohere continued expanding the Command model family with specialized variants:
| Model | Release | Context | Description |
|---|---|---|---|
| Command A Vision | July 2025 | 128K | Cohere's first multimodal model, capable of processing images alongside text |
| Command A Reasoning | August 2025 | 256K | Cohere's first reasoning model, designed to "think" before generating; built for customer service and complex tasks |
| Command A Translate | August 2025 | 8K | Specialized translation model covering 23 languages |
These specialized variants reflect a broader industry trend toward model families where a base architecture is adapted for specific modalities or reasoning styles.
One of the most distinctive features of the Command R family is its built-in support for grounded generation with citations. Unlike many competing models that generate text without indicating where specific claims originate, Command R models are trained to produce fine-grained citations alongside their output. This capability is not a post-processing step or a plugin; it is trained directly into the model weights.
The grounded generation pipeline operates in conjunction with RAG: when a user submits a query, relevant documents are retrieved and passed to the model alongside the query, and the model generates its response with citations that point back to specific passages in those documents.
Cohere offers two citation modes:
| Mode | Description | Use Case |
|---|---|---|
| Accurate | The model generates the complete response first, then produces citations that map to specific segments of the text | Applications where citation precision matters most |
| Fast | Citations are generated inline as the response is produced, injected at the exact moment the model references a source | Streaming applications where low latency is important |
The citation system enables users and automated systems to verify claims, trace information back to source documents, and identify when the model may be generating content not grounded in the provided sources. This is especially valuable in regulated industries like finance, healthcare, and legal services where factual accuracy is non-negotiable. For enterprises, inline citations reduce the risk of hallucination by making it straightforward to audit model outputs.
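As an illustration, citations of this kind can be rendered as inline markers. The citation shape below (character offsets into the response plus source-document ids) is an assumption that approximates what a grounded-generation API returns; the `annotate_with_citations` helper is hypothetical.

```python
def annotate_with_citations(text, citations):
    """Insert bracketed source markers after each cited span of `text`.
    Each citation carries character offsets plus the ids of the source
    documents that ground that span (field names assumed for illustration)."""
    pieces, cursor = [], 0
    for cite in sorted(citations, key=lambda c: c["start"]):
        pieces.append(text[cursor:cite["end"]])
        pieces.append("[" + ",".join(cite["document_ids"]) + "]")
        cursor = cite["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)

reply = "The capital of Brazil is Brasilia, and it is currently 28 C there."
cites = [
    {"start": 0, "end": 33, "document_ids": ["doc_0"]},
    {"start": 39, "end": 65, "document_ids": ["doc_1"]},
]
annotated = annotate_with_citations(reply, cites)
# "The capital of Brazil is Brasilia[doc_0], and it is currently 28 C there[doc_1]."
```

Because each claim is tied to a character span and a document id, an auditing system can mechanically check that every cited span actually appears in, or follows from, the referenced source.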
Command R models support multi-step tool use, which allows them to function as autonomous agents that can plan and execute sequences of actions using multiple external tools. These capabilities were trained into the models through a mixture of supervised fine-tuning and preference fine-tuning using a specific prompt template.
In single-step tool use, the model receives a user query along with a list of available tools (defined by their names, descriptions, and parameter schemas). The model selects the appropriate tool, generates the required parameters in JSON format, and returns the result. This covers straightforward function-calling scenarios such as looking up information from a database or calling a calculator.
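A single-step interaction can be sketched as follows. The `get_weather` tool, its schema fields, and the `missing_parameters` helper are all hypothetical, chosen only to mirror the name/description/parameter-schema structure described above.

```python
# Hypothetical tool definition in a name/description/parameter-schema layout.
weather_tool = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "parameter_definitions": {
        "city": {"description": "Name of the city", "type": "str", "required": True},
    },
}

def missing_parameters(tool, call):
    """Return any required parameters absent from a model-emitted tool call."""
    return [
        name
        for name, spec in tool["parameter_definitions"].items()
        if spec.get("required") and name not in call["parameters"]
    ]

# The model's side of the exchange: a JSON-style tool name plus parameters.
model_call = {"name": "get_weather", "parameters": {"city": "Brasilia"}}
```

The calling application executes the named function with the supplied parameters and returns the result to the model, which then composes its answer.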
Multi-step tool use extends this by allowing the model to perform several inference cycles in a loop: the model plans a tool call, executes it, observes the result, and then decides whether further calls are needed or whether it can respond.
This cycle repeats until the model determines it has enough information to answer the user's question. The model can call multiple tools in parallel when the calls are independent, and it can self-correct when a tool call fails, making multiple attempts to accomplish the task. The ability to recover from errors and retry with different parameters increases the overall success rate of agentic workflows.
For example, if a user asks "What is the current temperature in the capital of Brazil?", the model first calls a geographic lookup tool to determine that the capital of Brazil is Brasilia, then calls a weather API to retrieve the temperature in Brasilia, and finally combines both results into a coherent answer.
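This example can be mimicked with a toy script in which stub functions stand in for the real tools and a fixed two-step plan stands in for model inference.

```python
# Toy version of the Brasilia example: stub tools instead of real APIs,
# and a hard-coded two-step plan instead of model-driven planning.

def lookup_capital(country: str) -> str:
    return {"Brazil": "Brasilia"}[country]

def get_temperature(city: str) -> float:
    return {"Brasilia": 28.0}[city]

def answer_capital_weather(country: str) -> str:
    """Step 1: resolve the capital; step 2: fetch its temperature;
    finally combine both tool results into a single grounded answer."""
    capital = lookup_capital(country)        # first inference cycle
    temperature = get_temperature(capital)   # second cycle, fed by the first
    return f"The current temperature in {capital} is {temperature} degrees."

result = answer_capital_weather("Brazil")
# "The current temperature in Brasilia is 28.0 degrees."
```

The essential property is that the second call depends on the output of the first, which is exactly what distinguishes multi-step from single-step tool use.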
The following table summarizes the key specifications and pricing across the Command R family and selected competing models:
| Model | Release Date | Parameters | Context Window | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Open Weights |
|---|---|---|---|---|---|---|
| Command R | March 2024 | 35B | 128K | $0.15 | $0.60 | Yes (CC-BY-NC) |
| Command R+ (08-2024) | August 2024 | 104B | 128K | $2.50 | $10.00 | Yes (CC-BY-NC) |
| Command R7B | December 2024 | 7B | 128K | $0.0375 | $0.15 | Yes (CC-BY-NC) |
| Command A | March 2025 | 111B | 256K | $2.50 | $10.00 | Yes (CC-BY-NC) |
| GPT-4o | May 2024 | Undisclosed | 128K | $2.50 | $10.00 | No |
| Claude 3.5 Sonnet | June 2024 | Undisclosed | 200K | $3.00 | $15.00 | No |
| Gemini 1.5 Pro | February 2024 | Undisclosed | 1M | $1.25 | $5.00 | No |
| Llama 3.1 405B | July 2024 | 405B | 128K | Varies by provider | Varies by provider | Yes (Llama License) |
Pricing reflects API rates at the time of each model's latest version and may vary by provider or volume tier.
A notable pattern in this comparison is that Command A offers the same pricing as GPT-4o ($2.50 input / $10.00 output per million tokens) while providing a 256K context window, open weights, and the ability to self-host. This positions it as a competitive alternative for enterprises that want GPT-4o-level performance with more flexible deployment options.
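At these rates, per-request costs are easy to compare. The `request_cost` helper and the 100K-input/2K-output token mix below are illustrative.

```python
# Comparing per-request cost at the table's rates (prices per million tokens).

def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Dollar cost of one API request."""
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price

# A long-context request: 100K input tokens, 2K output tokens.
command_a = request_cost(100_000, 2_000, 2.50, 10.00)  # $0.27
gpt_4o = request_cost(100_000, 2_000, 2.50, 10.00)     # $0.27 (identical rates)
sonnet = request_cost(100_000, 2_000, 3.00, 15.00)     # $0.33
```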
All Command R family models are released as open-weight research releases under the CC-BY-NC 4.0 (Creative Commons Attribution-NonCommercial) license, along with Cohere's Acceptable Use Policy. In practice, anyone can download and use the weights for research and other non-commercial purposes with attribution, but commercial use requires a separate license from Cohere.
This licensing strategy allows Cohere to benefit from community feedback and academic research while maintaining control over commercial revenue. It contrasts with fully permissive open-source releases (like Meta's Llama models under the Llama License) and fully proprietary models (like OpenAI's GPT-4 and Anthropic's Claude). The distinction between "open weights" and "open source" is important: while the model weights are publicly available for download, the training data, training code, and full reproduction recipe are not released.
Cohere offers several deployment tiers for the Command model family:
| Deployment Option | Description |
|---|---|
| Cohere API (SaaS) | Managed API with per-token pricing; simplest integration path |
| Cloud AI Platforms | Access through AWS Bedrock/SageMaker, Azure AI Foundry, Google Vertex AI, Oracle OCI |
| Virtual Private Cloud (VPC) | Models deployed within the customer's own cloud environment for data isolation |
| On-Premises | Full deployment on customer-owned hardware for maximum control |
This flexibility is a core part of Cohere's enterprise pitch. Organizations in highly regulated industries (banking, healthcare, government) often require that data never leaves their controlled environment, and Cohere's deployment model accommodates this requirement. In July 2025, Cohere announced a partnership with Bell Canada to provide AI services to government and enterprise customers, with Bell Canada deploying Cohere's technology on its own data center infrastructure.
The Command R models work in conjunction with other Cohere products, such as the Embed and Rerank models used for retrieval, to form a complete enterprise AI stack.
The Command R family is notable in the broader landscape of large language models for its citation-grounded generation, its open-weight research releases paired with commercial API access, its multilingual tokenizer efficiency, and its consistent focus on enterprise RAG and tool-use workloads rather than general-purpose chat benchmarks.