Reka Flash

Updated Jul 16, 2026

Reka Flash is a family of multimodal large language models developed by Reka AI, a San Francisco Bay Area research company founded in 2022 by former researchers from DeepMind, Google Brain, and Meta FAIR.^[1] The Flash line sits in the middle of Reka's three-tier model series, between the larger Reka Core and the compact Reka Edge. The original Reka Flash, introduced in February 2024, was a 21 billion parameter model designed to process text, images, video, and audio inputs at lower cost than the frontier models of the time.^[2]

The series became more widely known in March 2025, when Reka released Reka Flash 3, a 21 billion parameter reasoning model published under the Apache 2.0 license on Hugging Face.^[4]^[5] Reka Flash 3 was the company's first fully open weights release and was positioned as a general-purpose reasoning model competitive with OpenAI's o1-mini at a fraction of the deployment cost.^[4] The release made Reka one of a small number of frontier-focused labs to publish open weight reasoning models in early 2025, alongside DeepSeek and Alibaba's Qwen team.

Background

The Reka family

Reka AI announced its model lineup in stages through 2023 and 2024. The company emerged from stealth in June 2023 with $58 million in funding from DST Global Partners, Radical Ventures, and Snowflake Ventures, and a pitch focused on building efficient, enterprise-deployable multimodal models from scratch. Its first public model, Yasa-1, shipped in October 2023 as a multimodal assistant capable of processing images, audio, and short video clips alongside text.

In February 2024 Reka laid out a structured family of three models intended to cover different performance and cost points.^[2] Reka Edge, at roughly 7 billion parameters, targeted on-device and resource-constrained deployments. Reka Flash, at 21 billion parameters, served as the workhorse model for cost-sensitive production workloads. Reka Core, the largest model in the series, followed in April 2024 and was designed to compete with frontier-class systems such as GPT-4 and Claude 3 Opus on multimodal benchmarks.^[10] All three were trained from scratch rather than fine-tuned from a third-party base, which the company emphasized as a differentiator from labs that relied on Llama or Mistral checkpoints.

Reka Flash original release

The original Reka Flash entered public beta on February 12, 2024 via the Reka Playground. At launch the model accepted text and images, with video and audio support arriving over the following months. Reka described Flash as a "turbo-class" model trained on approximately 4.5 trillion deduplicated and filtered language tokens spanning more than 32 languages, including English, Chinese, Japanese, Spanish, Arabic, and Hindi. The standard context length at release was 8,000 tokens, with a 128,000 token long-context variant added later for retrieval and long-document tasks.^[2]

Reka published headline benchmark results showing Flash outperforming Gemini Pro 1.0 on the MMLU and GPQA evaluations and reaching competitive scores on GSM8K and HumanEval. On multimodal evaluations including MMMU, VQA-v2, VATEX video captioning, and Perception Test video question answering, Flash was reported as competitive with Gemini Pro across all four benchmarks. In a blind text chat human evaluation Flash placed ahead of GPT-3.5 Turbo, Claude 2.1, Mixtral 8x7B, and Gemini Pro, and in multimodal chat human evaluation it ranked second only to GPT-4V.^[2]

The technical details for Flash, Core, and Edge were consolidated in a single arXiv paper, Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models, posted on April 18, 2024 (arXiv:2404.12387).^[1] The paper was authored by the 25-person Reka team and submitted by researcher Max Bain. It described the shared training pipeline, the multimodal evaluation methodology, and ablations on data mix decisions, although Reka did not disclose full architectural details such as the exact number of attention heads or hidden dimensions.^[1]

Reka Flash updates (October 2024)

On October 4, 2024 Reka shipped a major update to Reka Flash that the company referred to internally as Flash v1.5. The update raised the model's quality score from 66.1 percent to 72.2 percent on internal evaluations and lifted its internal Elo rating by 43 points. On the public LMSYS Chatbot Arena leaderboard Reka Flash climbed from an Elo of 1148 to 1204, a 56-point gain.^[3]
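
To put a 56-point Arena gap in perspective, the standard Elo expected-score formula converts a rating difference into a head-to-head win probability. The sketch below assumes the conventional 400-point logistic scale; Chatbot Arena fits its ratings statistically, so this is an interpretive approximation rather than how the leaderboard itself is computed.

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the 400-point Elo logistic scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# October 2024 Reka Flash (1204) versus its earlier Arena rating (1148).
p = elo_win_probability(1204, 1148)
print(f"{p:.2f}")  # → 0.58
```

Read this way, a 56-point gap corresponds to the updated model being preferred in roughly 58 percent of head-to-head comparisons against its predecessor, ignoring ties.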

The October release expanded multimodal coverage in several ways. Image inputs gained better OCR and support for arbitrary resolutions and aspect ratios. Video inputs grew from one minute to three to five minutes per clip and gained native audio understanding rather than a separate transcription pass. Speech became a first-class input modality, and an experimental English speech output mode was added. Reka also positioned the new Flash as an agent backbone by introducing function calling and structured output. It reported that accuracy on output-format instructions rose from 40.4 percent to 83.6 percent, and that the model scored 51.8 percent on Reka's internal MegaTask agent benchmark, compared with 40.4 percent for Gemini 1.5 Flash and 25.9 percent for GPT-4o mini. The update was deployed via Reka Chat, the Reka API, and an NVIDIA NIM microservice.^[3]

Reka Flash 3

Reka Flash 3 was released on March 10, 2025 and represented a substantial reorientation of the Flash brand.^[4] The original Flash had been a closed-weights multimodal model offered through Reka's hosted API. Reka Flash 3 was instead a text-only reasoning model published as open weights on Hugging Face under the Apache 2.0 license, with full model files available for download, fine-tuning, and self-hosting.^[4]^[5]

Model design

Reka Flash 3 keeps the 21 billion parameter scale of the original Flash line, but the release notes describe it as trained from scratch as a reasoning-focused successor rather than continued from the v1.5 multimodal weights.^[4] At 21 billion parameters it is roughly 35 percent smaller than Qwen QwQ-32B, which Reka identified as the closest open weight reasoning peer at release.^[4] At full BF16 precision the checkpoint occupies 39 GB on disk, and Reka shipped guidance for 4-bit quantization that compresses the model to roughly 11 GB while preserving most of its reasoning performance, compared with about 18 GB minimum for QwQ-32B.^[5]

The context window is 32,000 tokens. The tokenizer is OpenAI's cl100k_base without any added special tokens, which simplifies integration with tools that already understand that vocabulary. The model uses a chat template based on human: and assistant: turns separated by a <sep> token, and generation stops on <sep> or <|endoftext|>. System prompts are prepended to the first user turn rather than carried as a separate role marker. Reka also published the model in a Llama-compatible weight layout so that downstream tooling such as Hugging Face Transformers and vLLM can load it without custom code paths.^[5]
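
The chat format described above can be sketched as plain string handling. The role names, the <sep> separator, the two stop strings, and the rule that the system prompt is prepended to the first user turn follow the model card as summarized here; the exact whitespace around <sep> is an assumption, and build_prompt and truncate_at_stop are hypothetical helpers, so the tokenizer's bundled chat template should take precedence where one is available.

```python
from typing import List, Optional, Tuple

SEP = "<sep>"
# Generation should end at whichever of these appears first.
STOP_STRINGS = (SEP, "<|endoftext|>")


def build_prompt(history: List[Tuple[str, str]], system: Optional[str] = None) -> str:
    """Render (role, text) turns into a single prompt string.

    Roles are "human" or "assistant"; the history should end with a human
    turn. Assumed spacing: "role: text <sep> " per turn.
    """
    parts = []
    for i, (role, text) in enumerate(history):
        if role not in ("human", "assistant"):
            raise ValueError(f"unknown role: {role!r}")
        # No separate system role: prepend the system prompt to the first user turn.
        if i == 0 and role == "human" and system:
            text = f"{system} {text}"
        parts.append(f"{role}: {text} {SEP} ")
    # Leave an open assistant turn for the model to complete.
    parts.append("assistant:")
    return "".join(parts)


def truncate_at_stop(generated: str) -> str:
    """Cut raw model output at the earliest stop string, if one is present."""
    cut = len(generated)
    for stop in STOP_STRINGS:
        idx = generated.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut].strip()


prompt = build_prompt([("human", "What is 17 * 3?")], system="Answer concisely.")
print(prompt)  # → human: Answer concisely. What is 17 * 3? <sep> assistant:
```

Most inference servers accept stop strings directly, in which case truncate_at_stop only serves as a fallback for raw completions.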

Training

Reka described the training pipeline as a three-stage process. The first stage was large-scale pretraining on a mix of public web data and curated synthetic datasets. The second stage was supervised instruction tuning on Reka-authored and filtered third-party instruction data. The third stage applied reinforcement learning using REINFORCE Leave-One-Out (RLOO) with a combination of model-based reward models and rule-based reward signals, with what Reka described as a deliberate focus on general reasoning improvements rather than specializing the model on any single domain such as competition math or code. The training data was largely English with some multilingual coverage.^[4]
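
The core of the RLOO stage fits in a few lines. RLOO samples several completions per prompt and uses the mean reward of the other samples as each one's baseline, which removes the need for a learned value network. The sketch below shows that advantage computation and the resulting policy-gradient loss on plain floats; how Reka actually weighted its model-based and rule-based rewards is not public, so combined_reward and its 50/50 default are purely hypothetical.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """REINFORCE Leave-One-Out advantages for k samples of one prompt.

    Each sample's baseline is the mean reward of the other k - 1 samples:
    A_i = r_i - (sum(r) - r_i) / (k - 1).
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


def combined_reward(model_score: float, rule_passed: bool,
                    rule_weight: float = 0.5) -> float:
    """Hypothetical blend of a reward-model score and a pass/fail rule check."""
    return (1 - rule_weight) * model_score + rule_weight * float(rule_passed)


def rloo_loss(rewards: List[float], logprobs: List[float]) -> float:
    """Minus the mean of advantage * sequence log-probability.

    In a real trainer the log-probs are tensors and gradients flow through
    them; here they are floats, so only the scalar value is computed.
    """
    advantages = rloo_advantages(rewards)
    return -sum(a * lp for a, lp in zip(advantages, logprobs)) / len(rewards)


print(rloo_advantages([1.0, 0.0, 0.5]))  # → [0.75, -0.75, 0.0]
```

Because every baseline excludes the sample it is compared against, the advantages for a prompt always sum to zero, so above-average completions are reinforced and below-average ones are discouraged.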

The most distinctive design choice was the budget forcing mechanism, built around a pair of <reasoning> and </reasoning> tags that delimit chain-of-thought output. An application can cap reasoning at a chosen number of tokens by inserting the closing </reasoning> tag, which forces the model to end its reasoning trace and immediately produce a final answer. The mechanism is intended to give application builders explicit control over the latency and cost of reasoning without retraining, and complements the broader trend toward inference-time scaling popularized by OpenAI's o1 and DeepSeek's R1.^[4]
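
Budget forcing can be illustrated with a decoding loop that caps the reasoning phase and then injects the closing tag. In this sketch next_token is a hypothetical stand-in for one decoding step of whatever inference backend is in use, and tokens are treated as whole strings for readability. Because Reka Flash 3 adds no special tokens to cl100k_base, a real implementation would count tokenizer tokens and detect </reasoning> and <sep> in the decoded text, since each tag spans several BPE tokens.

```python
from typing import Callable, List

END_REASONING = "</reasoning>"
STOP = "<sep>"


def generate_with_budget(next_token: Callable[[List[str]], str],
                         context: List[str],
                         reasoning_budget: int,
                         max_answer_tokens: int = 64) -> List[str]:
    """Decode with a cap on reasoning tokens, then force a final answer."""
    out: List[str] = []
    # Phase 1: reason until the model closes the trace or the budget runs out.
    for _ in range(reasoning_budget):
        tok = next_token(context + out)
        out.append(tok)
        if tok == END_REASONING:
            break
    else:
        # Budget exhausted without a closing tag: close the trace for the model.
        out.append(END_REASONING)
    # Phase 2: decode the final answer up to the turn separator.
    for _ in range(max_answer_tokens):
        tok = next_token(context + out)
        if tok == STOP:
            break
        out.append(tok)
    return out


def toy_model(ctx: List[str]) -> str:
    """Stand-in model that would 'think' forever unless the trace is closed."""
    if END_REASONING in ctx:
        return "42" if ctx[-1] == END_REASONING else STOP
    return "hmm"


print(generate_with_budget(toy_model, ["<reasoning>"], reasoning_budget=3))
# → ['hmm', 'hmm', 'hmm', '</reasoning>', '42']
```

Setting reasoning_budget to zero skips reasoning entirely, while a large budget lets the model close its own trace early; the same call therefore trades latency against answer quality without any change to the weights.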

Architecture

Reka has not published a full architecture diagram for either Reka Flash or Reka Flash 3. The April 2024 technical report describes the family at a high level as decoder-only transformer language models with a paired vision encoder for image and video frames, trained jointly on text and visual tokens.^[1] The vision pathway accepts images at arbitrary resolution, with each image converted to a sequence of patch tokens that are interleaved with text tokens in the model's input.
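
How arbitrary-resolution images become variable-length token sequences can be sketched under simple assumptions: a ViT-style square patch of 14 pixels, partial edge patches counted as full patches, and one placeholder position per patch. Reka has not disclosed its patch size, token layout, or any resampling step, so patch_grid, interleave, and the <img_patch> placeholder are all hypothetical; in a real model each placeholder position would carry a vision-encoder embedding rather than a text token.

```python
import math
from typing import List, Tuple

IMAGE_PATCH = "<img_patch>"  # hypothetical placeholder, not a real Reka token


def patch_grid(height: int, width: int, patch: int = 14) -> Tuple[int, int]:
    """Rows and columns of patches covering an image of any size.

    Partial patches at the bottom and right edges count as full patches,
    as if the image were padded up to a multiple of the patch size.
    """
    return math.ceil(height / patch), math.ceil(width / patch)


def interleave(text_before: List[str], image_hw: Tuple[int, int],
               text_after: List[str], patch: int = 14) -> List[str]:
    """Place one placeholder per image patch between two runs of text tokens."""
    rows, cols = patch_grid(*image_hw, patch=patch)
    return text_before + [IMAGE_PATCH] * (rows * cols) + text_after


seq = interleave(["Describe", "this:"], (448, 672), ["Thanks"])
print(len(seq))  # → 1539 (2 text tokens + 32 * 48 patches + 1 text token)
```

Under these assumptions the sequence length grows with image area, which is why arbitrary-resolution inputs consume a variable share of the context window.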

For Reka Flash 3, the Hugging Face model card lists the architecture as Llama-compatible at the weight format level, which implies the same general decoder-only transformer layout with RoPE positional embeddings, grouped-query attention, and SwiGLU feedforward blocks used by Llama and related families.^[5] Reka has not confirmed the exact number of layers, attention heads, or hidden dimension. At 21 billion parameters the model sits below other mid-size open models such as Qwen QwQ-32B and Gemma 27B.

Benchmark performance

Original Reka Flash (February 2024)

Reported scores for the original Reka Flash from the Reka Core, Flash, and Edge technical report^[1]:

Benchmark	Score	Domain
MMLU	75.9	General knowledge
GSM8K	85.8	Grade school math
HumanEval	72.0	Python coding
GPQA	34.0	Graduate science QA
MMMU	53.3	Multimodal college-level QA
VQA-v2	78.4	Visual question answering
Multimodal chat Elo	1082	Blind human eval

The technical report contextualized these numbers by showing that Flash outperformed several substantially larger models on equivalent evaluations, including Llama 2 70B, Grok-1, and Mistral Medium, while running closer to Gemini Pro 1.0 in cost.^[1]

Reka Flash 3 (March 2025)

Third-party benchmark coverage of Reka Flash 3 reported the following numbers^[6]:

Benchmark	Score	Domain
AIME 2024	51.0	Competition math
LiveCodeBench	43.5	Coding
MMLU-Pro	65.0 to 66.9	General knowledge (harder)
WMT'23	83.2 COMET	Multilingual translation
Intelligence Index (Artificial Analysis)	10	Composite

Reka itself noted in the release blog post that Reka Flash 3 was "not the best model for knowledge-intensive tasks" and recommended pairing it with web search or retrieval systems for factual questions.^[4] The model performed best on reasoning-heavy benchmarks where the budget forcing mechanism could be tuned to allow longer chains of thought.

Artificial Analysis ranked Reka Flash 3 at position 101 of 125 evaluated models on its composite intelligence index as of mid-2025, against a median score of 15 for that cohort. The same analysis noted that hosted pricing of $0.20 per million input tokens and $0.80 per million output tokens on Reka's own API made the model relatively expensive compared with other open weight models of similar size, although self-hosting the open weights makes that comparison moot.^[6]
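
The per-token prices above translate into per-request costs with simple arithmetic. This sketch uses the $0.20 and $0.80 per-million-token rates reported by Artificial Analysis; the 3:1 input-to-output blend used to produce a single headline price is an assumed convention, and the request sizes are illustrative.

```python
def blended_price(input_price: float, output_price: float,
                  input_share: float = 3.0, output_share: float = 1.0) -> float:
    """Per-million-token price at a fixed input:output token ratio (assumed 3:1)."""
    return (input_price * input_share + output_price * output_share) / (
        input_share + output_share)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = 0.20, output_price: float = 0.80) -> float:
    """Dollar cost of one request given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


print(f"{blended_price(0.20, 0.80):.2f}")  # → 0.35
# A reasoning-heavy request: 1,000 prompt tokens, 4,000 generated tokens.
print(f"{request_cost(1_000, 4_000):.4f}")  # → 0.0034
```

Since chain-of-thought tokens count as output at these rates, about 94 percent of the cost of the illustrative request above comes from generated tokens, so capping reasoning length has a direct effect on hosted cost.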

Open weights and licensing

Reka Flash 3 was the first Reka model published with downloadable weights. The model card on Hugging Face lists the license as Apache 2.0, which permits commercial use, modification, and redistribution without per-token royalties or usage restrictions.^[5] The release also made clear that the checkpoint is suitable for fine-tuning and that derivative models can be released under different licenses.

The choice of Apache 2.0 placed Reka Flash 3 in the same permissive licensing tier as Mistral 7B, Falcon 40B, OLMo, and Qwen QwQ-32B, rather than the more restrictive Llama 2 community license or the custom licenses with additional use restrictions attached to some earlier DeepSeek and Qwen releases. For developers and research labs the practical effect is that Reka Flash 3 can be deployed in commercial products with minimal legal review.

The original Reka Flash and its October 2024 update remain closed weights and are accessible only through Reka's hosted API, the Reka Chat product, and the NVIDIA NIM partnership.^[3] Reka has not indicated whether the multimodal Flash weights will be opened in the future.

Deployment footprint

Deployment guidance for Reka Flash 3 lists three common operating points^[5]:

Configuration	Memory	Example hardware
BF16 full precision	39 GB	Single A100 80GB or two A100 40GB
8-bit quantization	~22 GB	Single A100 40GB
4-bit quantization	11 GB	Single 24 GB consumer GPU (RTX 4090) or a 48 GB L40S
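
The memory figures in the table can be sanity-checked from the parameter count. This sketch assumes exactly 21 billion parameters (the nominal figure) and counts only the weights; it ignores the KV cache, activations, quantization scales, and any tensors kept at higher precision.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bits_per_param / 8 / 2**30


for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(21e9, bits):.1f} GiB")
# BF16: 39.1 GiB, 8-bit: 19.6 GiB, 4-bit: 9.8 GiB
```

The BF16 result matches the 39 GB figure if that figure is read in GiB. The quoted ~22 GB and 11 GB for the quantized variants sit above the raw estimate, which is consistent with quantization metadata and some layers being stored at higher precision, although Reka has not published a breakdown. None of these numbers includes the KV cache, which grows with context length and batch size.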

Reka documents usage with Hugging Face Transformers and vLLM, and community members have published GGUF conversions for llama.cpp. The model has also been served through inference providers including Fireworks AI, Together AI, and DeepInfra.^[5]

Comparison to peers

Model	Parameters	Weights	License	Multimodal	Reasoning mode	Context
Reka Flash (Feb 2024)	21B	Closed	Reka API	Text, image, video, audio	No	8K (128K long-context variant)
Reka Flash 3 (Mar 2025)	21B	Open	Apache 2.0	Text only	Yes (budget forcing)	32K
GPT-4o mini	Undisclosed	Closed	OpenAI API	Text, image, audio	No	128K
Claude 3 Haiku	Undisclosed	Closed	Anthropic API	Text, image	No	200K
Qwen QwQ-32B	32B	Open	Apache 2.0	Text only	Yes	128K
DeepSeek R1-Distill-Qwen-32B	32B	Open	MIT	Text only	Yes	128K

A few notes on the comparison. GPT-4o mini and Claude 3 Haiku do not disclose parameter counts, so direct size comparisons are not possible; they are included because they occupy a similar low-cost tier, and Reka benchmarked its October 2024 Flash update against GPT-4o mini. Reka Flash 3 is smaller than Qwen QwQ-32B by 11 billion parameters and ships with broadly similar reasoning performance, which was the headline efficiency claim at release.^[4] The DeepSeek R1 distillation models occupy the same open weight reasoning niche and competed directly with Reka Flash 3 in benchmark coverage during spring 2025.

Reka Flash 3 is text-only at release. Developers who want a multimodal open weight model in the same size class generally turn to Qwen2-VL, Llama 3.2 Vision, or the Pixtral models from Mistral.

Reception

Responses to the original Reka Flash in early 2024 were measured. The model received favorable coverage on technical AI blogs and from MarkTechPost and VentureBeat, which highlighted the strong benchmark numbers for a model in the 21 billion parameter range.^[8]^[10] Reviewers noted that the multimodal coverage of image, video, and audio in a single model of that size was uncommon at the time, with Gemini Pro 1.0 being the most direct comparison point. The closed-weights distribution and reliance on Reka's API limited independent verification of the published benchmarks.

Reka Flash 3 attracted more discussion in March 2025, in part because the open weights made independent evaluation straightforward. Coverage on MarkTechPost, DigiAlps, and several Medium technical posts emphasized the budget forcing mechanism as a practical feature for production deployments.^[9] The Hugging Face community produced quantizations within days of release and integrated the model into common inference stacks.

Critical reactions focused on three points. First, the model's MMLU-Pro score of around 65 to 67 was below the leading open weight reasoning models on knowledge-heavy benchmarks, and Reka itself acknowledged this limitation.^[4] Second, the text-only scope was a step back from the original Flash's multimodal capability, which some observers considered a strategic retreat. Third, the 32,000 token context window was shorter than the 128,000 or longer windows offered by several peers in 2025, which limited use for long-document analysis without retrieval augmentation.

For Reka the release served a different strategic purpose than chasing top benchmark scores. The company had spent 2023 and 2024 building a closed-source API business, and the Apache 2.0 release of Reka Flash 3 broadened developer awareness of the Reka stack ahead of the company's July 2025 funding round, in which Reka raised $110 million from investors including Nvidia and Snowflake at a valuation above one billion dollars. By mid-2025 Reka Flash 3 had become a commonly cited reference point for mid-size open weight reasoning models, alongside the DeepSeek distillations and Qwen QwQ.

References

1. Ormazabal, A., Zheng, C., Bain, M., et al. (April 18, 2024). "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models." arXiv:2404.12387. https://arxiv.org/abs/2404.12387
2. Reka AI. (February 12, 2024). "Reka Flash: Efficient and Capable Multimodal Language Models." Reka blog. https://reka.ai/news/reka-flash-efficient-and-capable-multimodal-language-models
3. Reka AI. (October 4, 2024). "Reka Flash Updates: Advanced Multimodal Understanding, Improved Reasoning, Better Agent Building Blocks and more." Reka blog. https://reka.ai/news/reka-flash-updates
4. Reka AI. (March 10, 2025). "Reasoning with Reka Flash 3." Reka blog. https://reka.ai/news/introducing-reka-flash
5. Reka AI. "RekaAI/reka-flash-3." Hugging Face model card. https://huggingface.co/RekaAI/reka-flash-3
6. Artificial Analysis. "Reka Flash 3: Intelligence, Performance & Price Analysis." https://artificialanalysis.ai/models/reka-flash-3
7. Artificial Analysis. "Reka Flash: Intelligence, Performance & Price Analysis." https://artificialanalysis.ai/models/reka-flash
8. MarkTechPost. (February 28, 2024). "Reka AI Releases Reka Flash: An Efficient and Capable State-of-the-Art 21B Multimodal Language Model."
9. MarkTechPost. (March 11, 2025). "Reka AI Open Sourced Reka Flash 3: A 21B General-Purpose Reasoning Model that was Trained from Scratch."
10. VentureBeat. (April 2024). "Reka releases Reka Core, its multimodal language model to rival GPT-4 and Claude 3 Opus."
