Reka Edge

Updated May 30, 2026

Reka Edge is a 7-billion-parameter multimodal language model developed by Reka AI, introduced in April 2024 as the smallest member of the company's first publicly described model family. The model was released alongside two larger siblings, Reka Core and Reka Flash, and was positioned as the option intended for local deployment and latency-sensitive applications. Unlike most language models in its size class at the time, Reka Edge accepts image, video, and audio inputs in addition to text, although it produces text-only outputs.

The name was reused in 2026 for a redesigned 7-billion-parameter vision-language model aimed at "physical AI" and on-device deployment, with new weights published on Hugging Face in March 2026 and a formal announcement on May 29, 2026. The 2026 generation is built on a different architecture from the 2024 model and is described separately below. Unless otherwise noted, the technical details in the sections that follow refer to the original 2024 release.

The original model was first detailed in the technical report "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models," submitted to arXiv on April 18, 2024. Reka described Edge as a dense transformer built on the same backbone architecture used across the rest of the family, scaled down and trained on roughly 4.5 trillion tokens. On standard text benchmarks the model outperformed Meta's Llama 2 7B and Mistral AI's Mistral 7B across every evaluation reported in the original paper, and beat Google's Gemma 7B on most though not all of them.

Background

Reka AI was founded in 2022 by researchers previously at Google DeepMind, Meta FAIR, and Google, and emerged from stealth in mid-2023 with a focus on building multimodal foundation models that could be deployed on-premises or in customer-controlled environments. The company's first publicly released product was Yasa-1, a multimodal assistant introduced in October 2023. The Core, Flash, and Edge family followed in April 2024 and represented the first time Reka described an internal model lineup spanning multiple size tiers.

The three models share the same training recipe and backbone, and differ primarily in parameter count and the volume of training compute applied. Reka Core is the flagship, sized at "frontier-class" scale and competitive with closed models from larger labs. Reka Flash is a 21-billion-parameter mid-tier model aimed at most general workloads. Reka Edge sits at the bottom of the lineup as a compact 7B option designed for deployments where latency or hardware footprint matters more than absolute capability. The technical report frames the family as a deliberate compute-class strategy rather than a single flagship product.

Why a small model

The 7B parameter class became a focal point of open-weight model releases in 2023 and 2024 because models of that size can run on a single consumer or workstation GPU, fit on a high-end laptop with quantization, and in some cases run on phones or edge devices. Reka explicitly framed Edge around that deployment envelope. Most 7B models at the time were text-only; Reka Edge was unusual in carrying full multimodal input support down to that size.

Architecture

Reka Edge is a 7-billion-parameter dense transformer. Reka does not publish the exact layer count, hidden width, or head configuration in the public report, but it describes the backbone in terms of the design choices it shares with other modern decoder transformers.

Backbone

The report describes the backbone as following the "Noam" architecture, a common reference design that combines several now-standard components:

Component	Choice
Activation	SwiGLU
Attention	Grouped Query Attention (GQA)
Position encoding	Rotary Position Embeddings (RoPE)
Normalization	RMSNorm
Tokenizer	SentencePiece, 100K vocabulary, based on tiktoken

This is broadly similar to PaLM but without the parallel attention and feed-forward layers used in some Google designs. The use of grouped query attention reduces the memory cost of the key-value cache, which matters more in small models intended for local inference where memory is often the binding constraint.
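The memory argument for grouped query attention can be made concrete with a small estimate. The sketch below compares key-value cache size when every query head has its own key-value head against a grouped layout. The layer count, head counts, and head dimension are hypothetical placeholders for a generic 7B-class decoder, since Reka does not publish Edge's actual configuration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch=1):
    """Key-value cache size: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value * batch


# Hypothetical 7B-class configuration (NOT Reka Edge's unpublished one).
N_LAYERS, N_QUERY_HEADS, HEAD_DIM = 32, 32, 128
N_KV_HEADS_GQA = 8   # assumed grouping: 4 query heads share each KV head
SEQ_LEN = 8192       # an 8K-token context, 16-bit cache values

mha = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS_GQA, HEAD_DIM, SEQ_LEN)

print(f"multi-head cache:    {mha / 2**30:.2f} GiB")
print(f"grouped-query cache: {gqa / 2**30:.2f} GiB")
print(f"reduction:           {mha // gqa}x")
```

Under these assumed numbers the cache shrinks in direct proportion to the number of key-value heads, which is why the choice matters most when the cache competes with the weights for a fixed memory budget.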

Multimodal stack

The model uses a modular encoder-decoder structure. Image, video, and audio inputs are processed by dedicated encoders whose outputs are projected into the language model's input space. The decoder is the language backbone described above. The same general approach is used across Core, Flash, and Edge, with differences mostly in encoder capacity and training mix.
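As a rough illustration of what projecting encoder outputs into the language model's input space can look like, the sketch below maps image features to the decoder's embedding width and places them ahead of the text embeddings. The report does not specify the projector, so the single linear layer, the dimensions, and the token counts here are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
D_ENCODER = 1024   # width of a modality encoder's output features
D_MODEL = 4096     # width of the language model's token embeddings


def project(features, weight, bias):
    """Linear map from encoder feature space to the decoder's embedding space."""
    return features @ weight + bias


weight = rng.normal(scale=0.02, size=(D_ENCODER, D_MODEL))
bias = np.zeros(D_MODEL)

image_features = rng.normal(size=(64, D_ENCODER))  # 64 encoder outputs for one image
text_embeddings = rng.normal(size=(12, D_MODEL))   # 12 already-embedded text tokens

image_tokens = project(image_features, weight, bias)

# The decoder consumes one sequence: projected image tokens, then text tokens.
decoder_input = np.concatenate([image_tokens, text_embeddings], axis=0)
print(decoder_input.shape)  # (76, 4096)
```

The point of the modular layout is that each encoder can change independently as long as its projected outputs match the decoder's embedding width.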

Context length

Reka Edge ships with an 8K-token context window in its standard form, with a long-context variant trained out to 64K tokens. The 8K default is roughly in line with most 7B contemporaries from the same period.

Training

The technical report states that Edge was trained on approximately 4.5 trillion language tokens after extensive deduplication and filtering. Training was done predominantly on Nvidia H100 GPUs using PyTorch, across several hundred chips over several weeks. The knowledge cutoff for the released checkpoint is November 2023. After pretraining the model went through instruction tuning followed by reinforcement learning from human feedback using proximal policy optimization (PPO).

Benchmark performance

Reka reported a fairly broad sweep of text and multimodal benchmarks for Edge in the original technical report. The text-only numbers are the cleanest comparison points because most peer 7B models at the time published the same evaluations.

Reported scores on standard text benchmarks

Benchmark	Reka Edge	Mistral 7B	Gemma 7B
MMLU	65.7	62.5	64.3
GSM8K	66.2	35.4	46.4
HumanEval	54.3	26.2	32.3
XQuAD	54.2	29.7	21.7
TydiQA	61.5	31.7	35.8
Belebele	37.1	32.8	26.8

Reka reported that Edge beats both Mistral 7B and Llama 2 7B on all eight benchmarks evaluated in the paper, six of which are shown above, and beats Gemma 7B on every benchmark except MATH, where Gemma scored 24.3 against Edge's 23.2. Among the largest reported margins are those on coding (HumanEval) and grade-school math word problems (GSM8K), where Edge's scores roughly double those of Mistral 7B in the same evaluation setup.
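The margins can be checked directly from the table. The short script below copies the vendor-reported scores and prints the point difference and ratio against each baseline; the dictionary layout and helper names are illustrative only.

```python
# Scores copied from the table above (all Reka-reported).
scores = {
    "MMLU":      {"edge": 65.7, "mistral": 62.5, "gemma": 64.3},
    "GSM8K":     {"edge": 66.2, "mistral": 35.4, "gemma": 46.4},
    "HumanEval": {"edge": 54.3, "mistral": 26.2, "gemma": 32.3},
    "XQuAD":     {"edge": 54.2, "mistral": 29.7, "gemma": 21.7},
    "TydiQA":    {"edge": 61.5, "mistral": 31.7, "gemma": 35.8},
    "Belebele":  {"edge": 37.1, "mistral": 32.8, "gemma": 26.8},
}


def margin(benchmark, baseline):
    """Point difference between Edge and a baseline on one benchmark."""
    row = scores[benchmark]
    return row["edge"] - row[baseline]


def ratio(benchmark, baseline):
    """Edge's score divided by the baseline's score."""
    row = scores[benchmark]
    return row["edge"] / row[baseline]


for name in scores:
    print(
        f"{name:10s} "
        f"vs Mistral 7B: +{margin(name, 'mistral'):4.1f} ({ratio(name, 'mistral'):.2f}x)  "
        f"vs Gemma 7B: +{margin(name, 'gemma'):4.1f} ({ratio(name, 'gemma'):.2f}x)"
    )
```

On these figures the Edge-to-Mistral ratio is about 1.87 on GSM8K and about 2.07 on HumanEval.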

Multimodal evaluation

The report presents multimodal results partly as benchmark scores and partly as head-to-head chat comparisons. In the multimodal chat evaluation Edge scored an Elo of 986 with a 50.5% win rate, outperforming IDEFICS 80B and Adept Fuyu 8B by a large margin. The Reka Flash announcement post separately reported that Edge "outperforms LLaVA 1.6 7B" in multimodal chat and "approaches the performance of Gemini Pro," though those two claims rest on Reka's own head-to-head methodology rather than a public benchmark.
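For readers unfamiliar with Elo ratings, the sketch below shows how a rating gap translates into an expected head-to-head score under the standard Elo model. It assumes the conventional 400-point scale, which is not confirmed as Reka's exact setup, and the opponent ratings are hypothetical values chosen only to show the shape of the curve.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


EDGE_ELO = 986  # rating reported for Edge in the multimodal chat evaluation

# Hypothetical opponent ratings, not taken from the report.
for opponent in (786, 986, 1186):
    p = expected_score(EDGE_ELO, opponent)
    print(f"vs rating {opponent}: expected score {p:.3f}")
```

Equal ratings give an expected score of 0.5, and a 200-point advantage corresponds to roughly 0.76, so rating differences of a few hundred points reflect a clear preference in pairwise comparisons.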

Chat quality

On MT-Bench, a multi-turn chat evaluation that uses a strong LLM judge, Edge scored 7.6, which Reka described as competitive with the best models of similar size at the time. In Reka's internal text-only chat evaluations the model ranked ahead of other 7B systems and was described as coming close to Anthropic's Claude Instant 1.2, a notably stronger comparison than the open-weight peers it shipped against.

On-device positioning

Reka's framing of Edge centers on three deployment scenarios: latency-sensitive applications, on-premises deployments where data cannot leave a customer environment, and edge or device-side inference where the model needs to run with constrained hardware. The technical report explicitly describes Edge as "designed for local deployments and latency sensitive applications," which is consistent with how Reka markets the rest of its enterprise offering.

For on-device usage the 7B size is meaningful in two ways. First, at FP16 the raw weights fit in roughly 14GB of memory, which puts them within reach of a single workstation GPU and, with 4-bit quantization, within reach of many consumer GPUs and recent Apple Silicon laptops. Second, the use of grouped query attention reduces KV cache pressure during long-context generation, which matters more on memory-constrained devices than on a server.
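The weight-memory figures follow from simple arithmetic on parameter count and bits per parameter. The sketch below reproduces them for a nominal 7 billion parameters; it deliberately ignores quantization metadata such as scales, as well as activations and the KV cache, so real footprints will be somewhat larger.

```python
def weight_bytes(n_params, bits_per_param):
    """Raw storage for the weights alone, excluding any quantization metadata."""
    return n_params * bits_per_param / 8


N_PARAMS = 7e9  # nominal 7B parameter count

for label, bits in (("FP16", 16), ("8-bit", 8), ("4-bit", 4)):
    gb = weight_bytes(N_PARAMS, bits) / 1e9
    print(f"{label:5s}: {gb:.1f} GB")
```

At 16 bits this gives 14.0 GB, and at 4 bits 3.5 GB, which is what brings the model within range of consumer GPUs and laptops with unified memory.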

The model is also explicitly multimodal at this size, which is the more distinctive part of the positioning. Most 7B-class models released in 2024 were text-only or had vision added as an afterthought through a separately tuned variant. Edge carries image, video, and audio understanding in its base form, which makes it usable for tasks like local document analysis, video summarization on a workstation, or voice-driven interfaces on hardware that cannot reach back to a server.

Access at launch was through the chat.reka.ai playground and the platform.reka.ai API. Reka has also described on-premises and virtual private cloud deployment options as part of its general enterprise offering, with Edge being the smallest model available through those channels.

2026 generation

In 2026 Reka reused the Reka Edge name for a redesigned 7-billion-parameter vision-language model built specifically for "physical AI" and on-device deployment. The weights were published on Hugging Face under the identifier reka-edge-2603 in March 2026. Reka's formal announcement, titled "Reka Edge: Frontier-Level Edge Intelligence for Physical AI," followed on May 29, 2026, although the accompanying write-up is dated March 11, 2026.^[6]^[7] Unlike the 2024 model, which also accepted audio, the 2026 generation is documented as accepting only image and video inputs alongside text; its model card makes no mention of audio.^[7]

Architecture and efficiency

The 2026 model pairs a 657M-parameter ConvNeXt V2 vision encoder with a 6.4-billion-parameter transformer language backbone that Reka states was trained from scratch for reasoning and generation, for roughly 7B parameters in total.^[6] On Hugging Face the model uses a custom architecture class, Yasa2ForConditionalGeneration, and is distributed in BF16.^[7] Reka's design emphasis is token efficiency: the model represents each image tile with only 64 tokens, which the company says lets it encode a 1024x1024 image in roughly one third as many tokens as comparable models. Reka reports throughput of about 5.46 images per second on its reference hardware, which it presents as roughly twice that of Cosmos Reason2 8B and Qwen3.5 9B.^[6] It also reports a time to first token of about 0.522 seconds.^[6]^[7]
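To show how a fixed per-tile budget determines the visual sequence length, the sketch below counts tokens for a 1024x1024 image under several tile sizes. Only the 64-tokens-per-tile figure comes from Reka; the square-grid tiling and the tile sizes are hypothetical, since the actual tiling scheme is not described here.

```python
import math

TOKENS_PER_TILE = 64  # per-tile figure stated by Reka for the 2026 model


def image_tokens(width, height, tile_px, tokens_per_tile=TOKENS_PER_TILE):
    """Visual token count if the image is cut into a grid of square tiles."""
    tiles = math.ceil(width / tile_px) * math.ceil(height / tile_px)
    return tiles * tokens_per_tile


# tile_px values are hypothetical; they are not Reka's published settings.
for tile_px in (256, 512, 1024):
    n = image_tokens(1024, 1024, tile_px)
    print(f"tile {tile_px:4d}px -> {n:4d} tokens")
```

Fewer visual tokens per image shorten the prefill the language backbone has to process, which is the mechanism behind the throughput and time-to-first-token claims.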

Reka also describes aggressive quantization. The company states that 4-bit quantization reduces the model's memory footprint from about 13GB to about 5GB while retaining over 98% of its original performance and raising throughput by up to roughly 2.3 times, and that a 3.5-bit option is available through its "Reka Quant" method.^[6] These figures are Reka's own.

Reported capabilities

The 2026 Reka Edge is documented as supporting image understanding, video analysis, object detection with bounding boxes, spatial reasoning, and tool use, and as able to run fully offline.^[6]^[7] Reka's model card publishes the following benchmark figures, comparing Reka Edge against two similarly sized models and against the larger Gemini 3 Pro evaluated through an API; all of these numbers are vendor-reported:^[7]

Benchmark	Reka Edge	Cosmos-Reason2 8B	Qwen 3.5 9B	Gemini 3 Pro
VQA-v2	88.40	79.82	83.22	89.78
MLVU (video)	74.30	37.85	52.39	80.68
MMVU (multimodal video)	71.68	51.52	68.64	78.88
RefCOCO-A (detection)	93.13	90.98	93.62	81.46
RefCOCO-B (detection)	86.70	85.74	88.83	82.85
Mobile Actions (tool use)	88.40	77.94	91.78	89.39

Deployment and licensing

Reka positions the 2026 Edge for use cases such as robotics, drones, autonomous vehicles and automotive systems, cameras, augmented reality and wearables, and public safety, and lists target platforms including Nvidia Jetson, Apple Silicon, Linux, Windows, and Android on Snapdragon hardware.^[6] Access is offered through the Reka playground, an API, the Hugging Face weights, and vLLM. The model is released under a Business Source License (BSL) 1.1, which Reka pairs with a free commercial-use grant for organizations under 1 million US dollars in annual revenue and requires a separate commercial license above that threshold.^[6]^[7]

Comparison to peers

Reka Edge's most direct peers at release were the open-weight 7B-class language models popularized through 2023 and into 2024. Through the rest of 2024 and into 2025, that comparison set widened to include compact models from Microsoft and Alibaba.

Original comparison set (April 2024)

Model	Parameters	Multimodal inputs	MMLU
Reka Edge	7B	Text, image, video, audio	65.7
Mistral 7B	7B	Text only	62.5
Gemma 7B	7B	Text only	64.3
Llama 2 7B	7B	Text only	~46

The most notable structural difference here is the input modality. Edge is the only entry on this list that ingests video or audio natively at the 7B scale.

Later 7B-class comparisons

By the time later compact models reached the market the framing shifted slightly. Microsoft's Phi-3 mini, released in April 2024, the same month as Edge, is a 3.8B parameter model that competes on text benchmarks at lower compute. Reported MMLU for Phi-3 mini is 68.8, modestly ahead of Edge's 65.7, but Phi-3 mini is text-only in its base form and substantially smaller. Alibaba's Qwen 2.5 7B, released in September 2024, posted higher scores than Edge on most public text benchmarks but does not include audio or video as base modalities, and arrived roughly ten months after Edge's November 2023 knowledge cutoff.

The practical takeaway is that Reka Edge's pitch was never purely about scoring the highest MMLU at 7B. The pitch was that a model at this size could deliver competitive text performance and run multimodal inputs natively, in environments where larger models would not fit.

Reception

Reka Edge received modest but generally positive coverage on release, with most attention going to the larger Reka Core model that anchored the same announcement. Coverage in trade publications noted that a small startup had managed to publish a 7B model that outperformed Mistral 7B and Llama 2 7B on the standard set of evaluations, and that the model carried multimodal inputs natively rather than as a bolt-on. The Reka Flash announcement post highlighted Edge's MT-Bench score and its strong showing against larger multimodal baselines in the chat evaluation.

In the broader 7B model landscape Edge has remained a comparatively niche option. Open-weight checkpoints and developer mindshare have tended to cluster around Mistral, Meta's Llama line, Google's Gemma, and Alibaba's Qwen, all of which publish open weights, often under permissive licenses, and have large external developer communities. Reka has continued to operate primarily as a closed-API and enterprise-deployment company, which limits how broadly Edge gets used in third-party fine-tunes or research projects.

Reka has continued to update the family, releasing the redesigned 2026 vision-language model described above through Hugging Face under a source-available license. The architectural choices in the original release (a dense 7B backbone, modular multimodal encoders, GQA, RoPE, and RMSNorm) have remained typical of small multimodal models released since.

References

1. Ormazabal, Aitor et al. "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models." arXiv:2404.12387, April 18, 2024. https://arxiv.org/abs/2404.12387 Accessed 2026-05-31.
2. Reka AI. "Reka Flash: Efficient and Capable Multimodal Language Models." Reka blog. https://reka.ai/news/reka-flash-efficient-and-capable-multimodal-language-models Accessed 2026-05-31.
3. Reka AI. "Announcing the Latest Addition to Our Leading Multimodal Models." Press release, April 15, 2024. https://publications.reka.ai/reka-core-press-release.pdf Accessed 2026-05-31.
4. Reka AI. "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models." Technical report. https://publications.reka.ai/reka-core-tech-report.pdf Accessed 2026-05-31.
5. RekaAI. "reka-edge-2603." Hugging Face model card. https://huggingface.co/RekaAI/reka-edge-2603 Accessed 2026-05-31.
6. Reka AI. "Reka Edge: Frontier-Level Edge Intelligence for Physical AI." Reka blog, published May 29, 2026. https://reka.ai/news/reka-edge-frontier-level-edge-intelligence-for-physical-ai Accessed 2026-05-31.
7. Reka AI. "Reka Edge." Product page. https://reka.ai/reka-edge Accessed 2026-05-31.
