Reka Edge
Last reviewed
May 30, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 2,582 words
Reka Edge is a 7-billion-parameter multimodal language model developed by Reka AI, introduced in April 2024 as the smallest member of the company's first publicly described model family. The model was released alongside two larger siblings, Reka Core and Reka Flash, and was positioned as the option intended for local deployment and latency-sensitive applications. Unlike most language models in its size class at the time, Reka Edge accepts image, video, and audio inputs in addition to text, although it produces text-only outputs.
The name was reused in 2026 for a redesigned 7-billion-parameter vision-language model aimed at "physical AI" and on-device deployment, with new weights published on Hugging Face in March 2026 and a formal announcement on May 29, 2026. The 2026 generation is built on a different architecture from the 2024 model and is described separately below. Unless otherwise noted, the technical details in the sections that follow refer to the original 2024 release.
The original model was first detailed in the technical report "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models," submitted to arXiv on April 18, 2024. Reka described Edge as a dense transformer built on the same backbone architecture used across the rest of the family, scaled down and trained on roughly 4.5 trillion tokens. On standard text benchmarks the model outperformed Meta's Llama 2 7B and Mistral AI's Mistral 7B across every evaluation reported in the original paper, and beat Google's Gemma 7B on most though not all of them.
Reka AI was founded in 2022 by researchers previously at DeepMind, Google, and Meta FAIR, and emerged from stealth in mid-2023 with a focus on building multimodal foundation models that could be deployed on-premises or in customer-controlled environments. The company's first publicly released product was Yasa-1, a multimodal assistant introduced in October 2023. The Core, Flash, and Edge family followed in April 2024 and represented the first time Reka described an internal model lineup spanning multiple size tiers.
The three models share the same training recipe and backbone, and differ primarily in parameter count and the volume of training compute applied. Reka Core is the flagship, sized at "frontier-class" scale and competitive with closed models from larger labs. Reka Flash is a 21-billion-parameter mid-tier model aimed at most general workloads. Reka Edge sits at the bottom of the lineup as a compact 7B option designed for deployments where latency or hardware footprint matters more than absolute capability. The technical report frames the family as a deliberate compute-class strategy rather than a single flagship product.
The 7B parameter class became a focal point of open-weight model releases in 2023 and 2024 because models of that size can run on a single consumer or workstation GPU, fit on a high-end laptop with quantization, and in some cases run on phones or edge devices. Reka explicitly framed Edge around that deployment envelope. Most 7B models at the time were text-only; Reka Edge was unusual in carrying full multimodal input support down to that size.
Reka Edge is a 7-billion-parameter dense transformer. Reka does not publish the exact layer count, hidden width, or head configuration in the public report, but it describes the backbone in terms of the design choices it shares with other modern decoder transformers.
The backbone is referred to internally as a "Noam" architecture, after a common reference design that combines several now-standard components:
| Component | Choice |
|---|---|
| Activation | SwiGLU |
| Attention | Grouped Query Attention (GQA) |
| Position encoding | Rotary Position Embeddings (RoPE) |
| Normalization | RMSNorm |
| Tokenizer | SentencePiece, 100K vocabulary, based on tiktoken |
This is broadly similar to PaLM but without the parallel attention and feed-forward layers used in some Google designs. The use of grouped query attention reduces the memory cost of the key-value cache, which matters more in small models intended for local inference where memory is often the binding constraint.
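The effect of grouped query attention on cache memory can be illustrated with back-of-the-envelope arithmetic. The Python sketch below is only that: because Reka does not publish Edge's layer count, head counts, or head dimension, every configuration value in it is a hypothetical placeholder of the kind seen in 7B-class decoders, not Edge's actual setting.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch=1):
    """Size in bytes of the key-value cache for one decoding session.

    Keys and values (2 tensors) are cached per layer, each of shape
    [batch, n_kv_heads, seq_len, head_dim].
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value * batch


# Hypothetical 7B-class configuration (NOT Reka Edge's published values).
N_LAYERS = 32
N_QUERY_HEADS = 32
N_KV_HEADS_GQA = 8   # assumed: 4 query heads share each KV head
HEAD_DIM = 128
SEQ_LEN = 8192       # an 8K-token context, 16-bit cache values

# Multi-head attention keeps one KV head per query head.
mha = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Grouped query attention keeps only the shared KV heads.
gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS_GQA, HEAD_DIM, SEQ_LEN)

print(f"MHA cache: {mha / 2**30:.2f} GiB")
print(f"GQA cache: {gqa / 2**30:.2f} GiB")
print(f"Reduction: {mha / gqa:.0f}x")
```

Under these assumed values the cache shrinks in direct proportion to the ratio of query heads to KV heads. The real saving for Edge depends on its unpublished configuration.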
The model uses a modular encoder-decoder structure. Image, video, and audio inputs are processed by dedicated encoders whose outputs are projected into the language model's input space. The decoder is the language backbone described above. The same general approach is used across Core, Flash, and Edge, with differences mostly in encoder capacity and training mix.
Reka Edge ships with an 8K-token context window in its standard form, with a long-context variant trained out to 64K tokens. The 8K default is roughly in line with most 7B contemporaries from the same period.
The technical report states that Edge was trained on approximately 4.5 trillion language tokens after extensive deduplication and filtering. Training was done predominantly on Nvidia H100 GPUs using PyTorch, across several hundred chips over several weeks. The knowledge cutoff for the released checkpoint is November 2023. After pretraining the model went through instruction tuning followed by reinforcement learning from human feedback using proximal policy optimization (PPO).
Reka reported a fairly broad sweep of text and multimodal benchmarks for Edge in the original technical report. The text-only numbers are the cleanest comparison points because most peer 7B models at the time published the same evaluations.
| Benchmark | Reka Edge | Mistral 7B | Gemma 7B |
|---|---|---|---|
| MMLU | 65.7 | 62.5 | 64.3 |
| GSM8K | 66.2 | 35.4 | 46.4 |
| HumanEval | 54.3 | 26.2 | 32.3 |
| XQuAD | 54.2 | 29.7 | 21.7 |
| TydiQA | 61.5 | 31.7 | 35.8 |
| Belebele | 37.1 | 32.8 | 26.8 |
Reka reported that Edge beats both Mistral 7B and Llama 2 7B on all eight text benchmarks evaluated in the paper, six of which appear in the table above, and beats Gemma 7B on every benchmark except MATH, where Gemma scored 24.3 against Edge's 23.2. The largest reported margins are on coding (HumanEval) and grade-school math word problems (GSM8K), where Edge's scores are roughly double those of Mistral 7B in the same evaluation setup.
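The size of those margins can be checked directly from the benchmark table above. The following Python sketch copies the vendor-reported scores from that table and computes the absolute gap and the ratio for each benchmark. The helper name `margins` is introduced here for illustration only.

```python
# Scores copied from the benchmark table above (as reported by Reka).
scores = {
    "MMLU":      {"Reka Edge": 65.7, "Mistral 7B": 62.5, "Gemma 7B": 64.3},
    "GSM8K":     {"Reka Edge": 66.2, "Mistral 7B": 35.4, "Gemma 7B": 46.4},
    "HumanEval": {"Reka Edge": 54.3, "Mistral 7B": 26.2, "Gemma 7B": 32.3},
    "XQuAD":     {"Reka Edge": 54.2, "Mistral 7B": 29.7, "Gemma 7B": 21.7},
    "TydiQA":    {"Reka Edge": 61.5, "Mistral 7B": 31.7, "Gemma 7B": 35.8},
    "Belebele":  {"Reka Edge": 37.1, "Mistral 7B": 32.8, "Gemma 7B": 26.8},
}


def margins(baseline):
    """Return {benchmark: (point gap, ratio)} of Reka Edge over `baseline`."""
    out = {}
    for name, row in scores.items():
        edge, other = row["Reka Edge"], row[baseline]
        out[name] = (round(edge - other, 1), round(edge / other, 2))
    return out


for baseline in ("Mistral 7B", "Gemma 7B"):
    print(baseline)
    for name, (gap, ratio) in margins(baseline).items():
        print(f"  {name:<10} +{gap:.1f} points ({ratio:.2f}x)")
```

On these figures the GSM8K and HumanEval ratios against Mistral 7B come out at about 1.87 and 2.07, which is what "roughly double" refers to.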
The report presents multimodal results partly as benchmark scores and partly as head-to-head chat comparisons. In the multimodal chat evaluation, Edge scored an Elo of 986 with a 50.5% win rate, outperforming IDEFICS 80B and Adept Fuyu 8B by a large margin. The Reka Flash announcement post separately reported that Edge "outperforms LLaVA 1.6 7B" in multimodal chat and "approaches the performance of Gemini Pro," though those two claims rest on Reka's own head-to-head methodology rather than a public benchmark.
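For readers unfamiliar with Elo ratings, the sketch below shows how a rating gap translates into an expected head-to-head win probability under the standard Elo model. It is an interpretive aid only: it assumes the conventional logistic formula with a 400-point scale, which may differ from the exact procedure Reka used, and the opponent rating in the usage example is hypothetical.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo model.

    Uses the conventional base-10 logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Edge's reported rating against a hypothetical opponent rated 100 points lower.
edge_rating = 986
hypothetical_opponent = 886
p = elo_expected_score(edge_rating, hypothetical_opponent)
print(f"Expected win probability: {p:.3f}")
```

Equal ratings give an expected score of 0.5, and each additional 400 points of advantage multiplies the odds of winning by ten.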
On MT-Bench, a multi-turn chat evaluation scored by a strong LLM judge, Edge scored 7.6, which Reka described as competitive with the best models of similar size at the time. In Reka's internal text-only chat evaluations the model ranked ahead of other 7B systems and was described as coming close to Anthropic's Claude Instant 1.2, a notably stronger reference point than the open-weight peers it shipped against.
Reka's framing of Edge centers on three deployment scenarios: latency-sensitive applications, on-premises deployments where data cannot leave a customer environment, and edge or device-side inference where the model needs to run with constrained hardware. The technical report explicitly describes Edge as "designed for local deployments and latency sensitive applications," which is consistent with how Reka markets the rest of its enterprise offering.
For on-device usage the 7B size is meaningful in two ways. First, at FP16 the raw weights fit in roughly 14GB of memory, which puts them within reach of a single workstation GPU and, with 4-bit quantization, within reach of many consumer GPUs and recent Apple Silicon laptops. Second, the use of grouped query attention reduces KV cache pressure during long-context generation, which matters more on memory-constrained devices than on a server.
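The weight-memory figures follow from simple arithmetic: parameter count times bits per parameter. A minimal Python sketch, assuming a nominal 7 billion parameters (the exact count is not published) and counting raw weights only:

```python
def weight_memory_gb(n_params, bits_per_param):
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Excludes activations, the KV cache, and any per-group scale or
    zero-point metadata that real quantization formats add.
    """
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7e9  # nominal 7B; an assumption, not a published exact count

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Real deployments need headroom beyond these numbers for the KV cache, activations, and runtime overhead, so the values are lower bounds rather than device requirements.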
The model is also explicitly multimodal at this size, which is the more distinctive part of the positioning. Most 7B-class models released in 2024 were text-only or had vision added as an afterthought through a separately tuned variant. Edge carries image, video, and audio understanding in its base form, which makes it usable for tasks like local document analysis, video summarization on a workstation, or voice-driven interfaces on hardware that cannot reach back to a server.
Access at launch was through the chat.reka.ai playground and the platform.reka.ai API. Reka has also described on-premises and virtual private cloud deployment options as part of its general enterprise offering, with Edge being the smallest model available through those channels.
In 2026 Reka reused the Reka Edge name for a redesigned 7-billion-parameter vision-language model built specifically for "physical AI" and on-device deployment. The weights were published on Hugging Face under the identifier reka-edge-2603 in March 2026, and Reka published a formal announcement, titled "Reka Edge: Frontier-Level Edge Intelligence for Physical AI," on May 29, 2026, with the accompanying write-up dated March 11, 2026.[6][7] Unlike the 2024 model, the 2026 generation is documented as accepting image and video inputs alongside text, with no mention of audio in its model card.[7]
The 2026 model pairs a 657M-parameter ConvNeXt V2 vision encoder with a 6.4-billion-parameter transformer language backbone that Reka states was trained from scratch for reasoning and generation, for roughly 7B parameters in total.[6] On Hugging Face the model uses a custom architecture class, Yasa2ForConditionalGeneration, and is distributed in BF16.[7] Reka's design emphasis is token efficiency: the model is built to use only 64 tokens per image tile, which the company says lets it represent a 1024x1024 image in roughly one-third as many tokens as comparable models.[6] Reka reports throughput of about 5.46 images per second on its reference hardware, which it presents as roughly twice that of Cosmos-Reason2 8B and Qwen 3.5 9B.[6] It also reports a time to first token of about 0.522 seconds.[6][7]
Reka also describes aggressive quantization. The company states that 4-bit quantization reduces the model's memory footprint from about 13GB to about 5GB while retaining over 98% of its original performance and raising throughput by up to roughly 2.3 times, and that a 3.5-bit option is available through its "Reka Quant" method.[6] These figures are Reka's own.
The 2026 Reka Edge is documented as supporting image understanding, video analysis, object detection with bounding boxes, spatial reasoning, and tool use, and as able to run fully offline.[6][7] Reka's model card publishes the following benchmark figures, comparing Reka Edge against two similarly sized models and against the larger Gemini 3 Pro evaluated through an API; all of these numbers are vendor-reported:[7]
| Benchmark | Reka Edge | Cosmos-Reason2 8B | Qwen 3.5 9B | Gemini 3 Pro |
|---|---|---|---|---|
| VQA-v2 | 88.40 | 79.82 | 83.22 | 89.78 |
| MLVU (video) | 74.30 | 37.85 | 52.39 | 80.68 |
| MMVU (multimodal video) | 71.68 | 51.52 | 68.64 | 78.88 |
| RefCOCO-A (detection) | 93.13 | 90.98 | 93.62 | 81.46 |
| RefCOCO-B (detection) | 86.70 | 85.74 | 88.83 | 82.85 |
| Mobile Actions (tool use) | 88.40 | 77.94 | 91.78 | 89.39 |
Reka positions the 2026 Edge for use cases such as robotics, drones, autonomous vehicles and automotive systems, cameras, augmented reality and wearables, and public safety, and lists target platforms including Nvidia Jetson, Apple Silicon, Linux, Windows, and Android on Snapdragon hardware.[6] Access is offered through the Reka playground, an API, the Hugging Face weights, and vLLM. The model is released under the Business Source License (BSL) 1.1. Reka pairs the license with a free commercial-use grant for organizations with under 1 million US dollars in annual revenue; organizations above that threshold need a separate commercial license.[6][7]
Reka Edge's most direct peers at release were the open-weight 7B-class language models popularized through 2023 and into 2024. Over the rest of 2024, that comparison set widened to include compact models from Microsoft and Alibaba.
| Model | Parameters | Multimodal inputs | MMLU |
|---|---|---|---|
| Reka Edge | 7B | Text, image, video, audio | 65.7 |
| Mistral 7B | 7B | Text only | 62.5 |
| Gemma 7B | 7B | Text only | 64.3 |
| Llama 2 7B | 7B | Text only | ~46 |
The most notable structural difference here is the input modality. Edge is the only entry on this list that ingests video or audio natively at the 7B scale.
By the time later compact models reached the market the framing had shifted slightly. Microsoft's Phi-3 mini, released in April 2024 close to Edge, is a 3.8B-parameter model that competes on text benchmarks at lower compute. Reported MMLU for Phi-3 mini is 68.8, modestly ahead of Edge's 65.7, but Phi-3 mini is text-only in its base form and substantially smaller. Alibaba's Qwen 2.5 7B, released in September 2024, posted higher scores than Edge on most public text benchmarks but does not include audio or video as base modalities, and arrived roughly ten months after Edge's November 2023 knowledge cutoff.
The practical takeaway is that Reka Edge's pitch was never purely about scoring the highest MMLU at 7B. The pitch was that a model at this size could deliver competitive text performance and run multimodal inputs natively, in environments where larger models would not fit.
Reka Edge received modest but generally positive coverage on release, with most attention going to the larger Reka Core model that anchored the same announcement. Coverage in trade publications noted that a small startup had managed to publish a 7B model that outperformed Mistral 7B and Llama 2 7B on the standard set of evaluations, and that the model carried multimodal inputs natively rather than as a bolt-on. The Reka Flash announcement post highlighted Edge's MT-Bench score and its strong showing against larger multimodal baselines in the chat evaluation.
In the broader 7B model landscape Edge has remained a comparatively niche option. Open-weight checkpoints and developer mindshare have tended to cluster around Mistral, Meta's Llama line, Google's Gemma, and Alibaba's Qwen, all of which ship as open weights, in some cases under permissive licenses, and have large external developer communities. Reka has continued to operate primarily as a closed-API and enterprise-deployment company, which limits how broadly Edge gets used in third-party fine-tunes or research projects.
Reka has continued to update the family, releasing the redesigned 2026 vision-language model described above through Hugging Face under a source-available license. The architectural choices in the original release (a dense 7B backbone, modular multimodal encoders, GQA, RoPE, and RMSNorm) have remained typical of small multimodal models released since.