Reka Edge
Last reviewed: May 16, 2026
Sources: 5 citations
Review status: Source-backed
Revision: v1 · 1,931 words
Reka Edge is a 7-billion-parameter multimodal language model developed by Reka AI, introduced in April 2024 as the smallest member of the company's first publicly described model family. The model was released alongside two larger siblings, Reka Core and Reka Flash, and was positioned as the option intended for local deployment and latency-sensitive applications. Unlike most language models in its size class at the time, Reka Edge accepts image, video, and audio inputs in addition to text, although it produces text-only outputs.
The model was first detailed in the technical report "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models," submitted to arXiv on April 18, 2024. Reka described Edge as a dense transformer built on the same backbone architecture used across the rest of the family, scaled down and trained on roughly 4.5 trillion tokens. On standard text benchmarks the model outperformed Meta's Llama 2 7B and Mistral AI's Mistral 7B across every evaluation reported in the original paper, and beat Google's Gemma 7B on most though not all of them.
Reka AI was founded in 2022 by researchers previously at Google DeepMind, Meta FAIR, and Google, and emerged from stealth in mid-2023 with a focus on building multimodal foundation models that could be deployed on-premises or in customer-controlled environments. The company's first publicly released product was Yasa-1, a multimodal assistant introduced in October 2023. The Core, Flash, and Edge family followed in April 2024 and represented the first time Reka described an internal model lineup spanning multiple size tiers.
The three models share the same training recipe and backbone, and differ primarily in parameter count and the volume of training compute applied. Reka Core is the flagship, sized at "frontier-class" scale and competitive with closed models from larger labs. Reka Flash is a 21-billion-parameter mid-tier model aimed at most general workloads. Reka Edge sits at the bottom of the lineup as a compact 7B option designed for deployments where latency or hardware footprint matters more than absolute capability. The technical report frames the family as a deliberate compute-class strategy rather than a single flagship product.
The 7B parameter class became a focal point of open-weight model releases in 2023 and 2024 because models of that size can run on a single consumer or workstation GPU, fit on a high-end laptop with quantization, and in some cases run on phones or edge devices. Reka explicitly framed Edge around that deployment envelope. Most 7B models at the time were text-only; Reka Edge was unusual in carrying full multimodal input support down to that size.
Reka Edge is a 7-billion-parameter dense transformer. Reka does not publish the exact layer count, hidden width, or head configuration in the public report, but it describes the backbone in terms of the design choices it shares with other modern decoder transformers.
The backbone is referred to internally as a "Noam" architecture, after a common reference design that combines several now-standard components:
| Component | Choice |
|---|---|
| Activation | SwiGLU |
| Attention | Grouped Query Attention (GQA) |
| Position encoding | Rotary Position Embeddings (RoPE) |
| Normalization | RMSNorm |
| Tokenizer | SentencePiece, 100K vocabulary, based on tiktoken |
This is broadly similar to PaLM but without the parallel attention and feed-forward layers used in some Google designs. The use of grouped query attention reduces the memory cost of the key-value cache, which matters more in small models intended for local inference where memory is often the binding constraint.
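The effect of grouped query attention on cache size can be illustrated with a little arithmetic. The sketch below is a rough estimate only: because Reka does not publish Edge's layer count, head count, or head dimension, the shape used here (32 layers, 32 query heads, head dimension 128, 8 key-value heads) is a hypothetical 7B-class configuration chosen for illustration, and `kv_cache_bytes` is a helper name invented for this example.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch=1):
    """Bytes needed to cache keys and values for `batch` sequences."""
    # The leading factor of 2 covers the separate key and value tensors per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value * batch


# Hypothetical 7B-class shape; NOT Reka Edge's real (unpublished) configuration.
N_LAYERS, N_QUERY_HEADS, HEAD_DIM = 32, 32, 128
N_KV_HEADS_GQA = 8   # assumed number of shared key-value heads under GQA
SEQ_LEN = 8192       # an 8K-token context, stored at 16-bit precision

# Standard multi-head attention keeps one KV head per query head.
mha = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# GQA lets several query heads share each KV head.
gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS_GQA, HEAD_DIM, SEQ_LEN)

print(f"MHA cache: {mha / 2**30:.2f} GiB")
print(f"GQA cache: {gqa / 2**30:.2f} GiB")
print(f"Reduction: {mha / gqa:.0f}x")
```

Under these assumed numbers the cache shrinks in proportion to the ratio of query heads to key-value heads. The real saving for Edge depends on configuration details the report does not disclose.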
The model uses a modular encoder-decoder structure. Image, video, and audio inputs are processed by dedicated encoders whose outputs are projected into the language model's input space. The decoder is the language backbone described above. The same general approach is used across Core, Flash, and Edge, with differences mostly in encoder capacity and training mix.
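The idea of projecting encoder outputs into the language model's input space can be sketched with a toy example. Everything below is an illustrative assumption rather than Reka's implementation: the dimensions are tiny, the "encoder features" and projection weights are random stand-ins for learned values, and placing the image tokens before the text tokens is an arbitrary choice made for the example.

```python
import random


def linear_project(vectors, weight):
    """Map each vector of length d_in to length d_out using weight[d_in][d_out]."""
    return [
        [sum(x * w for x, w in zip(vec, column)) for column in zip(*weight)]
        for vec in vectors
    ]


random.seed(0)
D_ENC, D_MODEL = 4, 6  # toy widths, not Reka's

# Stand-ins for 3 image-encoder output vectors and 5 text token embeddings.
image_features = [[random.gauss(0, 1) for _ in range(D_ENC)] for _ in range(3)]
text_embeddings = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(5)]

# Projection of shape (D_ENC, D_MODEL); learned in a real model, random here.
projection = [[random.gauss(0, 0.1) for _ in range(D_MODEL)] for _ in range(D_ENC)]

# After projection, image features have the same width as text embeddings.
image_tokens = linear_project(image_features, projection)

# The decoder can then consume one sequence of same-width vectors.
decoder_input = image_tokens + text_embeddings
print(len(decoder_input), len(decoder_input[0]))
```

The point of the sketch is only that a per-modality projection gives every input the same vector width, so a single decoder can attend over all of them.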
Reka Edge ships with an 8K-token context window in its standard form, with a long-context variant trained out to 64K tokens. The 8K default is roughly in line with most 7B contemporaries from the same period.
The technical report states that Edge was trained on approximately 4.5 trillion language tokens after extensive deduplication and filtering. Training was done predominantly on Nvidia H100 GPUs using PyTorch, across several hundred chips over several weeks. The knowledge cutoff for the released checkpoint is November 2023. After pretraining, the model went through instruction tuning followed by reinforcement learning from human feedback using proximal policy optimization (PPO).
Reka reported a fairly broad sweep of text and multimodal benchmarks for Edge in the original technical report. The text-only numbers are the cleanest comparison points because most peer 7B models at the time published the same evaluations.
| Benchmark | Reka Edge | Mistral 7B | Gemma 7B |
|---|---|---|---|
| MMLU | 65.7 | 62.5 | 64.3 |
| GSM8K | 66.2 | 35.4 | 46.4 |
| HumanEval | 54.3 | 26.2 | 32.3 |
| XQuAD | 54.2 | 29.7 | 21.7 |
| TydiQA | 61.5 | 31.7 | 35.8 |
| Belebele | 37.1 | 32.8 | 26.8 |
Reka reported that Edge beats both Mistral 7B and Llama 2 7B on all eight benchmarks evaluated in the paper, six of which appear in the table above, and beats Gemma 7B on every benchmark except MATH, where Gemma scored 24.3 against Edge's 23.2. The largest reported margins are on coding (HumanEval) and grade-school math word problems (GSM8K), where Edge's scores roughly double those of Mistral 7B in the same evaluation setup.
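The "roughly double" characterization can be checked directly against the table. The short script below uses only the scores listed in this article; `edge_ratios` is a helper name introduced for the example.

```python
# Scores copied from the benchmark table: (Reka Edge, Mistral 7B, Gemma 7B).
scores = {
    "MMLU": (65.7, 62.5, 64.3),
    "GSM8K": (66.2, 35.4, 46.4),
    "HumanEval": (54.3, 26.2, 32.3),
    "XQuAD": (54.2, 29.7, 21.7),
    "TydiQA": (61.5, 31.7, 35.8),
    "Belebele": (37.1, 32.8, 26.8),
}


def edge_ratios(table):
    """Return {benchmark: (edge / mistral, edge / gemma)}, rounded to 2 decimals."""
    return {
        name: (round(edge / mistral, 2), round(edge / gemma, 2))
        for name, (edge, mistral, gemma) in table.items()
    }


ratios = edge_ratios(scores)
for name, (vs_mistral, vs_gemma) in ratios.items():
    print(f"{name:<10} {vs_mistral:.2f}x vs Mistral 7B, {vs_gemma:.2f}x vs Gemma 7B")
```

On these figures Edge's GSM8K score is about 1.87 times Mistral 7B's and its HumanEval score about 2.07 times, while the MMLU gap is only about 5%.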
The report presents multimodal evaluation partly as benchmark scores and partly as head-to-head chat comparisons. In the multimodal chat evaluation Edge scored an Elo of 986 with a 50.5% win rate, outperforming IDEFICS 80B and Adept Fuyu 8B by a large margin. The Reka Flash announcement post separately reported that Edge "outperforms LLaVA 1.6 7B" in multimodal chat and "approaches the performance of Gemini Pro," though these two claims rest on Reka's own head-to-head methodology rather than a public benchmark.
On MT-Bench, a multi-turn chat evaluation scored by a strong LLM judge, Edge scored 7.6, which Reka described as competitive with the best models of similar size at the time. In Reka's internal text-only chat evaluations the model ranked ahead of other 7B systems and was described as coming close to Anthropic's Claude Instant 1.2, a notably stronger reference point than the open-weight peers it shipped against.
Reka's framing of Edge centers on three deployment scenarios: latency-sensitive applications, on-premises deployments where data cannot leave a customer environment, and edge or device-side inference where the model needs to run with constrained hardware. The technical report explicitly describes Edge as "designed for local deployments and latency sensitive applications," which is consistent with how Reka markets the rest of its enterprise offering.
For on-device usage the 7B size is meaningful in two ways. First, at FP16 the raw weights fit in roughly 14GB of memory, which puts them within reach of a single workstation GPU and, with 4-bit quantization, within reach of many consumer GPUs and recent Apple Silicon laptops. Second, the use of grouped query attention reduces KV cache pressure during long-context generation, which matters more on memory-constrained devices than on a server.
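The memory figures follow from simple arithmetic on parameter count and bits per weight. The sketch below assumes a nominal 7 billion parameters and decimal gigabytes (1 GB = 10^9 bytes). It counts raw weights only, so it leaves out the KV cache, activations, and any per-block scale metadata that real quantization formats add. `weight_memory_gb` is a helper name introduced for the example.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7e9  # nominal 7B; the exact parameter count is an assumption

for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

This reproduces the roughly 14 GB figure at FP16 and gives about 3.5 GB at 4 bits, which is why quantized 7B models fit on many consumer GPUs and laptops.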
The model is also explicitly multimodal at this size, which is the more distinctive part of the positioning. Most 7B-class models released in 2024 were text-only or had vision added as an afterthought through a separately tuned variant. Edge carries image, video, and audio understanding in its base form, which makes it usable for tasks like local document analysis, video summarization on a workstation, or voice-driven interfaces on hardware that cannot reach back to a server.
Access at launch was through the chat.reka.ai playground and the platform.reka.ai API. Reka has also described on-premises and virtual private cloud deployment options as part of its general enterprise offering, with Edge being the smallest model available through those channels.
Reka Edge's most direct peers at release were the open-weight 7B-class language models popularized through 2023 and into 2024. Over the rest of 2024, that comparison set widened to include compact models from Microsoft and Alibaba.
| Model | Parameters | Multimodal inputs | MMLU |
|---|---|---|---|
| Reka Edge | 7B | Text, image, video, audio | 65.7 |
| Mistral 7B | 7B | Text only | 62.5 |
| Gemma 7B | 7B | Text only | 64.3 |
| Llama 2 7B | 7B | Text only | ~46 |
The most notable structural difference here is the input modality. Edge is the only entry on this list that ingests video or audio natively at the 7B scale.
By the time later compact models reached the market the framing shifted slightly. Microsoft's Phi-3 mini, released in April 2024 at around the same time as Edge, is a 3.8B-parameter model that competes on text benchmarks at lower compute. Reported MMLU for Phi-3 mini is 68.8, modestly ahead of Edge's 65.7, but Phi-3 mini is text-only in its base form and substantially smaller. Alibaba's Qwen 2.5 7B, released in September 2024, posted higher scores than Edge on most public text benchmarks but does not include audio or video as base modalities, and arrived roughly ten months after Edge's November 2023 training cutoff.
The practical takeaway is that Reka Edge's pitch was never purely about scoring the highest MMLU at 7B. The pitch was that a model at this size could deliver competitive text performance and run multimodal inputs natively, in environments where larger models would not fit.
Reka Edge received modest but generally positive coverage on release, with most attention going to the larger Reka Core model that anchored the same announcement. Coverage in trade publications noted that a small startup had managed to publish a 7B model that outperformed Mistral 7B and Llama 2 7B on the standard set of evaluations, and that the model carried multimodal inputs natively rather than as a bolt-on. The Reka Flash announcement post highlighted Edge's MT-Bench score and its strong showing against larger multimodal baselines in the chat evaluation.
In the broader 7B model landscape Edge has remained a comparatively niche option. Open-weight checkpoints and developer mindshare have tended to cluster around Mistral, Meta's Llama line, Google's Gemma, and Alibaba's Qwen, all of which ship as open weights, often under permissive licenses, and have large external developer communities. Reka has continued to operate primarily as a closed-API and enterprise-deployment company, which limits how broadly Edge gets used in third-party fine-tunes or research projects.
Reka has continued to update the family, with newer checkpoints of Edge made available through Hugging Face. The architectural choices in the original release (a dense 7B backbone, modular multimodal encoders, GQA, RoPE, and RMSNorm) have remained typical of small multimodal models released since.