Reka Core
Last reviewed
May 16, 2026
Sources
13 citations
Review status
Source-backed
Revision
v1 · 2,960 words
Reka Core is a frontier class multimodal foundation model developed by Reka AI, a research and product company founded in 2022 by former scientists from DeepMind, Google Brain, Meta FAIR, and Baidu. The model was announced on April 15, 2024 and is the largest of three models released together as the Reka Core, Flash, and Edge family. Core processes text, images, video, and audio as native inputs, returns text outputs, and was positioned at launch as a competitor to GPT-4, Claude 3 Opus, and Gemini Ultra on a range of vision and language tasks.
Reka Core is delivered as a closed weights API. It is not available as a downloadable checkpoint, and its parameter count has not been published. The supporting technical report, titled Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models, was posted to arXiv on April 18, 2024 (arXiv:2404.12387) and lists Core, the 21 billion parameter Reka Flash, and the 7 billion parameter Reka Edge as a single product family trained on shared infrastructure. The release attracted attention in part because the company stated it had built the model with roughly twenty technical staff and a fraction of the compute consumed by the largest U.S. labs, a contrast Reka leadership repeated in interviews after launch.
Reka was founded in mid 2022 by Dani Yogatama, Yi Tay, Qi Liu, and Cyprien de Masson d'Autume. Yogatama, the chief executive, was previously a senior staff research scientist at DeepMind, where he worked on language models and continual learning. Yi Tay joined as chief scientist after a stint at Google Brain, where he was one of the modeling and architecture co leads on PaLM 2 and a contributor to UL2 and the T5 line of research. Cyprien de Masson d'Autume, the chief technology officer, came from DeepMind and Google, with prior work on language modeling and AlphaCode. Qi Liu came from a research role at Baidu and previously held an academic position at the University of Hong Kong. All four are AI researchers by background, and the company's emphasis on multimodal training, evaluation, and scaling laws reflects that history.
The company emerged from stealth in June 2023 with a $58 million round led by DST Global Partners and Radical Ventures, with participation from Snowflake Ventures and angel investors including former GitHub chief executive Nat Friedman. At the time, the company was building a hosted model stack for enterprise customers and offering custom training for specific domains, rather than competing for the consumer chat market. Reka shipped a research preview of Reka Flash in February 2024, followed by Reka Edge shortly after, and then Core in April 2024 as the flagship of the family.
The context for the Core launch matters. April 2024 was a crowded month for frontier model releases. Anthropic's Claude 3 Opus had landed in early March, OpenAI's GPT-4 family was still the reference point for most benchmark comparisons, and Google's Gemini Ultra had launched in February. Reka positioned Core as a fourth credible option in that tier, with an emphasis on three things: native multimodal handling across text, image, video, and audio; competitive scores on standard reasoning and coding benchmarks; and an API plus enterprise deployment story rather than a free consumer product.
The Reka technical report describes a single architecture family used for Core, Flash, and Edge. The text backbone is a transformer decoder using SwiGLU activations, Grouped Query Attention, Rotary Positional Embeddings (RoPE), and RMSNorm, which is the same combination popularized by LLaMA 2 and adopted by many subsequent open and closed models. On top of the text decoder, Reka adds modality specific encoders for images, video, and audio, with cross attention pathways that route encoded features into the language model. The report calls this a modular encoder decoder configuration.
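As a rough illustration of two of the components named above, the following Python sketch implements RMSNorm and a SwiGLU feed forward block from their commonly cited definitions. It is not taken from the Reka report: the function names, the plain-list matrix representation, and the `eps` default are assumptions made for this example, and the attention side (Grouped Query Attention, RoPE) is left out.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """Divide x by its root mean square, then apply a per-element gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(w, x):
    """Multiply a matrix, stored as a list of rows, by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

if __name__ == "__main__":
    # Toy 2-dimensional example with identity weights (illustrative values only).
    identity = [[1.0, 0.0], [0.0, 1.0]]
    normed = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
    print(normed)
    print(swiglu_ffn(normed, identity, identity, identity))
```

After `rms_norm` with unit gain, the mean of the squared outputs is close to 1, which is the property the normalization is meant to enforce before each sublayer.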
Reka has not published Core's parameter count. The sizes of the two smaller siblings are public: Reka Flash is a 21 billion parameter dense model, and Reka Edge is a 7 billion parameter dense model. Both are trained on internal data mixes and use the same modality stack. The report notes that Reka Core had not finished training at the time of writing, so the reported Core numbers represent a snapshot rather than a final checkpoint. Subsequent updates to the hosted Core endpoint have not been accompanied by new public benchmark cards.
Training was carried out on a heterogeneous cluster. The report describes peak compute of roughly 2,500 Nvidia H100 GPUs and 2,500 Nvidia A100 GPUs, with the majority of large scale runs landing on the H100s. Reka Flash was trained on approximately 5 trillion tokens, and Reka Edge on approximately 4.5 trillion. Pretraining covered 32 languages that were explicitly upweighted, with broader exposure to roughly 110 languages from sources such as multilingual Wikipedia. Post training combined supervised fine tuning on instruction data, preference modeling, and a reinforcement learning from human feedback stage on a proprietary dataset.
Reka Core was the company's flagship demonstration that one model could accept text, image, video, and audio inputs through the same API and respond with grounded text. The intent was not to introduce four separate endpoints but to let a developer hand the model a PDF, a chart, a clip of a meeting, or a podcast and get answers that reference all of those inputs in one conversation.
| Modality | Inputs supported | Typical tasks at launch |
|---|---|---|
| Text | Up to 128K tokens of context | Question answering, summarization, code generation, multilingual chat |
| Image | Single images, multiple images per prompt | Visual question answering, document understanding, OCR, chart reading |
| Video | Short clips with synchronized audio | Action recognition, narration, perception style reasoning |
| Audio | Speech, ambient sound, music | Transcription, audio captioning, audio question answering |
Reka described Core at launch as one of two commercially available models that handled all four input modalities through a single endpoint, with Gemini being the other in the same period. GPT-4 supported image and audio inputs through separate channels at the time, and most other API providers were still text plus image only. The 128K token context window applied to text and to multimodal prompts, with audio and video counted against the token budget after encoding. Outputs were limited to text, with no image or audio generation.
Multilingual coverage was a deliberate focus. The model is described as fluent in English plus several Asian and European languages, and the company highlighted Indonesian, Korean, Japanese, and a number of European languages in its public material. The pretraining mix's emphasis on 32 upweighted languages, combined with two of the four founders having East and Southeast Asian research backgrounds, contributed to comparatively strong non English benchmark performance at the model's parameter scale.
The Reka Core v0.5 numbers from the company's technical report are summarized below, alongside the matched scores for GPT-4, Claude 3 Opus, and Gemini Pro 1.5 from the report's comparison tables where available. Some comparison scores, such as the GPT-4 HumanEval figure, were measured by Reka under its own evaluation protocol.
| Benchmark | Reka Core | GPT-4 | Claude 3 Opus | Gemini Pro 1.5 | Reka Flash | Reka Edge |
|---|---|---|---|---|---|---|
| MMLU (knowledge) | 83.2 | 86.4 | 86.8 | 81.9 | 75.9 | 65.7 |
| GSM8K (math word problems) | 92.2 | 92.0 | 95.0 | 91.7 | 85.8 | 66.2 |
| HumanEval (Python code) | 76.8 | 67.0 | 84.9 | 71.9 | 72.0 | 54.3 |
| GPQA (graduate science) | 38.2 | 35.7 | 50.4 | 41.5 | not reported | not reported |
| MMMU (multimodal college level) | 56.3 | 56.8 | 59.4 | 58.5 | 53.3 | not reported |
| VQAv2 (visual question answering) | 78.1 | 77.2 | 50.5 | 73.2 | 78.4 | not reported |
| Perception Test (video) | 59.3 | not reported | not reported | 51.1 | 56.4 | not reported |
A few observations from the table. On MMLU, the standard knowledge probe, Core lands a few points below GPT-4 and Claude 3 Opus and above Gemini Pro 1.5 in the report's measurements. On GSM8K, the gap to GPT-4 is essentially zero, with Claude 3 Opus reported as the leader. On HumanEval, Core's 76.8 sits above Reka's own measurement of GPT-4 at 67.0 but below Claude 3 Opus at 84.9, which suggests the model was specifically tuned on code data without reaching the Anthropic ceiling at the time.
On the multimodal side, the more interesting numbers are MMMU and the Perception Test. MMMU is a college level question set covering visual reasoning across disciplines. Core's 56.3 sits within a couple of points of GPT-4V and Gemini Pro 1.5 in the report's measurements, while Claude 3 Opus is reported slightly ahead. On the Perception Test, a video question answering benchmark from DeepMind, Core's 59.3 outperforms the report's measurement of Gemini Pro 1.5 at 51.1, which Reka highlighted in its launch material.
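The gaps discussed in these observations can be recomputed directly from the comparison table. The short Python sketch below does that for a subset of rows; the dictionary layout and the `gaps_vs_core` helper are assumptions made for illustration, and the scores are copied from the table above, with `None` standing in for "not reported".

```python
# Scores copied from the comparison table; None marks "not reported".
SCORES = {
    "MMLU": {"Reka Core": 83.2, "GPT-4": 86.4,
             "Claude 3 Opus": 86.8, "Gemini Pro 1.5": 81.9},
    "GSM8K": {"Reka Core": 92.2, "GPT-4": 92.0,
              "Claude 3 Opus": 95.0, "Gemini Pro 1.5": 91.7},
    "HumanEval": {"Reka Core": 76.8, "GPT-4": 67.0,
                  "Claude 3 Opus": 84.9, "Gemini Pro 1.5": 71.9},
    "MMMU": {"Reka Core": 56.3, "GPT-4": 56.8,
             "Claude 3 Opus": 59.4, "Gemini Pro 1.5": 58.5},
    "Perception Test": {"Reka Core": 59.3, "GPT-4": None,
                        "Claude 3 Opus": None, "Gemini Pro 1.5": 51.1},
}

def gaps_vs_core(benchmark):
    """Return {model: Core score minus model score}, skipping unreported scores."""
    row = SCORES[benchmark]
    core = row["Reka Core"]
    return {
        model: round(core - score, 1)
        for model, score in row.items()
        if model != "Reka Core" and score is not None
    }

if __name__ == "__main__":
    for name in SCORES:
        print(name, gaps_vs_core(name))
```

A positive value means Core scored higher than the comparison model on that benchmark. For the Perception Test only the Gemini Pro 1.5 gap is returned, because the table reports no score for the other two models.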
The company also commissioned an independent third party blind human evaluation comparing multimodal chat outputs against several frontier models. Core finished as the second most preferred model in that evaluation, ahead of Claude 3 Opus and behind only the strongest reference model in the set. The exact preference percentages and the identity of the leader were not published in the report.
Reka Core launched as an API only product. The primary access channels are Reka's hosted API, Reka's own chat interface at chat.reka.ai, the Yasa platform that Reka offers for enterprise deployment, and partner clouds. Snowflake Cortex and Oracle Cloud Infrastructure offered Core through their own surfaces from launch, and AI Singapore was named as an early sovereign partner.
Launch pricing for the Core API was 10 dollars per million input tokens and 25 dollars per million output tokens. That placed Core roughly in line with GPT-4 Turbo's input pricing at the time and below Claude 3 Opus on input, while output was priced at one third of Claude 3 Opus's 75 dollar output rate, about 67 percent lower. The smaller siblings were cheaper: Reka Flash launched at 0.80 dollars per million input tokens and 2.00 dollars per million output tokens, and Reka Edge at 0.40 dollars per million input tokens and 1.00 dollar per million output tokens.
| Tier | Input ($ per 1M tokens) | Output ($ per 1M tokens) |
|---|---|---|
| Reka Core | 10.00 | 25.00 |
| Reka Flash | 0.80 | 2.00 |
| Reka Edge | 0.40 | 1.00 |
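To make the per-token rates concrete, the following Python sketch estimates the cost of a single request at the launch prices in the table. The `request_cost` helper and the example token counts are hypothetical, and the calculation assumes simple linear billing on post-encoding token counts with no minimums, discounts, or separate multimodal surcharges.

```python
# Launch prices in US dollars per one million tokens, taken from the table above.
# Each entry is (input price, output price).
PRICES = {
    "Reka Core": (10.00, 25.00),
    "Reka Flash": (0.80, 2.00),
    "Reka Edge": (0.40, 1.00),
}

def request_cost(tier, input_tokens, output_tokens):
    """Estimate the dollar cost of one request, assuming linear per-token billing."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

if __name__ == "__main__":
    # Hypothetical request: a 100,000 token prompt and a 2,000 token answer.
    for tier in PRICES:
        print(f"{tier}: ${request_cost(tier, 100_000, 2_000):.3f}")
```

Under these assumptions the hypothetical request costs about 1.05 dollars on Core, 0.084 dollars on Flash, and 0.042 dollars on Edge, so Flash comes out at roughly one twelfth of the Core price for the same token counts.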
The API exposes a single multimodal chat endpoint that accepts text, image URLs or uploads, audio uploads, and video uploads. Images and audio are encoded by the modality specific stack before being routed through the language model, which is why prompt sizes are reported in tokens after encoding rather than raw bytes. Function calling, JSON mode, and a research grade tool use mode were added in updates after the launch.
For customers that need data residency or on premises deployment, Reka offers Yasa, a packaged deployment that runs Core and the smaller models on customer controlled infrastructure. Government and regulated industry deployments were positioned as the primary use case for that path, and the Carahsoft partnership announced in 2024 routed Reka into U.S. federal procurement channels.
In the months after the Core launch, Reka was reported to be in acquisition talks with Snowflake at a valuation in the one billion dollar range. Those talks were reported by Bloomberg and others in May 2024 and were eventually called off. Reka and Snowflake confirmed in interviews that both sides concluded the company was better off remaining independent, and the partnership continued through Snowflake Cortex.
Reka raised a follow on round of 110 million dollars in July 2025, led by Nvidia and Snowflake, at a valuation reported around one billion dollars. The round confirmed unicorn status for the company within three years of founding and was framed as growth funding to scale enterprise deployments rather than a pivot to consumer products. As of early 2026, Reka reports a headcount of roughly 60 to 65 people, which is still small by frontier lab standards, though larger than the roughly twenty technical staff credited with building Core.
The company has continued to ship updated checkpoints under the same brand names. Reka Flash 3 and Reka Flash 3.1 were released in 2025, the latter as an open weights checkpoint on Hugging Face under a permissive license. Core itself has remained a closed weights model accessible only through the API, with the company citing its frontier capabilities and the cost of training as reasons for that choice.
| Model | Vendor | Released | Modalities accepted | Open or closed | Context | Headline benchmark in Reka report |
|---|---|---|---|---|---|---|
| Reka Core | Reka AI | April 2024 | Text, image, video, audio | Closed (API) | 128K | MMLU 83.2, GSM8K 92.2, MMMU 56.3 |
| GPT-4 (gpt-4-1106) | OpenAI | November 2023 | Text, image | Closed (API) | 128K | MMLU 86.4, GSM8K 92.0, MMMU 56.8 |
| Claude 3 Opus | Anthropic | March 2024 | Text, image | Closed (API) | 200K | MMLU 86.8, GSM8K 95.0, MMMU 59.4 |
| Gemini Pro 1.5 | Google | February 2024 | Text, image, video, audio | Closed (API) | 1M | MMLU 81.9, GSM8K 91.7, MMMU 58.5 |
Reka Core's position in the April 2024 frontier tier was distinctive in two respects. First, it accepted the same set of four input modalities as Gemini Pro 1.5, which at the time was unusual at the API level. Second, it came from a team of roughly twenty technical staff working with a compute budget far smaller than the major U.S. labs, which made the head to head benchmark numbers a talking point in the broader question of how concentrated frontier capability really is.
The model lagged Claude 3 Opus on most pure reasoning and coding benchmarks reported in the technical report and trailed GPT-4 on MMLU by a few points. It led the report's measurement of Gemini Pro 1.5 on most language benchmarks and, on the Perception Test for video understanding, outperformed Gemini Pro 1.5, the only comparison model with a reported score. The independent blind human evaluation finding, with Core ranked second in multimodal chat preference, was the cleanest external validation the company had at launch.
Coverage of Reka Core in April 2024 was largely framed around two angles. The first was the team size and compute story. Several outlets, including VentureBeat and TechCrunch, picked up Reka's claim that the company had built Core with about twenty researchers and a relatively small cluster, and contrasted that with the headcounts and GPU footprints of OpenAI, Anthropic, and Google. The framing leaned on the observation that scaling laws still admit competitive entries from focused, well staffed teams without billion dollar training runs.
The second angle was multimodality. Reviewers paid more attention to Core's ability to take audio and video inputs through a single endpoint than to any single benchmark number. The Perception Test result and the third party blind preference evaluation were the most quoted data points. Practitioners noted that the model was strongest on visual question answering and document understanding, and weaker than Claude 3 Opus on long form code generation.
Developer reaction was more measured. The API was easy to use and pricing was competitive, but the ecosystem around Core, including libraries, integrations, and community fine tunes, was smaller than the ones built around OpenAI and Anthropic APIs. Several engineers reported using Core specifically for tasks involving audio and video inputs where alternatives were not yet available, and falling back to other APIs for pure text reasoning. The 128K context window was adequate for most uses but trailed Gemini Pro 1.5's one million token context and Claude 3's 200K window.
Within the AI research community, the technical report was received as a useful contribution on multimodal training recipes from a smaller lab. Citations of the report focus on the encoder configurations, the training data composition, and the multilingual coverage decisions rather than on the headline benchmark numbers. The model itself, being closed weights, has been less studied directly than the open siblings that followed.