SubQ
Last reviewed
Jun 3, 2026
Sources
5 citations
Review status
Source-backed
Revision
v1 · 1,415 words
SubQ is a large language model released on May 5, 2026 by Subquadratic, a Miami-based startup that emerged from stealth claiming to have built the first frontier model on a fully subquadratic attention architecture. The company says SubQ can read up to 12 million tokens in a single pass, a context window far larger than any production model from OpenAI, Anthropic, or Google, while costing a fraction as much to run on long inputs.[1][2][5] Those claims are striking enough that several researchers have asked for independent verification before accepting them, since the company has not yet published a technical report or released the model weights.[3]
The pitch rests on a single idea. Standard transformer attention scales quadratically with sequence length, so doubling the input quadruples the compute. Subquadratic says it redesigned attention to scale linearly instead, which is what makes a 12-million-token window economically plausible rather than merely possible. Whether the architecture actually holds accuracy at that length, and whether it does so without quietly reverting to dense attention under the hood, are the open questions the rest of the field is waiting to test.
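The scaling argument can be made concrete by counting query-key comparisons. The sketch below assumes a dense causal layer compares each token with itself and every earlier token, while a hypothetical sparse layer lets each query attend to at most a fixed budget of positions. The budget `K = 2_048` is an illustrative assumption, not a figure Subquadratic has published, and the count ignores whatever it costs to choose those positions.

```python
def dense_pairs(n: int) -> int:
    """Query-key comparisons in one causal dense attention layer:
    query i attends to positions 0..i, so n*(n+1)/2 in total."""
    return n * (n + 1) // 2


def sparse_pairs(n: int, k: int) -> int:
    """Comparisons when each query attends to at most k positions
    (a hypothetical fixed budget). The first k queries have fewer than
    k positions available, so they behave like dense attention."""
    if n <= k:
        return dense_pairs(n)
    return dense_pairs(k) + (n - k) * k


K = 2_048  # hypothetical per-query budget, chosen only for illustration

for n in (128_000, 1_000_000, 12_000_000):
    d, s = dense_pairs(n), sparse_pairs(n, K)
    print(f"{n:>12,} tokens  dense/sparse = {d / s:,.0f}x")

n = 1_000_000
print("doubling, dense: ", dense_pairs(2 * n) / dense_pairs(n))          # close to 4
print("doubling, sparse:", sparse_pairs(2 * n, K) / sparse_pairs(n, K))  # close to 2
```

The gap between the two counts widens in proportion to the sequence length, which is the whole economic case for a fixed per-query budget.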
Subquadratic is based in Miami and was founded by Justin Dangel and Alex Whedon.[1][3] Dangel, the chief executive, is described as a five-time founder with prior companies in health tech, insurance technology, and consumer products.[2][4] Whedon, the chief technology officer, was a software engineer at Meta and later head of generative AI at TribeAI, where the company says he led more than 40 enterprise AI deployments.[2][4] The brief that circulated at launch described "ex-Meta GenAI leadership," which is roughly accurate but worth stating precisely: Whedon's generative-AI leadership title was at TribeAI, not at Meta, where he worked as an engineer.
The research team is small. Subquadratic says it employs 11 PhD researchers and research engineers with backgrounds at Meta, Google, Oxford, Cambridge, ByteDance, Adobe, and Microsoft.[2][3] For a company making claims about a new class of frontier architecture, that headcount is modest, which is part of why the launch drew both excitement and suspicion.
The core technology is what the company calls Subquadratic Sparse Attention, abbreviated SSA. In a normal transformer, every token attends to every other token, an all-pairs comparison whose cost grows with the square of the sequence length. SSA replaces that with content-dependent token selection: for each query token, the model picks a small subset of earlier positions that actually matter, then computes exact attention only over that subset.[2][3] Because the selected subset stays roughly fixed in size as the input grows, total compute and memory grow linearly with context length rather than quadratically, provided the step that chooses the subset is itself cheaper than comparing every pair of tokens.
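Subquadratic has not published how SSA selects tokens, so the following is only a generic illustration of content-dependent top-k attention in NumPy, not SubQ's algorithm. The function names, shapes, and the `budget` parameter are assumptions for the example. To stay short, it scores every earlier key before choosing the subset, so its selection step is still quadratic; a genuinely subquadratic design also needs a selection step that avoids scoring every pair.

```python
import numpy as np


def topk_sparse_attention(q, k, v, budget):
    """Causal attention in which each query attends only to its `budget`
    highest-scoring earlier positions (itself included).

    q, k: (n, d) arrays; v: (n, d_v). For clarity this scores all earlier
    keys before selecting, so the selection here is still O(n^2)."""
    n, d = q.shape
    out = np.zeros((n, v.shape[1]))
    for i in range(n):
        scores = k[: i + 1] @ q[i] / np.sqrt(d)  # content-based scores for positions 0..i
        m = min(budget, i + 1)
        idx = np.argpartition(scores, -m)[-m:]   # indices of the m largest scores
        sel = scores[idx]
        w = np.exp(sel - sel.max())              # stable softmax over the selected subset only
        w /= w.sum()
        out[i] = w @ v[idx]                      # exact attention restricted to those positions
    return out


def dense_causal_attention(q, k, v):
    """Reference: ordinary causal softmax attention over all earlier positions."""
    n, d = q.shape
    s = q @ k.T / np.sqrt(d)
    s = np.where(np.triu(np.ones((n, n), dtype=bool), 1), -np.inf, s)  # mask future positions
    w = np.exp(s - s.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v


rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

# With a budget covering the whole sequence, the sketch matches dense attention.
print(np.allclose(topk_sparse_attention(q, k, v, budget=n), dense_causal_attention(q, k, v)))  # True
sparse_out = topk_sparse_attention(q, k, v, budget=4)  # each query sees at most 4 positions
```

The comparison against the dense reference is a cheap correctness check: any top-k scheme should collapse to ordinary attention when the budget is large enough to include every position.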
The distinction from earlier sparse-attention work is that SSA chooses which tokens to attend to based on their content, not on a fixed geometric pattern such as a sliding window or a stride. Dangel summarized the goal plainly in an interview: "let's try to figure out how to not compare every token to every token to every token."[1] He also framed the payoff in scaling terms: under quadratic scaling, doubling the input requires four times the compute; under linear scaling, it requires only twice as much.[1]
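The difference between a fixed geometric pattern and content-dependent selection shows up in which positions a query is allowed to see. The toy sketch below uses made-up relevance scores, not anything from SubQ: with the same budget of four positions, a sliding window can only look at the most recent tokens, while a top-k rule can reach a highly relevant token far back in the sequence.

```python
import numpy as np


def sliding_window_positions(i, width):
    """Fixed pattern: query i sees the `width` most recent positions (itself included)."""
    return list(range(max(0, i - width + 1), i + 1))


def content_topk_positions(scores, width):
    """Content-dependent: pick the `width` earlier positions with the highest scores."""
    order = np.argsort(scores)[::-1]  # positions sorted by descending score
    return sorted(order[:width].tolist())


# Hypothetical relevance scores of query 19 against positions 0..19;
# position 2 holds the one fact the query actually needs.
scores = np.full(20, 0.1)
scores[2] = 5.0
scores[17:20] = [0.4, 0.5, 0.6]

print(sliding_window_positions(19, 4))   # [16, 17, 18, 19]
print(content_topk_positions(scores, 4)) # [2, 17, 18, 19]
```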
This places SubQ in a lineage of attempts to escape the quadratic bottleneck, a lineage that includes Mamba, RWKV, DeepSeek Sparse Attention, and Kimi Linear. Reducing the cost of long context is one of the central problems in the field, because the quadratic term is what makes processing a whole codebase or a library of documents expensive. SubQ's specific claim is that it gets the efficiency of those subquadratic approaches without the accuracy penalty that has historically come with them.
SubQ ships in two forms. The production API exposes a 1-million-token context window under the name SubQ 1M-Preview, while a research model extends to the full 12 million tokens that anchored the launch coverage.[2][4] Twelve million tokens is roughly nine million words, or on the order of 120 books, according to Subquadratic's own framing.[1]
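Subquadratic's conversion can be reproduced with two ratios that its figures imply but do not state: about 0.75 English words per token (9 million words over 12 million tokens) and about 75,000 words per book (9 million words over 120 books). Both ratios are assumptions; real tokenizers and real books vary widely.

```python
WORDS_PER_TOKEN = 0.75   # assumed rule of thumb for English text
WORDS_PER_BOOK = 75_000  # assumed length of a typical book


def context_in_books(tokens: int) -> tuple[float, float]:
    """Return (approximate words, approximate books) for a token count."""
    words = tokens * WORDS_PER_TOKEN
    return words, words / WORDS_PER_BOOK


for label, tokens in (("production API", 1_000_000), ("research model", 12_000_000)):
    words, books = context_in_books(tokens)
    print(f"{label}: {tokens:,} tokens = about {words:,.0f} words, about {books:.0f} books")
```

Under the same assumptions, the 1-million-token production window holds roughly a tenth of what the research model does.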
The company published a set of benchmark results to support its efficiency and accuracy claims. The figures below are reported by Subquadratic and have not yet been reproduced by outside labs.
| Benchmark or metric | Reported result | Comparison or context |
|---|---|---|
| RULER at 128K (long-context retrieval) | 95.6% accuracy | Claude Opus 4.6 at 94.8% [2] |
| RULER at 128K, cost | about $8 per run | about $2,600 for Claude Opus, roughly 300x cheaper [1] |
| Attention speed at 1M tokens | 52x faster than FlashAttention | 63% less compute [2] |
| Attention compute at 12M tokens | almost 1,000x less | versus other frontier models [2] |
| Needle-in-a-haystack retrieval at 12M tokens | 92.1% | a length no frontier model currently reaches [3] |
| MRCR v2 (multi-round coreference) | 83 (research model) / 65.9 (production) | beats OpenAI's score by about nine points [2][3] |
| SWE-Bench Verified (coding) | 81.8 | Opus 4.6 at 80.8, DeepSeek 4.0 Pro at 80.0 [2] |
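The headline ratios in the table can be rechecked from the reported figures themselves. The snippet below uses only the numbers Subquadratic published; it does not validate them, it only confirms the arithmetic that connects them.

```python
# Figures as reported by Subquadratic (not independently reproduced).
ruler_cost_subq = 8      # USD per RULER 128K run, approximate
ruler_cost_opus = 2_600  # USD per run for Claude Opus, approximate

cost_ratio = ruler_cost_opus / ruler_cost_subq
print(f"RULER cost ratio: about {cost_ratio:.0f}x")  # 325x, reported as "roughly 300x"

accuracy_gaps = {
    "RULER 128K vs Claude Opus 4.6": 95.6 - 94.8,
    "SWE-Bench Verified vs Opus 4.6": 81.8 - 80.8,
    "SWE-Bench Verified vs DeepSeek 4.0 Pro": 81.8 - 80.0,
}
for name, gap in accuracy_gaps.items():
    print(f"{name}: +{gap:.1f} points")
```

On these reported numbers, the accuracy margins over the named competitors range from about 0.8 to 1.8 points, while the reported cost gap is more than two orders of magnitude.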
Alongside the API, Subquadratic launched SubQ Code, a command-line coding agent built to load an entire repository into one context window so a developer can plan, write, and review changes across a codebase without orchestrating multiple agents.[1] A search product, SubQ Search, entered a free preview.[4] The company has said the model will not be open-weight in the near term, though it can be fine-tuned for specific customer workloads.[1]
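Whether a repository fits in one window at all can be estimated up front. The helper below is hypothetical and says nothing about how SubQ Code actually loads a repository: it sums the characters in source files under a directory, divides by an assumed four characters per token (a rough heuristic that varies by tokenizer and language), and checks the result against the 1-million-token production window and the 12-million-token research window. The file extensions and window labels are illustrative choices.

```python
import os

CHARS_PER_TOKEN = 4  # rough assumed heuristic; real tokenizers differ
WINDOWS = {"SubQ 1M-Preview": 1_000_000, "12M research model": 12_000_000}


def estimate_repo_tokens(root, extensions=(".py", ".js", ".ts", ".go", ".rs", ".java", ".md")):
    """Approximate token count of source files under `root` (hypothetical helper)."""
    chars = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]  # skip .git and other hidden dirs
        for name in filenames:
            if name.endswith(extensions):
                with open(os.path.join(dirpath, name), encoding="utf-8", errors="ignore") as f:
                    chars += len(f.read())
    return chars // CHARS_PER_TOKEN


def windows_that_fit(tokens):
    """Names of the context windows large enough to hold `tokens`."""
    return [name for name, size in WINDOWS.items() if tokens <= size]
```

Calling `windows_that_fit(estimate_repo_tokens("path/to/repo"))` returns the windows that could hold the estimated source, or an empty list if the repository is too large for both.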
Subquadratic raised a $29 million seed round, reported by SiliconANGLE at a roughly $500 million valuation.[1] Named investors include Javier Villamizar, a former partner at the SoftBank Vision Fund, and Justin Mateen, a co-founder of Tinder and founder of the JAM Fund, along with Grant Gittlin and Jaclyn Rice Nelson.[1][2] The company also lists early investors in Anthropic, OpenAI, Stripe, and Brex among its backers.[3] A $500 million valuation on a $29 million seed is aggressive, and it signals that investors are pricing in the architecture working as advertised rather than waiting for proof.
The reason SubQ has not been universally celebrated is that its strongest claims are exactly the ones that have proven hardest to deliver. Subquadratic architectures have a track record of looking excellent in theory and faltering in practice. Mamba, RWKV, DeepSeek Sparse Attention, and Kimi Linear all promised linear or near-linear scaling, and they repeatedly ran into the same wall: architectures that are cheap in the asymptotic sense often lose accuracy against dense attention at frontier scale, or they survive only by going hybrid, mixing subquadratic layers with ordinary attention layers and giving up the clean scaling story in the process.[3] SubQ's claim to have avoided that tradeoff is extraordinary, and it has not been checked by anyone outside the company.
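The cost of going hybrid follows from per-layer comparison counts. In the sketch below, a model keeps a hypothetical fraction of its layers as dense causal attention and makes the rest sparse with a fixed per-query budget; the 48 layers, the 2,048-position budget, and the fractions are illustrative assumptions, not details of any named model. With a quarter of the layers dense, doubling the input already multiplies cost by nearly four rather than two, because the quadratic layers dominate at long context.

```python
def layer_cost(n, dense, budget):
    """Approximate query-key comparisons for one causal layer of length n."""
    return n * (n + 1) // 2 if dense else n * min(n, budget)


def model_cost(n, layers=48, dense_fraction=0.0, budget=2_048):
    """Total comparisons when round(layers * dense_fraction) layers stay dense."""
    n_dense = round(layers * dense_fraction)
    return (n_dense * layer_cost(n, True, budget)
            + (layers - n_dense) * layer_cost(n, False, budget))


for frac in (0.0, 0.25, 1.0):
    growth = model_cost(2_000_000, dense_fraction=frac) / model_cost(1_000_000, dense_fraction=frac)
    print(f"dense fraction {frac:.2f}: doubling 1M -> 2M tokens multiplies cost by {growth:.2f}")
```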
Two things are conspicuously missing. There is no full technical report, and there are no open weights, so no independent lab can rerun the benchmarks or inspect how token selection actually works.[3] The company says a technical report, an architecture description, and third-party benchmarks are all forthcoming; none had appeared at the time of launch.
History offers a cautionary parallel that critics raised immediately. In August 2024, Magic.dev announced a model with a 100-million-token context window and a claimed 1,000x efficiency advantage, and raised around $500 million on the strength of that story; as of early 2026 there was no public evidence of that model, LTM-2-mini, being used at scale outside Magic itself.[3] Big context claims are easy to announce and hard to substantiate. The AI commentator Dan McAteer captured the split reaction in a widely shared post, writing that SubQ "is either the biggest breakthrough since the Transformer, or it's AI Theranos."[3]
The two possibilities can look identical from the outside until someone runs the benchmarks independently. For now, SubQ is a genuinely interesting set of claims backed by a credible benchmark page and a small but pedigreed team, waiting on the evidence that would move it from interesting to proven.