SubQ

Updated Jun 3, 2026

SubQ is a large language model released on May 5, 2026 by Subquadratic, a Miami-based startup that emerged from stealth claiming to have built the first frontier model on a fully subquadratic attention architecture. The company says SubQ can read up to 12 million tokens in a single pass, a context window far larger than that of any production model from OpenAI, Anthropic, or Google, while costing a fraction as much to run on long inputs.^[1]^[2]^[5] Those claims are striking enough that several researchers have asked for independent verification before accepting them, since the company has not yet published a technical report or released the model weights.^[3]

The pitch rests on a single idea. The cost of standard transformer attention scales quadratically with sequence length, so doubling the input quadruples the attention compute. Subquadratic says it redesigned attention to scale linearly instead, which is what makes a 12-million-token window economically plausible rather than merely possible. Whether the architecture actually holds accuracy at that length, and whether it does so without quietly reverting to dense attention under the hood, is the open question that the rest of the AI field is waiting to test.
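
The scaling argument can be made concrete with a rough count of query-key comparisons. The sketch below is an illustration, not Subquadratic's cost model: the function names and the per-query budget of 4,096 selected tokens are assumptions chosen for the example, and constant factors are ignored.

```python
def pairwise_comparisons_dense(n_tokens: int) -> int:
    """Query-key comparisons when every token attends to every token."""
    return n_tokens * n_tokens


def pairwise_comparisons_sparse(n_tokens: int, selected_per_query: int) -> int:
    """Comparisons when each query attends to a fixed-size subset of tokens."""
    return n_tokens * min(selected_per_query, n_tokens)


# Hypothetical per-query budget; an illustrative number, not a SubQ parameter.
BUDGET = 4096

for n in (128_000, 1_000_000, 12_000_000):
    dense = pairwise_comparisons_dense(n)
    sparse = pairwise_comparisons_sparse(n, BUDGET)
    print(f"{n:>12,} tokens: dense/sparse ratio = {dense / sparse:,.0f}x")
```

Under these assumptions the gap between the dense and sparse counts widens in proportion to the input length, which is why the advantage claimed at 12 million tokens is so much larger than the one claimed at 128K.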

Company and founders

Subquadratic is based in Miami and was founded by Justin Dangel and Alex Whedon.^[1]^[3] Dangel, the chief executive, is described as a five-time founder with prior companies in health tech, insurance technology, and consumer products.^[2]^[4] Whedon, the chief technology officer, was a software engineer at Meta and later head of generative AI at TribeAI, where the company says he led more than 40 enterprise AI deployments.^[2]^[4] The brief that circulated at launch described "ex-Meta GenAI leadership," which is roughly accurate but worth stating precisely: Whedon's generative-AI leadership title was at TribeAI, not at Meta, where he worked as an engineer.

The research team is small. Subquadratic says it employs 11 PhD researchers and research engineers with backgrounds at Meta, Google, Oxford, Cambridge, ByteDance, Adobe, and Microsoft.^[2]^[3] For a company making claims about a new class of frontier architecture, that headcount is modest, which is part of why the launch drew both excitement and suspicion.

The subquadratic architecture

The core technology is what the company calls Subquadratic Sparse Attention, abbreviated SSA. In a normal transformer, every token attends to every other token, an all-pairs comparison whose cost grows with the square of the sequence length. SSA replaces that with content-dependent token selection: for each query token, the model picks a small subset of earlier positions that actually matter, then computes exact attention only over that subset.^[2]^[3] Because the selected subset stays roughly fixed in size as the input grows, total compute and memory grow linearly with context length rather than quadratically, provided the selection step itself avoids an all-pairs comparison.
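
The general idea of selecting a subset and then computing exact attention over it can be sketched for a single query as follows. This is a generic top-k sparse attention toy, not SSA: Subquadratic has not published its selection mechanism, and the function name `topk_sparse_attention` is hypothetical. The sketch also scores every key in order to choose the subset, so it is not subquadratic itself; a real system would need a cheaper selector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(query, keys, values, k):
    """Exact softmax attention restricted to the k highest-scoring keys.

    Assumption: selection uses the full score vector, which costs O(n) per
    query. That keeps the sketch simple but is NOT how a subquadratic
    selector could work.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    # Content-dependent selection: keep the positions with the largest scores.
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Exact attention, but only over the selected subset.
    weights = softmax([scores[i] for i in selected])
    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]
    return output, sorted(selected)


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
output, selected = topk_sparse_attention(query, keys, values, k=2)
print(selected, output)
```

Setting `k` to the number of keys recovers ordinary dense attention for that query, which is one way to check a sparse variant against a dense baseline on small inputs.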

The distinction from earlier sparse-attention work is that SSA chooses which tokens to attend to based on their content, not on a fixed geometric pattern such as a sliding window or a stride. Dangel summarized the goal plainly in an interview: "let's try to figure out how to not compare every token to every token to every token."^[1] He also framed the payoff in scaling terms. With quadratic scaling, doubling the input requires four times the compute; with linear scaling, it requires only twice as much.^[1]
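
The difference between a fixed geometric pattern and content-dependent selection shows up when the relevant token sits far from the query. The toy comparison below uses made-up relevance scores and hypothetical helper names; it illustrates the distinction only and says nothing about how SSA computes relevance.

```python
def sliding_window_positions(query_pos: int, window: int) -> list[int]:
    """Fixed geometric pattern: the last `window` positions up to the query."""
    start = max(0, query_pos - window + 1)
    return list(range(start, query_pos + 1))


def content_selected_positions(relevance: list[float], query_pos: int, k: int) -> list[int]:
    """Content-dependent pattern: the k most relevant positions up to the query."""
    candidates = range(query_pos + 1)
    top = sorted(candidates, key=lambda i: relevance[i], reverse=True)[:k]
    return sorted(top)


# Made-up relevance scores for a query at position 9; position 1 is the "needle".
relevance = [0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.3, 0.5]

window_hits = sliding_window_positions(query_pos=9, window=3)
content_hits = content_selected_positions(relevance, query_pos=9, k=3)

print("sliding window:", window_hits)
print("content-based: ", content_hits)
```

With the same budget of three positions, the sliding window never sees position 1, while the content-based selector keeps it. Both patterns cost the same at attention time; they differ only in which positions they spend the budget on.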

This places SubQ in a lineage of attempts to escape the quadratic bottleneck, a lineage that includes Mamba, RWKV, DeepSeek Sparse Attention, and Kimi Linear. Reducing the cost of long context is one of the central problems in the field, because the quadratic term is what makes processing a whole codebase or a library of documents expensive. SubQ's specific claim is that it gets the efficiency of those linear approaches without the accuracy penalty that has historically come with them.

Context window and performance claims

SubQ ships in two forms. The production API exposes a 1-million-token context window under the name SubQ 1M-Preview, while a research model extends to the full 12 million tokens that anchored the launch coverage.^[2]^[4] Twelve million tokens is roughly nine million words, or on the order of 120 books, according to Subquadratic's own framing.^[1]
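
The size comparison is simple arithmetic, shown here as a quick check. The two conversion factors are assumptions back-derived from the figures quoted above (0.75 words per token and 75,000 words per book), not numbers published by Subquadratic.

```python
TOKENS = 12_000_000
WORDS_PER_TOKEN = 0.75   # assumption implied by 12M tokens being about 9M words
WORDS_PER_BOOK = 75_000  # assumption implied by 9M words being about 120 books

words = TOKENS * WORDS_PER_TOKEN
books = words / WORDS_PER_BOOK

print(f"{words:,.0f} words, about {books:.0f} books")
```

Real tokenizers vary in how many words they produce per token, especially for code and non-English text, so the word and book counts should be read as order-of-magnitude figures.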

The company published a set of benchmark results to support its efficiency and accuracy claims. The figures below are reported by Subquadratic and have not yet been reproduced by outside labs.

Benchmark or metric	Reported result	Comparison
RULER at 128K (long-context retrieval)	95.6% accuracy	Claude Opus 4.6 at 94.8% ^[2]
RULER at 128K, cost	about $8 per run	about $2,600 for Claude Opus, roughly 300x cheaper ^[1]
Attention speed at 1M tokens	52x faster than FlashAttention	63% less compute ^[2]
Attention compute at 12M tokens	almost 1,000x less	versus other frontier models ^[2]
Needle-in-a-haystack retrieval at 12M tokens	92.1%	a length no frontier model currently reaches ^[3]
MRCR v2 (multi-round coreference)	83 (research model) / 65.9 (production)	beats OpenAI's score by about nine points ^[2]^[3]
SWE-Bench Verified (coding)	81.8	Opus 4.6 at 80.8, DeepSeek 4.0 Pro at 80.0 ^[2]
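
The ratios and margins implied by the table can be checked directly. The snippet below only restates arithmetic on the company-reported figures; the variable names are illustrative and none of the inputs have been independently verified.

```python
# Figures as reported by Subquadratic in the table above.
SUBQ_RULER_COST_USD = 8
OPUS_RULER_COST_USD = 2_600

cost_ratio = OPUS_RULER_COST_USD / SUBQ_RULER_COST_USD

# Accuracy margins in percentage points, rounded to avoid floating-point noise.
ruler_margin = round(95.6 - 94.8, 1)
swe_bench_margin = round(81.8 - 80.8, 1)

print(f"RULER cost ratio: {cost_ratio:.0f}x")
print(f"RULER accuracy margin: {ruler_margin} points")
print(f"SWE-Bench Verified margin: {swe_bench_margin} points")
```

The cost ratio works out to 325x, consistent with the "roughly 300x" in the table, while the reported accuracy margins over Claude Opus 4.6 are one point or less. On these numbers the headline claim is about cost at comparable accuracy rather than a large accuracy lead.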

Alongside the API, Subquadratic launched SubQ Code, a command-line coding agent built to load an entire repository into one context window so a developer can plan, write, and review changes across a codebase without orchestrating multiple agents.^[1] A search product, SubQ Search, entered a free preview.^[4] The company has said the model will not be open-weight in the near term, though it can be fine-tuned for specific customer workloads.^[1]

Funding

Subquadratic raised a $29 million seed round, reported by SiliconANGLE at a roughly $500 million valuation.^[1] Named investors include Javier Villamizar, a former partner at the SoftBank Vision Fund, and Justin Mateen, a co-founder of Tinder and founder of the JAM Fund, along with Grant Gittlin and Jaclyn Rice Nelson.^[1]^[2] The company also lists early investors in Anthropic, OpenAI, Stripe, and Brex among its backers.^[3] A $500 million valuation on a $29 million seed is aggressive, and it signals that investors are pricing in the architecture working as advertised rather than waiting for proof.

Skeptical context

The reason SubQ has not been universally celebrated is that its strongest claims are exactly the ones that have proven hardest to deliver. Subquadratic architectures have a track record of looking excellent in theory and faltering in practice. Mamba, RWKV, DeepSeek Sparse Attention, and Kimi Linear all promised linear or near-linear scaling, and they repeatedly ran into the same wall: architectures that are cheap in the asymptotic sense often lose accuracy against dense attention at frontier scale, or they survive only by going hybrid, mixing subquadratic layers with ordinary attention layers and giving up the clean scaling story in the process.^[3] SubQ's claim to have avoided that tradeoff is extraordinary, and it has not been checked by anyone outside the company.
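
Why going hybrid gives up the clean scaling story can be seen in a simple cost model. The sketch below is an assumption-laden illustration: it treats a fraction of layers as dense and the rest as fixed-budget sparse, ignores constant factors, and uses hypothetical names and numbers throughout.

```python
def hybrid_attention_cost(n_tokens: int, dense_fraction: float, selected_per_query: int) -> float:
    """Relative attention cost when some layers are dense and the rest sparse."""
    dense = dense_fraction * n_tokens ** 2
    sparse = (1 - dense_fraction) * n_tokens * selected_per_query
    return dense + sparse


def doubling_factor(n_tokens: int, dense_fraction: float, selected_per_query: int) -> float:
    """How much the cost grows when the input length doubles."""
    before = hybrid_attention_cost(n_tokens, dense_fraction, selected_per_query)
    after = hybrid_attention_cost(2 * n_tokens, dense_fraction, selected_per_query)
    return after / before


# Illustrative settings: 1M tokens and a hypothetical budget of 4,096 per query.
for fraction in (0.0, 0.25, 1.0):
    factor = doubling_factor(1_000_000, fraction, 4096)
    print(f"dense fraction {fraction:.2f}: doubling input costs {factor:.2f}x")
```

In this model a purely sparse stack doubles in cost when the input doubles and a purely dense stack quadruples. Even a minority of dense layers pulls the factor close to four at long context, because the quadratic term dominates once the input is much longer than the per-query budget.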

Two things are conspicuously missing. There is no full technical report, and there are no open weights, so no independent lab can rerun the benchmarks or inspect how token selection actually works.^[3] The company says a technical report, an architecture description, and third-party benchmarks are all on the way. They had not appeared at the time of launch.

History offers a cautionary parallel that critics raised immediately. In August 2024, Magic.dev announced a model with a 100-million-token context window and a claimed 1,000x efficiency advantage, and raised around $500 million on the strength of that story; as of early 2026 there was no public evidence of that model, LTM-2-mini, being used at scale outside Magic itself.^[3] Big context claims are easy to announce and hard to substantiate. The AI commentator Dan McAteer captured the split reaction in a widely shared post, writing that SubQ "is either the biggest breakthrough since the Transformer, or it's AI Theranos."^[3]

The two outcomes can look identical from the outside until someone runs the benchmarks independently. For now, SubQ is a genuinely interesting set of claims backed by a credible benchmark page and a small but pedigreed team, waiting on the evidence that would move it from interesting to proven.

References

1. Wheatley, Mike. "Subquadratic launches with $29M to bring 12M-token context windows to AI." SiliconANGLE, May 5, 2026. https://siliconangle.com/2026/05/05/subquadratic-launches-29m-bring-12m-token-context-windows-ai/
2. "Introducing SubQ: The First Fully Subquadratic LLM." Subquadratic. https://subq.ai/introducing-subq
3. "Miami startup Subquadratic claims 1,000x AI efficiency gain with SubQ model; researchers demand independent proof." VentureBeat, May 6, 2026. https://venturebeat.com/technology/miami-startup-subquadratic-claims-1-000x-ai-efficiency-gain-with-subq-model-researchers-demand-independent-proof
4. "Subquadratic: $29 Million Seed Raised For Long-Context AI Architecture." Pulse 2.0, May 2026. https://pulse2.com/subquadratic-29-million-seed-raised-for-long-context-ai-architecture/
5. "The context window has been shattered: Subquadratic debuts a 12-million-token window." The New Stack, May 2026. https://thenewstack.io/subquadratic-12-million-context-window/
