Miami startup Subquadratic claims its SSA architecture hits 12M tokens with linear compute scaling, beating GPT-5.5 on MRCR v2 and Opus 4.6 on SWE-bench.
Key Takeaways
Subquadratic Selective Attention (SSA) scales linearly in compute and memory, versus the quadratic cost of standard transformer attention; the company reports 52x faster inference than dense attention at 1M tokens (a back-of-the-envelope cost sketch follows this list).
MRCR v2 score of 83% beats GPT-5.5 (74.0%); needle-in-a-haystack retrieval at 12M tokens hits 92.1%; SWE-bench Verified at 82.4% edges Opus 4.6 (81.4%) and Gemini 3.1 Pro (80.6%).
Key caveat: each benchmark run was single-pass due to inference cost; the SWE-bench margin is partly attributed to harness configuration, not the model alone.
Shipping now: API with full 12M-token window and SubQ Code CLI agent; 50M-token window targeted for Q4 2026; no open weights.
Prior attempts in this category (Magic.dev's 100M-token LTM, Mamba, Longformer) either traded away retrieval quality or retained quadratic steps; SSA claims to avoid the indexer trap that makes the selection step in DeepSeek's NSA quadratic.
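To ground the linear-vs-quadratic claim, the sketch below compares idealized FLOP counts for dense attention against a generic fixed-budget selective scheme. Everything in it is an illustrative assumption (the function names, d_head = 128, a 2,048-key selection budget per query, and an O(n) selector), not Subquadratic's actual method; the idealized ratio it prints will not match the reported 52x wall-clock speedup, which also depends on kernels and memory traffic.

```python
# Back-of-the-envelope cost model: dense attention vs. a fixed-budget
# selective scheme. Parameters are illustrative assumptions, not figures
# from Subquadratic's announcement.

def dense_attention_flops(n_tokens: int, d_head: int = 128) -> float:
    """Dense attention per head: QK^T plus attention-weighted V,
    i.e. two n x n x d matmuls, so cost grows with n^2."""
    return 2 * (n_tokens ** 2) * d_head

def selective_attention_flops(n_tokens: int, k_selected: int = 2048,
                              d_head: int = 128) -> float:
    """Each query attends to a fixed budget of k selected keys, so cost
    grows linearly in n (assumes the selector itself is O(n))."""
    return 2 * n_tokens * k_selected * d_head

if __name__ == "__main__":
    for n in (128_000, 1_000_000, 12_000_000):
        ratio = dense_attention_flops(n) / selective_attention_flops(n)
        print(f"{n:>12,} tokens: dense / selective FLOP ratio ~ {ratio:,.0f}x")
```

The point is only that a fixed per-query key budget makes total attention cost grow with n rather than n^2, which is what makes a 12M-token window plausible on paper, whatever the real constant factors turn out to be.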
Hacker News Comment Review
Commenters are broadly skeptical: no technical report has been published, no weights released, and the VC-backed structure makes independent verification unlikely in the near term.
The thread points to an earlier HN discussion with primary source links, suggesting the article itself is a secondary write-up of a claim that has not yet been peer-reviewed or reproduced.
Notable Comments
@refibrillator: flags a prior HN thread with primary sources and notes that no technical report or code has been released, pointing to the VC funding structure.