Subquadratic: A Solution to the LLM Bottleneck?

TL;DR: Subquadratic has unveiled SubQ, an LLM that uses sparse attention to avoid the quadratic cost of full attention. In tests by Appen, it reportedly ran 56x faster and scored 98% on long-document retrieval. Skepticism remains because the model is not publicly available and was built on borrowed weights.

What Happened?

Subquadratic, a Miami-based startup, has emerged from stealth to introduce SubQ, a language model that it claims solves the mathematical bottleneck that has limited LLMs for nearly a decade: the quadratic complexity of attention. Instead of comparing every token with every other token, SubQ uses sparse attention, a simple idea that many have tried without producing a competitive model. The company initially released few details, drawing comparisons to Theranos. It has since shared results from an independent evaluation by Appen, a firm specializing in AI model testing. According to Appen, SubQ completed certain tasks 56 times faster than rival approaches and scored 98% on a key long-document retrieval test.

The quadratic attention bottleneck has been a central problem since the 2017 publication of "Attention Is All You Need," which introduced the Transformer. Since then, most LLMs, including GPT-4, Gemini, and Claude, have relied on full attention, whose computational cost grows with the square of the sequence length: doubling the input roughly quadruples the attention work. This makes long contexts slow and expensive unless workarounds such as external memory or sparse attention are used, and until now those workarounds have failed to match full attention's performance on complex tasks. Subquadratic claims to have closed that gap with a proprietary architecture that combines sparse attention with a dynamic routing mechanism.
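To make the scaling argument concrete, the short Python sketch below counts how many query-key scores each approach has to compute. The sliding-window pattern and the window size of 256 are illustrative assumptions chosen for this example; Subquadratic has not disclosed how SubQ's sparsity or routing actually works.

```python
def full_attention_pairs(n: int) -> int:
    """Query-key scores computed by full attention: every token against every token."""
    return n * n


def windowed_attention_pairs(n: int, window: int) -> int:
    """Scores computed when each token attends only to neighbours within
    `window` positions on either side (an illustrative pattern, not SubQ's)."""
    total = 0
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n - 1, i + window)
        total += hi - lo + 1
    return total


# Compare the two counts as the sequence grows (window size is an assumption).
for n in (1_000, 10_000, 100_000):
    full = full_attention_pairs(n)
    sparse = windowed_attention_pairs(n, window=256)
    print(f"n={n:>7,}  full={full:>15,}  windowed={sparse:>12,}  ratio={full / sparse:,.1f}x")
```

The full-attention count grows with the square of the sequence length, while the windowed count grows roughly in proportion to it. That gap is the appeal of sparse attention; the open question is whether quality survives once most token pairs are skipped.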

Appen, which has more than 25 years of experience in AI data and testing, ran standard benchmarks such as RULER (for long-document retrieval) along with speed tests on reasoning and code generation tasks. On RULER, SubQ reportedly scored 98% accuracy, ahead of GPT-4 (around 80% on long contexts) and Claude 3 (near 85%). On speed, SubQ processed 100,000 tokens in 0.8 seconds, compared with 45 seconds for a comparable full-attention model, a speedup of about 56x. However, Appen has not published the full report, and details of the exact test conditions are limited.
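The headline speed figure can be checked directly from the two reported timings. The snippet below uses only the numbers quoted above; the variable names are ours, and the throughput figures are derived for illustration rather than reported by Appen.

```python
tokens = 100_000
subq_seconds = 0.8        # reported time for SubQ
baseline_seconds = 45.0   # reported time for the full-attention model

# Speedup is the ratio of the two wall-clock times.
speedup = baseline_seconds / subq_seconds

# Derived throughput in tokens per second (not a figure reported by Appen).
subq_throughput = tokens / subq_seconds
baseline_throughput = tokens / baseline_seconds

print(f"speedup: {speedup:.2f}x")
print(f"SubQ throughput: {subq_throughput:,.0f} tokens/s")
print(f"baseline throughput: {baseline_throughput:,.0f} tokens/s")
```

The ratio works out to just over 56, which matches the figure in the announcement. It says nothing about the hardware, batch size, or baseline model used, which are exactly the details still missing.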

Why Is This Important?

The quadratic attention bottleneck is one of the biggest obstacles to scaling LLMs. As input text grows, computational cost skyrockets, making models slow and expensive to operate. If SubQ truly solves this, it could democratize access to powerful language models, enabling applications that require processing large volumes of data, such as legal document analysis, code review, or searching extensive knowledge bases. For example, a law firm could process thousands of pages of contracts in seconds, or a developer could analyze entire code repositories in real time.

The energy savings would also be significant. According to Subquadratic's own estimates, SubQ consumes up to 90% less energy than equivalent models, which could reduce AI's carbon footprint. For scale, a 2019 University of Massachusetts Amherst study, which predates GPT-4, estimated that training one large Transformer with neural architecture search could emit on the order of 300 tons of CO2. If SubQ reduces inference costs, it could accelerate LLM adoption in industries where cost and efficiency are critical, such as healthcare, finance, and logistics.

Earlier sparse attention attempts, such as OpenAI's Sparse Transformer (2019) and Allen AI's Longformer (2020), improved speed but could not match full attention on complex tasks. Subquadratic claims to have overcome this limitation with a new sparsity algorithm that preserves contextual information. If confirmed, this would be a breakthrough comparable to the arrival of GPU-trained neural networks in 2012.
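For readers curious what sparse attention looks like in practice, here is a minimal pure-Python sketch of local-window attention, in the spirit of the sliding-window component used by models such as Longformer. It is a teaching example under our own assumptions (the function name, the toy input, and the window size are hypothetical) and does not represent SubQ's undisclosed algorithm.

```python
import math


def local_attention(queries, keys, values, window):
    """Softmax attention where position i sees only positions j with |i - j| <= window."""
    n, d = len(queries), len(queries[0])
    outputs = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Scaled dot-product scores against the local neighbourhood only.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in range(lo, hi)
        ]
        # Numerically stable softmax over the window.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted average of the value vectors inside the window.
        out = [
            sum(w * values[j][c] for w, j in zip(weights, range(lo, hi))) / total
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input: 6 tokens with 2-dimensional vectors (hypothetical data).
x = [[float(i), 1.0] for i in range(6)]
local = local_attention(x, x, x, window=1)
full = local_attention(x, x, x, window=len(x))  # window covers everything: full attention

for row_local, row_full in zip(local, full):
    print([round(v, 3) for v in row_local], [round(v, 3) for v in row_full])
```

Setting the window to the full sequence length recovers ordinary full attention, which makes the trade-off visible: a small window does far less work per token, but each token can no longer see distant context directly. Closing that quality gap is what earlier sparse models struggled with and what Subquadratic says it has solved.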

What Consequences Will It Have?

If Subquadratic's claims are confirmed, we could see a new wave of innovation in language models. Competitors like OpenAI, Google, and Anthropic would have to rethink their architectures or risk falling behind. Companies like Microsoft, which integrates LLMs into Azure and Office, could benefit from lower inference costs. On the other hand, if SubQ turns out to be a mirage, the incident could increase skepticism toward AI startups and tighten verification standards, as happened after Theranos's collapse in 2018.

SubQ was reportedly built on weights from an open-source Chinese model (possibly Alibaba's Qwen), which raises questions about its originality and its dependence on third parties. The model is also not publicly available, which makes independent replication difficult. Subquadratic has promised to release an open-source version in the coming months, but until then the scientific community can rely only on the data the company provides.

The market impact could be immediate. Companies that depend on proprietary LLMs could be affected if SubQ proves superior: Alphabet (Google) through its share price, and privately held Anthropic through its valuation. Meanwhile, investors such as Andreessen Horowitz, which backed Subquadratic with a $50 million round, are betting on its success. If SubQ fails, it would deal a hard blow to confidence in AI startups.

What Should Readers Know?

For now, Appen's results are promising but not conclusive, and the scientific community is still waiting for the chance to test SubQ itself. Until then, healthy skepticism is warranted, without dismissing the possibility of a real breakthrough. As engineer Dan McAteer put it: "SubQ is either the biggest advance since the Transformer, or it's the Theranos of AI."

Readers should consider that Appen's evaluation, though independent, has not been published in a peer-reviewed journal. Additionally, Subquadratic has shared only selective results, which could hide weaknesses. For example, no metrics have been disclosed on general benchmarks like MMLU or HumanEval, where large models typically excel. Nor has the size of the SubQ model been specified, making fair comparisons difficult.

In summary, SubQ is an exciting promise, but caution is necessary. AI history is full of advances that fell short of their early promise, such as IBM Watson after its 2011 Jeopardy! win or Hinton's capsule networks in 2017. Until SubQ is available for independent testing, its true impact remains uncertain.
