Why Data Efficiency, Not Compute, Is the Next AI Bottleneck
Published 2026-05-06 - Runtime about 9 min - Watch on YouTube
Ben and Asher Spector’s central claim is blunt: AI’s next bottleneck is data, not compute. They argue that the biggest wins so far came from unusually data-rich problems like search and coding, while most of the economy is far thinner on usable data. That shift changes who can build frontier systems.
What Matters
- Search and coding are exceptional because they sit on massive data reserves: the internet, plus synthetic code data.
- The hard test is the rest of the economy: robotics, trading, scientific discovery, and the long tail of tasks such as supply-chain management.
- Compute scales more cleanly than data, because FLOPs get cheaper and GPUs are more homogeneous than real-world data sources.
- Collecting frontier-quality data means dealing with regulation, business terms, and fragmented sources instead of one centralized provider.
- Their bet: a 1,000x improvement in data efficiency, reaching the same model quality from 1,000x less data, would make training and deploying models correspondingly easier in data-thin domains.
- They argue that current frameworks like PyTorch leave hardware capability on the table by hiding GPU-level primitives behind a single-threaded execution model (see the sketch after this list).
- Flapping Airplanes is building systems that expose finer-grained control of the GPU, on the bet that new systems unlock new algorithms.
- The broader implication is competitive: if data stays the moat, only a few companies can train frontier models; data efficiency widens access.
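To make the framework critique concrete, here is a minimal PyTorch sketch, our illustration rather than anything shown in the episode: eager mode issues kernels from a single Python thread onto one default CUDA stream, while CUDA streams, one of the primitives that path hides, let independent work overlap on the GPU.

```python
# A minimal sketch of the "single-threaded model" critique. This is our
# illustration only, not Flapping Airplanes' system or code from the episode.
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# Default path: one Python thread queues both matmuls on the same default
# CUDA stream, so these independent ops run back to back.
x = a @ a
y = b @ b

# Finer-grained path: put each independent matmul on its own stream so the
# GPU is free to overlap them when it has idle compute.
s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
s1.wait_stream(torch.cuda.current_stream())  # a and b were created on the default stream
s2.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s1):
    x = a @ a
with torch.cuda.stream(s2):
    y = b @ b
torch.cuda.synchronize()  # join both streams before reading x and y
```

Actual speedups depend on kernel sizes and spare SMs; the point is only that the default path never exposes the choice at all.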