Why Data Efficiency, Not Compute, Is the Next AI Bottleneck
Published 2026-05-06 - Runtime about 9 min - Watch on YouTube
Ben and Asher Spector’s central claim is blunt: AI’s next bottleneck is data, not compute. They argue that the biggest wins so far came from unusually data-rich problems like search and coding, while most of the economy is far thinner on usable data. That shift changes who can build frontier systems.
What Matters
- Search and coding are exceptional because they sit on massive data reserves: the internet, plus synthetic code data.
- The hard test is the rest of the economy: robotics, trading, scientific discovery, and the long tail of tasks such as supply-chain management.
- Compute scales more cleanly than data, because FLOPs get cheaper and GPUs are more homogeneous than real-world data sources.
- Collecting frontier-quality data means dealing with regulation, business terms, and fragmented sources instead of one centralized provider.
- Their bet: a 1,000x improvement in data efficiency, reaching the same model quality from 1,000x less data, would make training and deploying models correspondingly easier in data-thin domains.
- They argue that current frameworks like PyTorch leave hardware capability on the table by hiding GPU-level primitives behind a single-threaded execution model (see the sketch after this list).
- Flapping Airplanes is building systems that expose finer-grained control of the GPU, on the bet that new systems unlock new algorithms.
- The broader implication is competitive: if data stays the moat, only a few companies can train frontier models; data efficiency widens access.
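To make the framework critique concrete, here is a minimal PyTorch sketch, our illustration rather than anything shown in the episode: eager mode issues kernels from a single Python thread onto one default CUDA stream, while CUDA streams, one of the primitives that path hides, let independent work overlap on the GPU.

```python
# A minimal sketch of the "single-threaded model" critique. This is our
# illustration only, not Flapping Airplanes' system or code from the episode.
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# Default path: one Python thread queues both matmuls on the same default
# CUDA stream, so these independent ops run back to back.
x = a @ a
y = b @ b

# Finer-grained path: put each independent matmul on its own stream so the
# GPU is free to overlap them when it has idle compute.
s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
s1.wait_stream(torch.cuda.current_stream())  # a and b were created on the default stream
s2.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s1):
    x = a @ a
with torch.cuda.stream(s2):
    y = b @ b
torch.cuda.synchronize()  # join both streams before reading x and y
```

Actual speedups depend on kernel sizes and spare SMs; the point is only that the default path never exposes the choice at all.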