Why We Need Continual Learning


TLDR

  • LLM weights are frozen after training; a16z argues models must compress new experience into their weights post-deployment, not just retrieve it from external storage.

Key Takeaways

  • Three continual learning approaches exist on a spectrum: context (RAG, agents), modules (adapter layers, compressed KV caches), and full weight updates, each with distinct capability and risk profiles.
  • A 1B model plus a knowledge module can match an 8B-parameter model on targeted tasks, making partial compaction a composable near-term bet.
  • Naive weight updates fail for six engineering reasons (including catastrophic forgetting, temporal disentanglement, and logical integration failure) and four governance problems (including alignment degradation and auditability collapse).
  • Medical imaging artifacts, audio cadence, and other high-dimensional tacit knowledge cannot be expressed in text and can only be encoded in weights, not context windows.
  • Research directions cited by name: EWC (Kirkpatrick 2017), TTT-Discover (Sun 2020+), MAML, LoRD (Liu 2025), STaR (Zelikman 2022), AlphaEvolve (DeepMind 2025).
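Of the named directions, EWC (Kirkpatrick 2017) is the most direct answer to catastrophic forgetting: anchor each parameter to its post-old-task value in proportion to how much the old task depends on it, measured by the diagonal Fisher information. A minimal NumPy sketch of the penalty term (variable names are illustrative, not taken from the article or the paper's code):

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Elastic Weight Consolidation penalty (Kirkpatrick et al., 2017).

    theta      -- current parameters (being trained on the new task)
    theta_star -- parameters frozen after the old task
    fisher     -- diagonal Fisher information of the old task (importance)
    lam        -- strength of the anchor to the old task

    Parameters with high Fisher values are held near their old values;
    unimportant ones are free to move, absorbing the new task.
    """
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

def ewc_loss(new_task_loss, theta, theta_star, fisher, lam=1.0):
    """Total objective: new-task loss plus the quadratic anchor."""
    return new_task_loss + ewc_penalty(theta, theta_star, fisher, lam)

# Toy illustration: parameter 0 mattered to the old task (Fisher = 10),
# parameter 1 did not (Fisher = 0.1), so the same 0.5 drift is
# penalized ~100x more for parameter 0.
theta_star = np.array([1.0, 2.0])
fisher = np.array([10.0, 0.1])
theta = np.array([1.5, 2.5])
print(ewc_penalty(theta, theta_star, fisher))  # 0.5 * (10*0.25 + 0.1*0.25) = 1.2625
```

The same quadratic-anchor structure is why EWC composes poorly over many sequential updates (anchors accumulate), which is one reason the article treats naive weight updates as an open engineering problem rather than a solved one.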

Why It Matters

  • The startup opportunity the authors identify is converting user corrections and task outcomes into stable weight updates via RL feedback loops, not bigger RAG pipelines.
  • Ilya Sutskever is quoted predicting that deployment will involve continual learning and trial-and-error, signaling that frontier labs are moving beyond static weights.
  • State space models are framed as extending agentic coherence from ~20 steps to ~20,000, a prerequisite for long-running autonomous agents before parametric learning matures.

Malika Aubakirova and Matt Bornstein, Andreessen Horowitz · 2026-04-22