Why We Need Continual Learning


TLDR

  • LLM weights are frozen after training; a16z argues models must compress new experience into their weights post-deployment, not just retrieve it from external storage.

Key Takeaways

  • Three continual learning approaches exist on a spectrum: context (RAG, agents), modules (adapter layers, compressed KV caches), and full weight updates, each with distinct capability and risk profiles.
  • A 1B model plus a knowledge module can match an 8B-parameter model on targeted tasks, making partial compaction a composable near-term bet.
  • Naive weight updates fail for six engineering reasons (including catastrophic forgetting, temporal disentanglement, and logical integration failure) and four governance problems (including alignment degradation and auditability collapse).
  • Medical imaging artifacts, audio cadence, and other high-dimensional tacit knowledge cannot be expressed in text and can only be encoded in weights, not context windows.
  • Research directions cited by name: EWC (Kirkpatrick 2017), TTT-Discover (Sun 2020+), MAML, LoRD (Liu 2025), STaR (Zelikman 2022), AlphaEvolve (DeepMind 2025).
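Of the named directions, EWC (Kirkpatrick 2017) is the most direct answer to catastrophic forgetting: anchor each parameter to its post-old-task value in proportion to how much the old task depends on it, measured by the diagonal Fisher information. A minimal NumPy sketch of the penalty term (variable names are illustrative, not taken from the article or the paper's code):

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Elastic Weight Consolidation penalty (Kirkpatrick et al., 2017).

    theta      -- current parameters (being trained on the new task)
    theta_star -- parameters frozen after the old task
    fisher     -- diagonal Fisher information of the old task (importance)
    lam        -- strength of the anchor to the old task

    Parameters with high Fisher values are held near their old values;
    unimportant ones are free to move, absorbing the new task.
    """
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

def ewc_loss(new_task_loss, theta, theta_star, fisher, lam=1.0):
    """Total objective: new-task loss plus the quadratic anchor."""
    return new_task_loss + ewc_penalty(theta, theta_star, fisher, lam)

# Toy illustration: parameter 0 mattered to the old task (Fisher = 10),
# parameter 1 did not (Fisher = 0.1), so the same 0.5 drift is
# penalized ~100x more for parameter 0.
theta_star = np.array([1.0, 2.0])
fisher = np.array([10.0, 0.1])
theta = np.array([1.5, 2.5])
print(ewc_penalty(theta, theta_star, fisher))  # 0.5 * (10*0.25 + 0.1*0.25) = 1.2625
```

The same quadratic-anchor structure is why EWC composes poorly over many sequential updates (anchors accumulate), which is one reason the article treats naive weight updates as an open engineering problem rather than a solved one.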

Why It Matters

  • The startup opportunity the authors identify is converting user corrections and task outcomes into stable weight updates via RL feedback loops, not bigger RAG pipelines.
  • Ilya Sutskever is quoted predicting that deployment will involve continual learning and trial-and-error, signaling that frontier labs are moving beyond static weights.
  • State space models are framed as extending agentic coherence from ~20 steps to ~20,000, a prerequisite for long-running autonomous agents before parametric learning matures.

Malika Aubakirova and Matt Bornstein, Andreessen Horowitz · 2026-04-22