Why Scale Will Not Solve AGI | Vishal Misra - The a16z Show
https://www.youtube.com/watch?v=zwDmKsnhl08
Columbia’s Vishal Misra proves transformers do exact Bayesian updating — then explains why that still can’t get us to AGI
- Transformers proven to do Bayesian posterior updating to 10^-3 bits precision — not a resemblance but a mathematical demonstration, via a ‘Bayesian wind tunnel’: blank architectures trained on tasks impossible to memorize.
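To make “Bayesian posterior updating” concrete, here is a minimal sketch (not the paper’s setup) of the kind of exact posterior a wind-tunnel experiment would compare a model’s next-token probabilities against — a coin-bias posterior under a uniform prior, computed by discretizing the parameter:

```python
def posterior_mean(heads: int, tails: int, grid: int = 1000) -> float:
    """Exact Bayesian posterior mean of a coin's bias theta under a uniform
    prior, via midpoint-rule integration over a discretized parameter.
    This closed-form target is what a calibrated model should converge to."""
    # Likelihood of the observed sequence: theta^heads * (1 - theta)^tails.
    thetas = [(i + 0.5) / grid for i in range(grid)]
    weights = [t**heads * (1 - t)**tails for t in thetas]
    z = sum(weights)  # normalizing constant (evidence)
    return sum(t * w for t, w in zip(thetas, weights)) / z

# With a uniform (Beta(1,1)) prior, the posterior mean is (h+1)/(h+t+2).
print(round(posterior_mean(7, 3), 3))  # ≈ 0.667, i.e. 8/12
```

Measuring how many bits a model’s predictive distribution deviates from such an exact posterior is what a claim like “10^-3 bits precision” quantifies.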
- Scale will not solve AGI. Two missing pieces: (1) plasticity via continual learning and (2) moving from correlation to causation. More tokens and compute don’t close either gap.
- LLM weights freeze post-training; each conversation starts at zero. Human synapses stay plastic for life — that’s the architectural gap, not a data gap.
- Deep learning operates in Shannon entropy world (correlation). AGI requires Kolmogorov complexity (shortest program that reproduces reality) — no practical algorithm exists yet for the latter.
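A small illustrative sketch of the gap (my example, not Misra’s): Shannon entropy measures symbol statistics, while Kolmogorov complexity — the length of the shortest program reproducing the data — is uncomputable, and compression only gives a crude upper bound on it.

```python
import zlib
from collections import Counter
from math import log2

def shannon_entropy(s: str) -> float:
    """Empirical Shannon entropy in bits per symbol: captures statistical
    regularity (correlation) but says nothing about generative structure."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * log2(c / n) for c in counts.values())

def compressed_size(s: str) -> int:
    """Crude computable upper bound on Kolmogorov complexity: the size of a
    compressed encoding. True K(s) -- shortest program -- is uncomputable."""
    return len(zlib.compress(s.encode()))

# 1,000 digits with all ten symbols equiprobable: Shannon entropy is maximal
# (log2(10) ≈ 3.32 bits/symbol), yet the shortest description is tiny because
# the string is just a short loop -- the two measures come apart.
patterned = "0123456789" * 100
print(shannon_entropy(patterned))
print(compressed_size(patterned))  # far below the 1,000-byte raw size
```

The point mirrored from the talk: statistical learning optimizes the first quantity, while discovering the generating program behind the data is a different and currently unsolved problem.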
- The Einstein test for AGI: train an LLM on pre-1911 physics and ask it to derive relativity. It won’t — LLMs can’t generate new manifolds, only navigate the one they were trained on.
- Demis Hassabis allegedly said LLM consciousness ‘can’t be ruled out.’ Misra disagrees: ‘grains of silicon doing matrix multiplication — they don’t have consciousness, they don’t have an inner monologue.’
- Knuth’s viral Hamiltonian cycles result validates the Shannon/Kolmogorov gap: LLMs found the pieces by brute search, but Knuth’s brain had to create the new causal representation that closed the proof.
- Architecture taxonomy from wind tunnel papers: Transformer > Mamba > LSTM > MLP on Bayesian updating. Transformer nails all tasks; MLPs fail completely.
- Misra built the first RAG implementation in 2020 for ESPNcricinfo’s Statsguru, translating natural language into a custom DSL using GPT-3 with a 2,000-token context — before the term RAG existed.
- Google Research independently published a paper teaching LLMs Bayesian learning via RLHF, and outside researchers reproduced Misra’s Bayesian wind tunnel experiments — both validating the framework.
Guests: Vishal Misra (Columbia CS professor, networking researcher), Martin Casado (a16z general partner) · 2026-03-17 · Watch on YouTube
| Field | Value |
| --- | --- |
| Added | Mar 17, 2026 |
| Modified | Apr 16, 2026 |