François Chollet: How We Get To AGI
Watch on YouTube ↗

Summary based on the YouTube transcript and episode description.
François Chollet argues that scaling pre-training cannot reach AGI and outlines a program-search + deep-learning hybrid architecture being built at his new lab, Ndea.
- A 50,000x compute scale-up of pre-training moved ARC-1 accuracy from 0% to ~10%, while essentially any human scores above 95%.
- OpenAI's o3, fine-tuned on ARC-1, reached human-level performance in December 2024 through test-time adaptation (TTA), not bigger pre-training.
- ARC-2 (released March 2025) tests compositional reasoning: GPT-4.5 and Llama 4 score 0%, and even the best TTA systems remain well below human level.
- 400 non-experts (Uber drivers, unemployed people, UCSD students) were tested in San Diego; every ARC-2 task was solved by at least two of them.
- ARC-3 launches in early 2026 (developer preview July 2025): it drops static input-output pairs in favor of interactive benchmarks of agency, with strict action-efficiency limits relative to humans.
- Chollet distinguishes Type 1 abstraction (continuous, perception/intuition, what transformers do well) from Type 2 (discrete program graphs, reasoning/invention, where transformers fail).
- His proposed path to AGI: deep-learning-guided discrete program search that maintains a growing library of reusable abstractions, mimicking how a software engineer reuses libraries (see the sketch after this list).
- Ndea’s first milestone is to solve ARC-2 with a system that starts with zero ARC-specific knowledge; the lab then aims to apply the same architecture to accelerate scientific discovery.
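A minimal sketch of that program-search loop in Python, under stated assumptions: the toy grid DSL (`flip_h`, `flip_v`, `transpose`, `invert`), the heuristic `guide_score`, and the `abs_0` library entry are hypothetical illustrations, not Ndea's actual system. The heuristic stands in for a learned model supplying Type 1 guidance over the discrete Type 2 program space; each solved task deposits its program into a library that later searches can reuse as single steps.

```python
from itertools import product

Grid = list[list[int]]

# Toy DSL primitives (hypothetical): grid -> grid transformations.
def flip_h(g: Grid) -> Grid: return [row[::-1] for row in g]
def flip_v(g: Grid) -> Grid: return g[::-1]
def transpose(g: Grid) -> Grid: return [list(r) for r in zip(*g)]
def invert(g: Grid) -> Grid: return [[1 - c for c in row] for row in g]

PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v,
              "transpose": transpose, "invert": invert}
LIBRARY: dict[str, list[str]] = {}  # reusable abstractions found so far

def run(program: list[str], g: Grid) -> Grid:
    """Execute a program; library names expand to their stored primitive steps."""
    for name in program:
        for step in LIBRARY.get(name, [name]):
            g = PRIMITIVES[step](g)
    return g

def guide_score(program: list[str]) -> int:
    # Stand-in for the learned guide: a real system would score candidates
    # with a neural net conditioned on the task's demonstration pairs.
    # Here: prefer short programs that reuse library abstractions.
    return -len(program) + sum(name in LIBRARY for name in program)

def search(pairs, max_depth: int = 3):
    vocab = list(PRIMITIVES) + list(LIBRARY)
    candidates = [list(p) for d in range(1, max_depth + 1)
                  for p in product(vocab, repeat=d)]
    candidates.sort(key=guide_score, reverse=True)  # Type 1 orders Type 2 search
    for program in candidates:
        if all(run(program, x) == y for x, y in pairs):
            return program
    return None

# Task 1: output = invert(flip_h(input)). Solving it mints an abstraction.
task1 = [([[1, 1], [0, 1]], [[0, 0], [0, 1]]),
         ([[1, 0], [0, 0]], [[1, 0], [1, 1]])]
sol1 = search(task1)
LIBRARY["abs_0"] = [s for name in sol1 for s in LIBRARY.get(name, [name])]

# Task 2: output = transpose(invert(flip_h(input))) -- depth 3 in raw
# primitives, but only depth 2 once "abs_0" is available for reuse.
task2 = [([[1, 0], [0, 0]], [[1, 1], [0, 1]]),
         ([[0, 1], [1, 1]], [[0, 0], [1, 0]])]
print(sol1, search(task2))  # ['flip_h', 'invert'] ['abs_0', 'transpose']
```

The payoff is the second search: with `abs_0` in the library, a three-primitive solution is found as a two-step program, the same reuse dynamic the talk compares to a software engineer leaning on existing libraries.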
2025-07-03