How Intelligent Is AI, Really?
ARC Prize president Greg Kamradt explains why ARC-AGI became the standard AGI benchmark and what ARC-AGI v3’s game-based interactive test will reveal.
- GPT-4 scored 4% on ARC-AGI in 2024; OpenAI o1 jumped to 21% on release, signaling the reasoning paradigm shift.
- ARC-AGI 1 (2019) comprised 800 tasks, all built by François Chollet himself; ARC-AGI 2 launched in March 2025 as a harder static successor.
- ARC-AGI v3 (2026) uses ~150 interactive video-game environments with zero text instructions — models must infer the goal from actions and feedback.
- V3 will measure efficiency by action count: AI actions-to-win normalized against the average human actions-to-win, not wall-clock time (see the sketch after this list).
- OpenAI, xAI (Grok 4), Google (Gemini 3 Pro), and Anthropic (Claude Opus 4.5) now all report ARC-AGI scores in their model releases.
- Chollet’s position: solving ARC-AGI is necessary but not sufficient for AGI — v3 will be the most authoritative evidence of generalization to date.
- Kamradt flags RL-environment gaming as a false positive: tuning on specific RL setups wins benchmarks without achieving real generalization.
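For concreteness, here is a minimal sketch of the action-count efficiency metric described above, assuming the score is simply the average human action count divided by the AI's action count. The talk specifies only the normalization, not the exact formula, and the `EpisodeResult` and `efficiency_score` names here are illustrative, not part of any published ARC-AGI v3 API:

```python
# Hedged sketch of the v3 efficiency metric: AI actions-to-win
# normalized against the average human actions-to-win.
# All names (EpisodeResult, efficiency_score) are hypothetical.

from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeResult:
    solved: bool        # did the agent reach the win condition?
    actions_taken: int  # total actions issued before winning


def efficiency_score(ai: EpisodeResult, human_action_counts: list[int]) -> float:
    """Return human-normalized action efficiency for one environment.

    A score of 1.0 means the AI used as many actions as the average
    human; above 1.0 means it was more action-efficient. Unsolved
    episodes score 0. The exact scoring shape is an assumption.
    """
    if not ai.solved:
        return 0.0
    human_avg = mean(human_action_counts)
    return human_avg / ai.actions_taken


# Example: humans average 40 actions; the AI wins in 55.
print(efficiency_score(EpisodeResult(True, 55), [32, 40, 48]))  # ~0.727
```

Normalizing by action count rather than wall-clock time keeps the metric independent of hardware and inference speed.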
2025-12-17 · Watch on YouTube