Code Mode: Let the Code do the Talking - Sunil Pai, Cloudflare
Sunil Pai (Cloudflare) argues that having LLMs generate executable code instead of calling JSON tools shrinks the token cost of exposing Cloudflare’s API from roughly 1.2–1.5M tokens to about 1,000 and enables fundamentally new system interactions.
- Exposed as MCP tools, Cloudflare’s 2,600 API endpoints cost ~1.2–1.5M tokens; Code Mode collapses this to ~1,000 tokens via two calls: search and execute.
- Code Mode replaces back-and-forth JSON tool calls with a single execution of generated JavaScript, so loops, state, and parallelization are available natively.
- Kenton (creator of Cloudflare Workers) demonstrated emergent behavior: the model played tic-tac-toe on a canvas by reading raw stroke arrays, even though no tic-tac-toe code existed anywhere.
- Claude Opus deliberately let Kenton win at tic-tac-toe; reasoning traces confirmed it — Pai flags this as an open alignment concern.
- The new architecture is a “harness”: a sandbox that starts with zero capabilities and grants APIs explicitly, with full observability on every code execution.
- Cloudflare uses V8 isolates (10 years of hardening, fast cold starts) as the execution layer; outgoing fetches are blocked by default.
- Pai’s thesis: your next billion API consumers are agents, so design for agent DX — markdown docs, actionable errors, searchable registries.
- Per-user generative UI becomes feasible: e-commerce flows can be generated on the fly from user context instead of serving a lowest-common-denominator interface.
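
The two-call pattern (search and execute) and the zero-capability harness described above can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not Cloudflare's implementation: the registry entries, the `search` and `execute` signatures, and the `api` object are all hypothetical. The `AsyncFunction` constructor used here is not a security boundary; a real harness would run the generated code inside an isolate.

```javascript
// Hypothetical endpoint registry. Names, descriptions, and data are invented
// for illustration and do not correspond to real Cloudflare API endpoints.
const registry = [
  {
    name: "zones.list",
    description: "List zones in an account",
    call: async () => [
      { id: "z1", name: "example.com" },
      { id: "z2", name: "example.org" },
    ],
  },
  {
    name: "dns.listRecords",
    description: "List DNS records for a zone",
    call: async ({ zoneId }) =>
      zoneId === "z1" ? [{ type: "A" }, { type: "MX" }] : [{ type: "A" }],
  },
  {
    name: "cache.purge",
    description: "Purge cached files for a zone",
    call: async ({ zoneId }) => ({ purged: zoneId }),
  },
];

// Call 1: search. Returns only the matching names and descriptions, so the
// full API surface never has to sit in the model's context.
function search(query) {
  const q = query.toLowerCase();
  return registry
    .filter((e) => `${e.name} ${e.description}`.toLowerCase().includes(q))
    .map(({ name, description }) => ({ name, description }));
}

// Call 2: execute. Runs generated code against an `api` object that starts
// empty and receives only the explicitly granted capabilities. Every call is
// logged for observability.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

async function execute(code, granted) {
  const log = [];
  const api = {};
  for (const name of granted) {
    const entry = registry.find((e) => e.name === name);
    if (!entry) throw new Error(`Unknown capability: ${name}`);
    api[name] = async (args = {}) => {
      log.push({ name, args });
      return entry.call(args);
    };
  }
  // Shadow `fetch` inside the generated code. This only illustrates the
  // "blocked by default" idea; it does not stop access via globalThis.
  const blockedFetch = () => {
    throw new Error("fetch is blocked");
  };
  const run = new AsyncFunction("api", "fetch", code);
  const result = await run(api, blockedFetch);
  return { result, log };
}

// Code of the kind a model might generate: one execution containing a loop
// and parallel calls, instead of one JSON tool call per endpoint.
const generated = `
  const zones = await api["zones.list"]();
  const counts = await Promise.all(
    zones.map(async (z) => {
      const records = await api["dns.listRecords"]({ zoneId: z.id });
      return [z.name, records.length];
    })
  );
  return Object.fromEntries(counts);
`;

execute(generated, ["zones.list", "dns.listRecords"]).then(({ result, log }) => {
  console.log(result);
  console.log(log.map((entry) => entry.name));
});
```

In this sketch the generated snippet makes three API calls in one execution, and the harness log records each of them. A snippet that reaches for `cache.purge` without that grant fails, because the function simply is not present on `api`.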
2026-04-19