Simon Willison’s PyCon US 2026 lightning talk recap covers the November 2025 inflection point, when coding agents crossed from “often-work” to “mostly-work” quality and became viable daily drivers.
Key Takeaways
November 2025 saw the “best model” crown change hands five times across Claude, GPT-5.1, and Gemini 3 within a single month.
Coding agents (Codex, Claude Code) became daily drivers after sustained RLVR (reinforcement learning from verifiable rewards) training through 2025 cut error rates enough that they no longer needed constant human correction.
OpenClaw, a “personal AI assistant” (Claw), went from first commit in late November to viral adoption by February; Mac Minis were selling out as local Claw hardware.
Gemma 4 is the most capable open-weight model from a US lab; GLM-5.1 is a 1.5TB open-weight Chinese model with strong but hardware-intensive performance.
The pelican-on-a-bicycle SVG benchmark tracks model reasoning and code generation quality, not image generation, across each major release.
Hacker News Comment Review
The integrity of the pelican SVG test is debated: commenters note that the popularity of Willison’s blog may already have put the prompt into training data, which weakens the “zero training chance” assumption.
There is consensus that the test targets SVG text generation, not raster image quality, which makes raster-to-SVG data laundering a poor training shortcut.
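To make the "SVG text, not raster image" point concrete, here is a minimal Python sketch that treats a model's pelican answer as what it is: an XML document that either parses or does not. It is not the benchmark's scoring method, which this summary does not describe; the function name `summarize_svg` and the sample markup are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://www.w3.org/2000/svg}'."""
    return tag.rsplit("}", 1)[-1]


def summarize_svg(svg_text: str) -> dict:
    """Hypothetical helper: check that model output is parseable SVG text
    and count the elements it contains. This only inspects structure; it
    says nothing about whether the drawing looks like a pelican."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        # The model emitted text that is not well-formed XML.
        return {"well_formed": False, "is_svg": False, "elements": {}}

    # root.iter() walks the root element and all of its descendants.
    counts = Counter(local_name(el.tag) for el in root.iter())
    return {
        "well_formed": True,
        "is_svg": local_name(root.tag) == "svg",
        "elements": dict(counts),
    }


# Illustrative stand-in for a model response (two wheels and a frame).
SAMPLE = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<circle cx="30" cy="70" r="15"/>'
    '<circle cx="70" cy="70" r="15"/>'
    '<path d="M30 70 L50 40 L70 70"/>'
    "</svg>"
)

if __name__ == "__main__":
    print(summarize_svg(SAMPLE))
    print(summarize_svg("<svg><circle></svg>"))  # mismatched tags
```

Because the answer is markup, a model has to choose coordinates, shapes, and nesting itself, which is why the test exercises code generation rather than image synthesis.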