The last six months in LLMs in five minutes


TLDR

  • Simon Willison’s recap of his PyCon US 2026 lightning talk covers the November 2025 inflection point. That month, coding agents went from “often-work” to “mostly-work” and became good enough to use as daily drivers.

Key Takeaways

  • In November 2025 alone, the “best model” crown changed hands five times among Claude, GPT-5.1, and Gemini 3.
  • Coding agents (Codex, Claude Code) became daily drivers after sustained RLVR (reinforcement learning from verifiable rewards) training through 2025. That training reduced error rates enough that the agents no longer needed constant human correction.
  • OpenClaw, a “personal AI assistant” of the kind referred to as a “Claw”, went from its first commit in late November to viral adoption by February. Mac Minis were selling out as dedicated hardware for running Claws locally.
  • Gemma 4 is the most capable open-weight model from a US lab. GLM-5.1 is a 1.5TB open-weight Chinese model that performs strongly but has heavy hardware requirements.
  • The pelican-on-a-bicycle SVG benchmark is run against each major release. It tracks models’ reasoning and code-generation quality, not their image generation.

Hacker News Comment Review

  • Commenters debate the pelican SVG test’s integrity. Because Willison’s blog is so popular, pelican SVGs may already have leaked into training data, which weakens the “zero training chance” assumption.
  • Commenters broadly agree that the test measures SVG text generation, not raster image quality. As a result, laundering raster images into SVG training data would be a poor shortcut for gaming it.
