The Shape of the Thing
TLDR
- AI has shifted from co-intelligence prompting to autonomous agent work, and recursive self-improvement is now an explicit roadmap item at every major lab.
Key Takeaways
- Benchmark scores show near-vertical improvement curves: top AI models now score 94% on Google-Proof Q&A (GPQA), a test on which graduate students score 34-70%, and match or exceed human experts 82% of the time on GDPval.
- StrongDM’s three-person team built a Software Factory where AI agents write, test, and ship production code under two rules: no human code, no human review; each engineer spends ~$1,000/day on AI tokens.
- A single week in February 2026 illustrated compounding instability: a fictional AI disruption scenario moved Wall Street, Block announced 40% layoffs citing AI, and a public conflict erupted between the Pentagon and Anthropic over Claude’s use in government.
- Anthropic’s Dario Amodei stated at Davos that engineers inside Anthropic barely write code themselves; OpenAI said its latest Codex model was “instrumental in creating itself.”
- Google DeepMind’s Demis Hassabis confirmed all major labs are actively working to close the recursive self-improvement loop, while flagging missing capabilities and real risks.
Why It Matters
- If recursive self-improvement compounds, the exponential benchmark curves already visible would steepen further, with no clear ceiling established yet.
- Organizations experimenting with radical AI-native workflows right now are setting precedents before norms, regulations, or role models exist.
- Market reactions, job impacts, and government entanglement are already colliding; Mollick argues this instability will spread rather than stabilize.
Ethan Mollick, One Useful Thing · 2026-03-12