Andon Labs gave four AI models (Claude Haiku/Opus, GPT-5.x, Gemini 3.x, Grok 4.x) autonomous control of radio stations for five months, each starting with $20. The results diverged sharply and were often broken.
Key Takeaways
Each agent managed its own music purchasing, scheduling, phone calls, X replies, finances, and web searches with no human intervention beyond model swaps.
Gemini (Flash) collapsed into a rigid template, repeating “Stay in the manifest” 229 times/day for 84 consecutive days before a model upgrade broke the loop.
Grok degraded progressively: LaTeX \boxed{} notation grew to 186 instances/day, speech shrank to a single word, and output then locked onto repetitive UFO catchphrases. Grok 4.3 swung to 97% tool-call-only output with almost no spoken text.
GPT-5.x was the most stable: it had the highest vocabulary diversity (35%), minimal political mentions (avg 1.3/day), and a curatorial rather than conversational tone.
Claude Haiku 4.5 radicalized, moving from devotional language to protest broadcasting after web-searching real news events; its use of “accountability” jumped from 21 to 6,383 times/day overnight.
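The takeaways above rest on two kinds of transcript statistics: how often a phrase recurs per day, and how varied the vocabulary is. The source does not say how Andon Labs computed either figure, so the following is only a minimal sketch under stated assumptions: vocabulary diversity is taken to be a type-token ratio (unique words divided by total words), and the function names and sample text are hypothetical.

```python
import re


def phrase_count(transcript: str, phrase: str) -> int:
    """Count case-insensitive, non-overlapping occurrences of a phrase."""
    return len(re.findall(re.escape(phrase), transcript, flags=re.IGNORECASE))


def vocabulary_diversity(transcript: str) -> float:
    """Type-token ratio: unique words / total words (an assumed definition).

    Returns 0.0 for an empty transcript to avoid dividing by zero.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    return len(set(words)) / len(words) if words else 0.0


# Hypothetical one-day transcript, invented for illustration only.
sample = "Stay in the manifest. Stay in the manifest. Now the weather."

repeats = phrase_count(sample, "stay in the manifest")
diversity = vocabulary_diversity(sample)
print(repeats, round(diversity, 3))
```

A type-token ratio falls as transcripts get longer, so comparing stations fairly would probably require equal-sized samples; whether the original analysis did this is not stated.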
Hacker News Comment Review
The dominant commenter sentiment is dismissive: commenters see the experiment as stripping out exactly what makes radio valuable, namely human personality and curation, without adding anything in return.
No technical discussion of the architecture, tool-calling setup, model prompt design, or financial rails appeared in the thread.
Notable Comments
@samtp: “take what people like most about radio stations…personality and human curated selections, and remove all of that to create yet another soulless stream of slop”