Where's the raccoon with the ham radio? (ChatGPT Images 2.0)
TLDR
- OpenAI’s gpt-image-2 outperforms gpt-image-1 and Google’s Nano Banana 2 on complex Where’s Waldo-style illustration prompts, with high-quality output at 3840x2160 costing ~$0.40 per image.
Key Takeaways
- gpt-image-1 failed to place a findable raccoon in the scene; gpt-image-2 at default quality also missed, but with `outputQuality: high` at 3840x2160 it produced a clearly visible raccoon in the bottom-left (see the API sketch after this list).
- Google’s Nano Banana 2 placed the raccoon prominently at an “Amateur Radio Club” booth with a “W6HAM” callsign detail; Nano Banana Pro via AI Studio produced the worst result of any model tested.
- The high-quality gpt-image-2 run consumed 13,342 output tokens billed at $30/million, totaling roughly 40 cents for a 17MB PNG (converted to a 5MB WEBP); the cost sketch after this list reproduces the math.
- The OpenAI Python client library does not validate model IDs, so `gpt-image-2` can be used before the library is officially updated, as shown in the sketch after this list.
- Models asked to locate objects in their own generated images hallucinated confident but wrong answers, confirmed via a Hacker News follow-up showing a misplaced red circle.
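Since the client passes unknown model IDs straight through, a call like the following should already work. This is a minimal sketch, assuming gpt-image-2 keeps gpt-image-1's `images.generate()` shape in the OpenAI Python client; the `quality` parameter stands in for the post's `outputQuality: high`, and the prompt is illustrative, not the post's.

```python
# Minimal sketch, not the post's exact script: assumes gpt-image-2
# accepts the same images.generate() parameters as gpt-image-1.
import base64
from openai import OpenAI

client = OpenAI()

response = client.images.generate(
    model="gpt-image-2",  # not validated by the client, so it passes through
    prompt=(
        "A dense Where's Waldo-style street fair; hide a raccoon "
        "operating a ham radio somewhere in the crowd"  # illustrative prompt
    ),
    size="3840x2160",   # the 4K size used in the post
    quality="high",     # stands in for the post's outputQuality: high
)

# gpt-image-1 returns base64-encoded image data; assuming the same here
with open("raccoon.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))
```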
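The per-image cost is simple arithmetic on the figures above; this snippet reproduces it and shows how it compounds at hypothetical production volumes.

```python
# Reproducing the cost figure from the post: 13,342 output tokens
# billed at $30 per million tokens.
output_tokens = 13_342
usd_per_million_tokens = 30.00

cost_per_image = output_tokens / 1_000_000 * usd_per_million_tokens
print(f"${cost_per_image:.2f} per image")  # ~$0.40

# Hypothetical volumes, to show how this adds up in a pipeline:
for per_day in (100, 1_000, 10_000):
    print(f"{per_day:>6,} images/day = ${per_day * cost_per_image:,.0f}/day")
```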
Why It Matters
- Complex illustration benchmarks (hidden objects, embedded text, scene density) expose quality gaps between image models that simpler prompts do not surface.
- Output token pricing at $30/million means high-resolution generation can cost meaningful money at scale; 40 cents per 4K image adds up fast in production pipelines.
- Model self-verification is unreliable: a model that generated an image could not accurately locate objects within it, undermining any auto-QA loop that asks the generator to check its own output.
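Below is a sketch of the auto-QA loop this finding rules out, reusing the raccoon.png from the earlier example; gpt-4o here is an arbitrary vision-capable stand-in, not the model the post tested.

```python
# Sketch of the naive auto-QA loop the post shows to be unreliable:
# feed the generated image back and ask where the hidden object is.
# Per the post, the answer comes back confident but wrong.
import base64
from openai import OpenAI

client = OpenAI()

with open("raccoon.png", "rb") as f:  # image from the earlier sketch
    image_b64 = base64.b64encode(f.read()).decode()

check = client.chat.completions.create(
    model="gpt-4o",  # arbitrary vision-capable checker
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Where is the raccoon with the ham radio? "
                     "Give approximate pixel coordinates."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(check.choices[0].message.content)  # confident, but don't trust it
```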
Simon Willison, Simon Willison’s Weblog · 2026-04-21 · Read the original