Where's the raccoon with the ham radio? (ChatGPT Images 2.0)


TLDR

  • OpenAI’s gpt-image-2 outperforms gpt-image-1 and Google’s Nano Banana 2 on complex Where’s Waldo-style illustration prompts, with high-quality output at 3840x2160 costing ~$0.40 per image.

Key Takeaways

  • gpt-image-1 failed to place a findable raccoon in the scene; gpt-image-2 at default quality also missed, but at outputQuality: high and 3840x2160 produced a clearly visible raccoon bottom-left.
  • Google’s Nano Banana 2 placed the raccoon prominently at an “Amateur Radio Club” booth with a “W6HAM” callsign detail; Nano Banana Pro via AI Studio produced the worst result of any model tested.
  • The high-quality gpt-image-2 run consumed 13,342 output tokens billed at $30/million, totaling roughly 40 cents for a 17MB PNG (converted to 5MB WEBP).
  • The OpenAI Python client library does not validate model IDs, so gpt-image-2 can be used before the library is officially updated.
  • Models asked to locate objects in their own generated images hallucinated confident but wrong answers; a Hacker News follow-up confirmed one such miss with a misplaced red circle.
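The ~40-cent figure follows directly from the token count and rate quoted above. A quick sanity check, using the post's 13,342 output tokens and the assumed $30-per-million output-token price:

```python
# Back-of-envelope cost for the high-quality gpt-image-2 run described
# above. Token count and per-million pricing are taken from the post;
# the 10k-image extrapolation is illustrative, not from the source.
OUTPUT_TOKENS = 13_342
PRICE_PER_MILLION_USD = 30.00  # output-token rate

cost = OUTPUT_TOKENS * PRICE_PER_MILLION_USD / 1_000_000
print(f"${cost:.2f} per image")               # → $0.40 per image
print(f"${cost * 10_000:,.0f} per 10k images")  # → $4,003 per 10k images
```

At 4K resolution the per-image cost is small in isolation but, as the note below points out, compounds quickly in a production pipeline.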

Why It Matters

  • Complex illustration benchmarks (hidden objects, embedded text, scene density) expose quality gaps between image models that simpler prompts do not surface.
  • Output token pricing at $30/million means high-resolution generation can cost meaningful money at scale; 40 cents per 4K image adds up fast in production pipelines.
  • Model self-verification is unreliable: a model that generated an image could not accurately locate objects within it, undermining any auto-QA loop that asks the generator to check its own output.

Simon Willison, Simon Willison’s Weblog · 2026-04-21