Can a Language Model Paint?


TLDR

  • The author iteratively prompts VLMs (Claude Opus/Sonnet and Mistral Large) to paint stroke by stroke, testing whether process-driven generation feels more artistically sincere than one-shot output.

Key Takeaways

  • Frontier models (Claude Opus 4.6/4.7) produce recognizable images; smaller or older models like Mistral Large trend toward abstract scribbles unless given 50-stroke batches instead of 5.
  • VLMs fail at iterative fine detail: a single bad stroke triggers a cascade, with each repair attempt more destructive than the last.
  • This mirrors LLM-assisted codebases: broad-stroke output is competent, but iterative fine edits near capability limits degrade the whole structure irreversibly.
  • The CLI app passes current canvas plus concept to a VLM loop; the model reasons per stroke and self-terminates when it judges the painting complete or hits a max stroke limit.
  • The author still describes the output as “soulless derivative digital illustration”; the iterative process did not produce the sincerity the experiment sought.
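
The loop described above (canvas plus concept in, one stroke out, self-termination or a stroke cap) can be sketched as follows. This is a hypothetical reconstruction, not the author's CLI code: `Canvas`, `model_step`, and all names are stand-ins, and `model_step` stubs out the real VLM API call.

```python
# Hypothetical sketch of the stroke-by-stroke VLM painting loop.
# All names are invented for illustration; model_step stands in for
# a real vision-language-model API call.
from dataclasses import dataclass, field


@dataclass
class Canvas:
    strokes: list = field(default_factory=list)

    def render(self) -> str:
        # Stand-in for rasterizing the canvas into an image the VLM can see.
        return f"<canvas with {len(self.strokes)} strokes>"


def model_step(canvas_image: str, concept: str) -> tuple[str, bool]:
    """Placeholder for the VLM call: returns (next_stroke, done).

    A real implementation would send the rendered canvas and the concept
    prompt to the model, let it reason about the next stroke, and parse a
    stroke description plus a completion judgment from the reply."""
    stroke = f"stroke toward '{concept}'"
    return stroke, False  # this stub never judges the painting complete


def paint(concept: str, max_strokes: int = 200) -> Canvas:
    """Run the iterative painting loop until the model self-terminates
    or the maximum stroke limit is reached."""
    canvas = Canvas()
    for _ in range(max_strokes):
        stroke, done = model_step(canvas.render(), concept)
        canvas.strokes.append(stroke)
        if done:
            break
    return canvas
```

Per the takeaway above, batch size is a tuning knob: weaker models like Mistral Large reportedly needed ~50 strokes per call rather than 5 to produce anything recognizable, which in this sketch would mean `model_step` returning a list of strokes instead of one.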

Hacker News Comment Review

  • Discussion is thin; the main concrete observation from commenters is that stroke-by-stroke LLM output looks more human-made than diffusion model output.
  • The imitation-vs-creation framing surfaced briefly but was not developed with technical depth.
  • Related resources were flagged: Simon Willison’s pelican-on-bicycle LLM progress tracker and Sam Collins’s “underdrawings” post on accurate text generation, both relevant prior art.

Notable Comments

  • @bizer: notes iterative paintings look more human-made than diffusion output, a concrete perceptual distinction worth testing further.
  • @baCist: “LLMs can draw… but they imitate, not create”, a sharp framing of the ceiling the experiment hits.

Original | Discuss on HN