Generative AI and intellectual property — Benedict Evans

TLDR

  • Generative AI reframes 500 years of IP debates by turning scale into a new kind of legal and moral problem.

Key Takeaways

  • Training data is not stored in LLMs: ChatGPT ingests Common Crawl and other sources but does not retain individual articles, books, or songs.
  • Any single creator’s work is statistically insignificant in training, yet the system requires collective human output at scale to function at all.
  • Style imitation sits in legal and moral gray zones: “make a song in the style of Taylor Swift” differs from voice cloning, but no settled consensus exists.
  • News summaries generated by AI may cross a threshold that link-sharing does not, because AI extracts value at industrial scale rather than sending readers to the source.
  • The “Large” in LLMs is a diminishing constraint: researchers are already achieving comparable results with far less data, which may render the training-data ownership debate partially moot.

Why It Matters

  • The difference between a cop carrying a wanted photo and face-recognition cameras on every corner is a difference of scale that becomes a difference of principle; the same logic applies to AI and content use.
  • A trillion-dollar industry built on collective human output, where each individual contributor is 0.0001% of the input, challenges whether “fair use” is an adequate or even honest framing.
  • Output ownership is a separate and cleaner question: tools do not make artists, but AI-generated noise could overwhelm discovery systems the way white-noise tracks already game Spotify payouts.