Five major publishers and Scott Turow sued Meta and Zuckerberg for torrenting 267 TB of pirated material to train Llama, alleging Zuckerberg personally killed a $200M licensing strategy.
Key Takeaways
Plaintiffs allege Meta torrented 267 TB from LibGen and pirate sites, copied those works repeatedly to train Llama, and stripped copyright management information to hide sources.
In early 2023, Meta considered a $200M dataset licensing budget but abandoned it after escalation to Zuckerberg; an employee noted licensing once would undermine a fair-use defense.
Meta had precedent for licensing: signed deals with African-language publishers in 2022 and later with Fox News, CNN, and USA Today – making willful avoidance harder to deny.
A Dec. 2023 internal memo flagged LibGen as “a dataset we know to be pirated” and noted Meta “would not disclose” its use, per the complaint.
The suit argues deliberate circumvention of copyright protection mechanisms places the conduct outside fair-use provisions, distinguishing it from the June 2025 Chhabria ruling that protected Llama 1 training.
Hacker News Comment Review
Commenters note the Anthropic precedent is key: a prior case found that while AI training may be transformative, pirating source material for that purpose is independently infringing, settling for ~$1.5B on ~500K works (~$3K/work).
The $750 statutory minimum per infringement across hundreds of millions of works implies potential damages that dwarf any licensing cost Meta avoided, making the scale of exposure a central technical-legal risk.
Notable Comments
@ben_w: cites Anthropic settlement of $1.5B (~$3K/work) as the damages benchmark and flags statutory minimum of $750 per infringement on hundreds of millions of works.