Zuckerberg 'Personally Authorized and Encouraged' Meta's Copyright Infringement


TLDR

  • Five major publishers and author Scott Turow sued Meta and Zuckerberg, alleging that Meta torrented 267 TB of pirated material to train Llama and that Zuckerberg personally killed a $200M licensing strategy.

Key Takeaways

  • Plaintiffs allege Meta torrented 267 TB from LibGen and pirate sites, copied those works repeatedly to train Llama, and stripped copyright management information to hide sources.
  • In early 2023, Meta considered a $200M budget for licensing datasets but abandoned it after the decision was escalated to Zuckerberg; one employee noted that licensing even once would undermine a fair-use defense.
  • Meta had licensed content before: it signed deals with African-language publishers in 2022 and later with Fox News, CNN, and USA Today, which makes willful avoidance harder to deny.
  • A Dec. 2023 internal memo flagged LibGen as “a dataset we know to be pirated” and noted Meta “would not disclose” its use, per the complaint.
  • The suit argues that deliberately circumventing copyright protection mechanisms places the conduct outside fair use, distinguishing this case from Judge Chhabria's June 2025 ruling that protected Llama 1 training.

Hacker News Comment Review

  • Commenters point to the Anthropic case as the key precedent: the court found that while AI training may be transformative, pirating the source material for that purpose is independently infringing, and the case settled for ~$1.5B covering ~500K works (~$3K per work).
  • The $750 statutory minimum per work, multiplied across hundreds of millions of works, implies potential damages that dwarf the $200M licensing budget Meta abandoned, making the sheer scale of exposure a central legal risk.
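The damages comparison above is back-of-envelope arithmetic, and a minimal Python sketch makes the scale concrete. All figures come from the summary except one assumption: "hundreds of millions of works" is taken at a hypothetical lower bound of 100 million. This is an illustration of the multiplication, not an estimate of what a court would award.

```python
# Back-of-envelope damages comparison using figures quoted in the summary.
# ASSUMPTION: "hundreds of millions of works" is taken at a hypothetical
# lower bound of 100 million; the actual count at issue is not stated here.

ANTHROPIC_SETTLEMENT_USD = 1_500_000_000  # ~$1.5B Anthropic settlement
ANTHROPIC_WORKS = 500_000                 # ~500K works covered by it
STATUTORY_MINIMUM_USD = 750               # statutory minimum award per work
ASSUMED_WORKS = 100_000_000               # hypothetical lower bound (assumption)
LICENSING_BUDGET_USD = 200_000_000        # licensing budget Meta reportedly dropped


def per_work(total_usd: int, works: int) -> float:
    """Average payout per work for a lump-sum settlement."""
    return total_usd / works


def statutory_floor(works: int, minimum_usd: int = STATUTORY_MINIMUM_USD) -> int:
    """Minimum exposure if every work drew only the statutory floor."""
    return works * minimum_usd


if __name__ == "__main__":
    print(f"Anthropic: ${per_work(ANTHROPIC_SETTLEMENT_USD, ANTHROPIC_WORKS):,.0f} per work")
    floor = statutory_floor(ASSUMED_WORKS)
    print(f"Statutory floor at 100M works: ${floor:,}")
    print(f"Multiple of the $200M licensing budget: {floor / LICENSING_BUDGET_USD:.0f}x")
```

Under these assumptions it prints $3,000 per work for the Anthropic settlement, a statutory floor of $75,000,000,000, and a 375x multiple of the abandoned licensing budget.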

Notable Comments

  • @ben_w: cites the Anthropic settlement of $1.5B (~$3K per work) as the damages benchmark and flags the $750 statutory minimum per infringement across hundreds of millions of works.
