Tell NYT, Atlantic, USA Today to keep Wayback Machine

· media ai · Source ↗

TLDR

  • A petition calls on NYT, The Atlantic, and USA Today to stop blocking the Internet Archive’s Wayback Machine from preserving their journalism.

Key Takeaways

  • Since February 2026, the New York Times has instructed the Internet Archive to stop Wayback Machine crawling of its content.
  • USA Today publishes reporting that cites the Wayback Machine while simultaneously blocking it from archiving that same reporting.
  • The Atlantic’s CEO engaged publicly when 100+ journalists sent a celebratory letter but made no commitment to reverse course.
  • Publishers cite AI training concerns as justification; the petition argues those concerns are hypothetical and that AI scrapers ignore robots.txt anyway.
  • The Internet Archive’s compliance with robots.txt is framed as proof of integrity, contrasting it with commercial scrapers that bypass consent entirely.

Hacker News Comment Review

  • Core tension: Internet Archive respects robots.txt and absorbs legal risk, while commercial AI crawlers ignore it and profit – commenters see this as a structural perverse incentive.
  • Practical middle-ground proposals circulated: a 30-day escrow model (similar to Financial Times on NewsBank), a 1-year embargo before archiving, or access-restricted pulls – none currently offered by publishers.
  • Skepticism emerged about petition legitimacy: some argue signers are non-subscribers demanding publishers leave a backdoor open at their own expense.

Notable Comments

  • @ctippett: frames the robots.txt dynamic sharply – “doing the right thing is rewarded with a petition burden while others are rewarded with profit.”
  • @JumpCrisscross: reports that a senior NYT person was unaware of common paywall workarounds discussed on HN; both sides tentatively agreed on 30-day delayed access with pull limits.

Original | Discuss on HN