A petition calls on NYT, The Atlantic, and USA Today to stop blocking the Internet Archive’s Wayback Machine from preserving their journalism.
Key Takeaways
Since February 2026, the New York Times has instructed the Internet Archive to stop Wayback Machine crawling of its content.
USA Today publishes reporting that cites the Wayback Machine while simultaneously blocking it from archiving that same reporting.
The Atlantic’s CEO engaged publicly when 100+ journalists sent a celebratory letter but made no commitment to reverse course.
Publishers cite AI training concerns as justification; the petition argues those concerns are hypothetical and that AI scrapers ignore robots.txt anyway.
The Internet Archive’s compliance with robots.txt is framed as proof of integrity, contrasting it with commercial scrapers that bypass consent entirely.
Hacker News Comment Review
Core tension: Internet Archive respects robots.txt and absorbs legal risk, while commercial AI crawlers ignore it and profit – commenters see this as a structural perverse incentive.
Practical middle-ground proposals circulated: a 30-day escrow model (similar to Financial Times on NewsBank), a 1-year embargo before archiving, or access-restricted pulls – none currently offered by publishers.
Skepticism emerged about petition legitimacy: some argue signers are non-subscribers demanding publishers leave a backdoor open at their own expense.
Notable Comments
@ctippett: frames the robots.txt dynamic sharply – “doing the right thing is rewarded with a petition burden while others are rewarded with profit.”
@JumpCrisscross: reports that a senior NYT person was unaware of common paywall workarounds discussed on HN; both sides tentatively agreed on 30-day delayed access with pull limits.