Amazonbot is finally respecting robots.txt

· ai · Source ↗

TLDR

  • Amazon emailed site owners that starting June 15, 2026, Amazonbot will be controlled solely via robots.txt directives, replacing the previous manual opt-out request process.

Key Takeaways

  • Deadline is June 15, 2026; sites not updating robots.txt by then get standard crawl behavior by default.
  • Amazon Publisher Support sent the notice from amazonbot@amazon.com with Outlook for Mac headers visible in the email.
  • The author plans to add Amazonbot robots.txt rules to Anubis, the bot-mitigation tool originally created in response to Amazonbot scraping.
  • Page-, directory-, or site-level control is supported; full spec at developer.amazon.com/amazonbot.
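The directives themselves are standard robots.txt syntax. A minimal sketch of what site-, directory-, and page-level control might look like (the paths are illustrative, not from the source; check developer.amazon.com/amazonbot for the exact directives Amazon honors):

```
# Directory-level: keep Amazonbot out of one path, allow the rest
User-agent: Amazonbot
Disallow: /private/

# Page-level: block a single page
# Disallow: /drafts/unpublished-post.html

# Site-level opt-out: block everything
# Disallow: /
```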

Hacker News Comment Review

  • At least one site confirms Amazonbot was actively ignoring disallowed paths until very recently, and operators resorted to WAF blocklists on AWS infra itself as a workaround.
  • The unprofessional email metadata (Outlook for Mac footer) raised questions about whether it was a bulk BCC or a forwarding alias, pointing to sloppy internal tooling at Amazon Publisher Support.

Notable Comments

  • @jacobn: confirmed Amazonbot scraped disallowed paths on a weather site until blocked at the WAF, even though the site itself runs on AWS: “hosting on their infra & using their services to block their AI scraper”

Original | Discuss on HN