I Won a Championship That Doesn't Exist


TLDR

  • A $12 domain registration and one Wikipedia edit were enough to get multiple frontier LLMs to confidently confirm a fabricated 6 Nimmt! world championship.

Key Takeaways

  • The attack is a circular citation loop: register a domain, publish a fake press release on it, cite that release on Wikipedia, and RAG-enabled LLMs then treat the Wikipedia article and the press release as two independent sources corroborating each other.
  • Any LLM with web search inherits the trustworthiness of whatever ranks for a query, so SEO poisoning now flows directly into the context window and comes back out as confident-sounding answers.
  • Wikipedia edits that survive long enough get absorbed into pretraining corpora, making the fabrication persistent across every model trained on that scrape, even after the edit is reverted.
  • The agent-layer risk is the most serious: agents acting on retrieved vendor policies or external content let a poisoned source specify real actions on real infrastructure.
  • Detectable heuristics exist: a Wikipedia edit whose new citations all point to a single external domain registered shortly before the edit is a clear signal for both Wikipedia editors and training-pipeline filters.
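The domain-age heuristic and the circular-loop point can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the article's tooling: the `Edit` record, the `registered_on` lookup (in practice something like an RDAP or WHOIS query), the 30-day window, the naive two-label domain extraction, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlparse


@dataclass
class Edit:
    # Hypothetical record of one Wikipedia edit and the URLs it added.
    page: str
    timestamp: datetime
    added_citations: list[str]


def registrable_domain(url: str) -> str:
    # Naive: keep the last two hostname labels. A real filter should use the
    # Public Suffix List, since this shortcut breaks on e.g. "bbc.co.uk".
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_suspicious(edit: Edit, registered_on: dict[str, datetime],
                  window: timedelta = timedelta(days=30)) -> bool:
    """Flag an edit whose new citations all point at one domain that was
    registered within `window` before the edit was made."""
    domains = {registrable_domain(u) for u in edit.added_citations}
    if len(domains) != 1:
        return False
    (domain,) = domains
    created = registered_on.get(domain)
    if created is None:
        return False  # unknown age: leave the decision to other signals
    return timedelta(0) <= edit.timestamp - created <= window


def _origins(url: str, cites: dict[str, list[str]], seen: set[str]) -> set[str]:
    # Follow citations down to sources that cite nothing further; those
    # leaf domains are the actual origins of the claim.
    if url in seen:
        return set()  # already counted, or a citation cycle
    seen.add(url)
    refs = cites.get(url, [])
    if not refs:
        return {registrable_domain(url)}
    found: set[str] = set()
    for ref in refs:
        found |= _origins(ref, cites, seen)
    return found


def independent_origins(sources: list[str],
                        cites: dict[str, list[str]]) -> set[str]:
    """Collapse sources that cite each other, so a Wikipedia article citing
    a press release counts as one origin rather than two."""
    seen: set[str] = set()
    result: set[str] = set()
    for src in sources:
        result |= _origins(src, cites, seen)
    return result
```

With a hypothetical press release at `https://6nimmt-worlds.example/press-release` cited by the Wikipedia article, `independent_origins` returns a single domain, which is exactly the signal a RAG pipeline would need before counting the two pages as corroboration.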

Hacker News Comment Review

  • No substantive HN discussion yet.
