Detecting and Preventing Distillation Attacks

https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks
  • Three Chinese AI labs ran coordinated distillation attacks on Claude.
    • DeepSeek, Moonshot, MiniMax — 24k fraudulent accounts, 16M exchanges total.
    • Hydra-cluster proxy networks (up to 20k accounts) bypassed geographic restrictions.
  • MiniMax: 13M exchanges, agentic coding and tool orchestration extraction.
    • Pivoted tactics within 24 hours of new Claude model releases.
  • Moonshot: 3.4M exchanges targeting reasoning traces, tool use, coding.
  • DeepSeek: 150k+ exchanges; chain-of-thought data + censorship-safe query alternatives.
  • Distilled models inherit capability, not safety — bioweapons and cyber ops risk.
    • Anthropic flags authoritarian military and surveillance deployment explicitly.
  • Anthropic deployed behavioral fingerprinting, detection classifiers, industry indicator sharing.
  • “No company can solve this alone” — calls for policy coordination.
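The article names behavioral fingerprinting as one countermeasure but does not describe its internals. As an illustrative sketch only (every name and threshold here is a hypothetical assumption, not Anthropic's method), one generic form of the idea is to reduce each account's traffic to a small feature vector and flag groups of accounts whose vectors are near-identical, as a proxy network of scripted scrapers tends to be:

```python
# Hypothetical sketch of behavioral fingerprinting for account clustering.
# Not Anthropic's implementation; feature choices and threshold are assumptions.
from math import sqrt
from itertools import combinations

def fingerprint(requests):
    """Reduce (prompt_len, used_tool) request records to a feature vector:
    mean prompt length, prompt-length std dev, fraction of tool-using calls."""
    n = len(requests)
    mean_len = sum(r[0] for r in requests) / n
    var_len = sum((r[0] - mean_len) ** 2 for r in requests) / n
    tool_ratio = sum(1 for r in requests if r[1]) / n
    return (mean_len, sqrt(var_len), tool_ratio)

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def flag_clusters(accounts, threshold=0.999):
    """Return pairs of account IDs whose behavioral vectors are near-identical."""
    fps = {aid: fingerprint(reqs) for aid, reqs in accounts.items()}
    return [(a, b) for a, b in combinations(fps, 2)
            if cosine(fps[a], fps[b]) >= threshold]

# Two scripted accounts with matching traffic statistics pair up; a
# human-like account with irregular usage does not.
accounts = {
    "bot_1": [(500, True), (520, True), (480, False)],
    "bot_2": [(505, True), (515, True), (485, False)],
    "human": [(40, False), (900, True)],
}
print(flag_clusters(accounts))  # → [('bot_1', 'bot_2')]
```

Real detection would use far richer signals (timing, query topics, IP/proxy overlap) and proper clustering rather than pairwise thresholds, but the core idea of matching accounts by behavioral signature survives this simplification.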

X discourse

  • @sahilypatel: “Anthropic built two anti-distillation systems into Claude Code: fake tool calls to corrupt scraped data and vague summar…” (2037 likes)
  • @aidangomez: “okay 😑” (383 likes)
  • @MsftSecIntel: “Threat actors jailbreak AI safety controls by reframing requests, chaining instructions to generate restricted content.” (307 likes)
  • @commiepommie: “Anthropic accuses DeepSeek of industrial-scale distillation attacks using 24k accounts for 16M exchanges with Claude” (316 likes)
  • @ihtesham2005: “AI agent skills leaking API keys via debug print statements; frameworks inject stdout into LLM context, enabling easy re…” (201 likes)

Anthropic · Read on anthropic.com


Type Link
Added Apr 16, 2026