Detecting and Preventing Distillation Attacks
https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks
Three Chinese AI labs ran coordinated distillation attacks on Claude.
- DeepSeek, Moonshot, MiniMax — 24k fraudulent accounts, 16M exchanges total.
- Hydra-cluster proxy networks (up to 20k accounts) bypassed geographic restrictions.
- MiniMax: 13M exchanges, agentic coding and tool orchestration extraction.
- Pivoted tactics within 24 hours of new Claude model releases.
- Moonshot: 3.4M exchanges targeting reasoning traces, tool use, coding.
- DeepSeek: 150k+ exchanges; chain-of-thought data + censorship-safe query alternatives.
- Distilled models inherit capability but not safety guardrails, raising bioweapons and cyber-operations risk.
- Anthropic flags authoritarian military and surveillance deployment explicitly.
- Anthropic deployed behavioral fingerprinting, detection classifiers, industry indicator sharing.
- “No company can solve this alone” — calls for policy coordination.
X discourse
- @sahilypatel: “Anthropic built two anti-distillation systems into Claude Code: fake tool calls to corrupt scraped data and vague summar…” (2037 likes)
- @aidangomez: “okay 😑” (383 likes)
- @MsftSecIntel: “Threat actors jailbreak AI safety controls by reframing requests, chaining instructions to generate restricted content.” (307 likes)
- @commiepommie: “Anthropic accuses DeepSeek of industrial-scale distillation attacks using 24k accounts for 16M exchanges with Claude” (316 likes)
- @ihtesham2005: “AI agent skills leaking API keys via debug print statements; frameworks inject stdout into LLM context, enabling easy re…” (201 likes)
| Type | Link |
| Added | Apr 16, 2026 |