Detecting and Preventing Distillation Attacks

https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks
  • Three Chinese AI labs ran coordinated distillation attacks on Claude.
    • DeepSeek, Moonshot, MiniMax — 24k fraudulent accounts, 16M exchanges total.
    • Hydra-cluster proxy networks (up to 20k accounts) bypassed geographic restrictions.
  • MiniMax: 13M exchanges, agentic coding and tool orchestration extraction.
    • Pivoted tactics within 24 hours of new Claude model releases.
  • Moonshot: 3.4M exchanges targeting reasoning traces, tool use, coding.
  • DeepSeek: 150k+ exchanges; chain-of-thought data + censorship-safe query alternatives.
  • Distilled models inherit capability, not safety — bioweapons and cyber ops risk.
    • Anthropic flags authoritarian military and surveillance deployment explicitly.
  • Anthropic deployed behavioral fingerprinting, detection classifiers, industry indicator sharing.
  • “No company can solve this alone” — calls for policy coordination.
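The article names behavioral fingerprinting as one countermeasure but does not describe its internals. As an illustrative sketch only (every name and threshold here is a hypothetical assumption, not Anthropic's method), one generic form of the idea is to reduce each account's traffic to a small feature vector and flag groups of accounts whose vectors are near-identical, as a proxy network of scripted scrapers tends to be:

```python
# Hypothetical sketch of behavioral fingerprinting for account clustering.
# Not Anthropic's implementation; feature choices and threshold are assumptions.
from math import sqrt
from itertools import combinations

def fingerprint(requests):
    """Reduce (prompt_len, used_tool) request records to a feature vector:
    mean prompt length, prompt-length std dev, fraction of tool-using calls."""
    n = len(requests)
    mean_len = sum(r[0] for r in requests) / n
    var_len = sum((r[0] - mean_len) ** 2 for r in requests) / n
    tool_ratio = sum(1 for r in requests if r[1]) / n
    return (mean_len, sqrt(var_len), tool_ratio)

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def flag_clusters(accounts, threshold=0.999):
    """Return pairs of account IDs whose behavioral vectors are near-identical."""
    fps = {aid: fingerprint(reqs) for aid, reqs in accounts.items()}
    return [(a, b) for a, b in combinations(fps, 2)
            if cosine(fps[a], fps[b]) >= threshold]

# Two scripted accounts with matching traffic statistics pair up; a
# human-like account with irregular usage does not.
accounts = {
    "bot_1": [(500, True), (520, True), (480, False)],
    "bot_2": [(505, True), (515, True), (485, False)],
    "human": [(40, False), (900, True)],
}
print(flag_clusters(accounts))  # → [('bot_1', 'bot_2')]
```

Real detection would use far richer signals (timing, query topics, IP/proxy overlap) and proper clustering rather than pairwise thresholds, but the core idea of matching accounts by behavioral signature survives this simplification.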

X discourse

  • @sahilypatel: “Anthropic built two anti-distillation systems into Claude Code: fake tool calls to corrupt scraped data and vague summar…” (2037 likes)
  • @aidangomez: “okay 😑” (383 likes)
  • @MsftSecIntel: “Threat actors jailbreak AI safety controls by reframing requests, chaining instructions to generate restricted content.” (307 likes)
  • @commiepommie: “Anthropic accuses DeepSeek of industrial-scale distillation attacks using 24k accounts for 16M exchanges with Claude” (316 likes)
  • @ihtesham2005: “AI agent skills leaking API keys via debug print statements; frameworks inject stdout into LLM context, enabling easy re…” (201 likes)

Anthropic · Read on anthropic.com


Type Link
Added Apr 16, 2026