DeepSeek v4 launches two models, flash and pro, with native thinking support; both are fully compatible with the OpenAI and Anthropic SDKs via a base_url swap.
Key Takeaways
Drop-in OpenAI/Anthropic replacement: point base_url to api.deepseek.com or api.deepseek.com/anthropic and reuse existing SDK code.
Two production models: deepseek-v4-flash (fast, cheap) and deepseek-v4-pro (frontier); the legacy aliases deepseek-chat and deepseek-reasoner are deprecated as of 2026-07-24.
Thinking mode is a first-class API param: pass {"type": "enabled"} as the thinking param plus reasoning_effort: "high" in the request body.
Legacy alias behavior: deepseek-chat maps to v4-flash non-thinking mode; deepseek-reasoner maps to v4-flash thinking mode.
Streaming supported on all models via stream: true.
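The takeaways above can be sketched as a single request builder. This is a minimal stdlib-only sketch, not SDK code: the exact endpoint path (/chat/completions), the thinking/reasoning_effort param names, and the model names are taken from the notes above and should be verified against the official DeepSeek docs before use (the Anthropic-compatible path at api.deepseek.com/anthropic would follow the same pattern with Anthropic's message schema).

```python
# Sketch: build an OpenAI-style chat request against the DeepSeek endpoint
# using only the standard library. Assumptions (from the launch notes, not
# independently verified): the /chat/completions path, the "thinking" and
# "reasoning_effort" body params, and the deepseek-v4-* model names.
import json
import os
import urllib.request

API_BASE = "https://api.deepseek.com"


def build_chat_request(prompt: str,
                       model: str = "deepseek-v4-flash",
                       thinking: bool = True,
                       stream: bool = False) -> urllib.request.Request:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # streaming is supported on all models per the notes
    }
    if thinking:
        # Thinking mode is a first-class request-body param per the notes.
        body["thinking"] = {"type": "enabled"}
        body["reasoning_effort"] = "high"
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('DEEPSEEK_API_KEY', '')}",
        },
        method="POST",
    )


req = build_chat_request("Summarize KV caching in one line.", stream=True)
print(req.full_url)  # https://api.deepseek.com/chat/completions
```

With the official SDKs, the same swap is just base_url="https://api.deepseek.com" (OpenAI client) or base_url="https://api.deepseek.com/anthropic" (Anthropic client) plus the model name; the request builder above only makes the wire format explicit.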
Hacker News Comment Review
Pricing undercuts the “subsidized inference” argument: pro at $3.48/M output and flash at $0.28/M output led commenters to question whether frontier labs are actually losing money on inference at all.
Weights for the 1.6T-parameter pro base model are live on Hugging Face, a notable scale for an open release; commenters compared benchmark performance favorably against Opus 4.6.
Security skepticism surfaced quickly: at least one commenter raised the possibility of deliberate backdoors embedded in the weights, invoking the xz-utils supply-chain incident as a reference point.
Notable Comments
@fblp: developer docs shipped before any press release, which commenters read as a signal of engineering-first culture.
@bandrami: alleges the weights may contain "xz-level easter eggs" planted with enough lead time for plausible deniability.