The Engineering Unlocks Behind DeepSeek | YC Decoded
Summary based on the YouTube transcript and episode description.
YC GP Diana Hu breaks down DeepSeek V3 and R1’s core engineering innovations and why the $5.5M training cost figure is misleading.
- DeepSeek’s $5.5M figure covers only the final V3 training run, excluding R&D and hardware costs likely in the hundreds of millions.
- V3 uses fp8 training with a periodic fp32 accumulation fix, cutting memory overhead without compounding numerical errors (see the accumulation sketch after this list).
- Mixture-of-experts architecture activates only 37B of 671B parameters per token, roughly 11x fewer than Llama 3's full 405B activation (routing sketch below).
- Multi-head latent attention (MLA), first published in DeepSeek V2 (May 2024), compresses the KV cache by 93.3% and boosts generation throughput 5.76x (shape sketch below).
- R1-Zero achieved top-tier reasoning using pure RL with no human or AI reasoning examples, graded only on output accuracy and formatting via GRPO (reward sketch below).
- A UC Berkeley lab reproduced R1-Zero’s core techniques in a smaller model for just $30.
- OpenAI released o3-mini two weeks after R1, outperforming R1 on key benchmarks, signaling rapid acceleration at the frontier.
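The fp8 accumulation point is easiest to see in a toy example. The sketch below is not DeepSeek's actual kernel (which runs on GPU tensor cores); it uses float16 as a stand-in for fp8, since NumPy has no fp8 dtype, and only illustrates the idea of flushing a low-precision partial sum into an fp32 accumulator at a fixed interval so rounding errors stop compounding.

```python
import numpy as np

def dot_with_periodic_promotion(a, b, interval=4):
    """Toy dot product: multiply in low precision (float16 standing in for fp8)
    and flush the running partial sum into a float32 accumulator every
    `interval` products, so low-precision rounding error cannot pile up."""
    a16, b16 = a.astype(np.float16), b.astype(np.float16)
    acc32 = np.float32(0.0)        # high-precision master accumulator
    partial16 = np.float16(0.0)    # low-precision running partial sum
    for i in range(a16.size):
        partial16 = np.float16(partial16 + a16[i] * b16[i])
        if (i + 1) % interval == 0:          # periodic promotion to fp32
            acc32 += np.float32(partial16)
            partial16 = np.float16(0.0)
    return acc32 + np.float32(partial16)

rng = np.random.default_rng(0)
x, y = rng.normal(size=4096), rng.normal(size=4096)
print("fp64 reference     :", float(x @ y))
print("fp16, never flushed:", float(dot_with_periodic_promotion(x, y, interval=4096)))
print("fp16, flush every 4:", float(dot_with_periodic_promotion(x, y, interval=4)))
```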
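The 37B-of-671B activation figure follows from top-k expert routing: only the experts the router picks for a token actually run. The sketch below is a generic top-k gating layer, not DeepSeek's exact design (V3 also uses shared experts and its own load-balancing scheme); the dimensions, expert count, and k are illustrative assumptions.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Generic top-k mixture-of-experts layer for a single token.

    x       : (d,) token activation
    gate_w  : (d, n_experts) router weights
    experts : list of (w1, w2) feed-forward weights, one pair per expert
    Only the k highest-scoring experts run, so the parameters touched per
    token are roughly k/n_experts of the layer's total.
    """
    scores = x @ gate_w                    # router logits, one per expert
    top = np.argsort(scores)[-k:]          # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the selected experts only
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w1, w2 = experts[idx]
        out += w * (np.maximum(x @ w1, 0.0) @ w2)   # tiny ReLU FFN per expert
    return out

rng = np.random.default_rng(0)
d, hidden, n_experts = 64, 256, 8
gate_w = rng.normal(size=(d, n_experts)) * 0.02
experts = [(rng.normal(size=(d, hidden)) * 0.02,
            rng.normal(size=(hidden, d)) * 0.02) for _ in range(n_experts)]
token = rng.normal(size=d)
y = moe_forward(token, gate_w, experts, k=2)
print(y.shape, "(only 2 of 8 expert FFNs ran for this token)")
```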
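The MLA cache saving is shape arithmetic: instead of storing full per-head keys and values for every past token, the model caches one small latent vector per token and re-expands it when attention needs K and V. The sketch below shows only that arithmetic; the dimensions are made-up assumptions, and DeepSeek's decoupled rotary-embedding handling is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, d_head, d_model, d_latent, seq_len = 16, 128, 2048, 256, 1024

# Standard multi-head attention caches full per-head K and V per past token.
kv_cache_std = 2 * n_heads * d_head * seq_len              # floats cached

# MLA caches one small latent vector per token instead.
w_down = rng.normal(size=(d_model, d_latent)) * 0.02            # compress h_t -> c_t
w_up_k = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02   # reconstruct K
w_up_v = rng.normal(size=(d_latent, n_heads * d_head)) * 0.02   # reconstruct V

h = rng.normal(size=(seq_len, d_model))     # hidden states of past tokens
c = h @ w_down                              # (seq_len, d_latent): all that is cached
k = (c @ w_up_k).reshape(seq_len, n_heads, d_head)   # expanded on the fly
v = (c @ w_up_v).reshape(seq_len, n_heads, d_head)

kv_cache_mla = d_latent * seq_len
print(f"standard KV cache: {kv_cache_std:,} floats")
print(f"MLA latent cache : {kv_cache_mla:,} floats "
      f"({1 - kv_cache_mla / kv_cache_std:.1%} smaller)")
```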
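The R1-Zero training signal is rule-based rather than learned: each sampled answer is scored for correctness and for following the required format, and GRPO turns those scores into advantages by normalizing within the sampled group instead of using a value network. The sketch below uses illustrative reward weights and a trivial arithmetic task, and shows only the reward-and-advantage step, not the full policy-gradient update.

```python
import re
import numpy as np

def reward(completion, ground_truth):
    """Rule-based reward in the spirit of R1-Zero: accuracy plus format, no learned judge."""
    fmt_ok = bool(re.search(r"<think>.*</think>\s*<answer>.*</answer>",
                            completion, flags=re.S))
    m = re.search(r"<answer>(.*?)</answer>", completion, flags=re.S)
    correct = m is not None and m.group(1).strip() == ground_truth
    return 1.0 * correct + 0.2 * fmt_ok      # weights are illustrative assumptions

def group_relative_advantages(rewards):
    """GRPO's core step: advantage = (r - mean(group)) / std(group),
    computed over a group of completions sampled for the same prompt."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# A group of sampled completions for the prompt "What is 7 * 6?"
group = [
    "<think>7 times 6 is 42</think> <answer>42</answer>",
    "<think>guessing</think> <answer>48</answer>",
    "42",                                   # right number, wrong format
]
rewards = [reward(c, "42") for c in group]
print("rewards   :", rewards)
print("advantages:", group_relative_advantages(rewards).round(3))
```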
2025-02-05 · Watch on YouTube