OpenAI vs. Deepseek vs. Qwen: Comparing Open Source LLM Architectures

Uploaded: 2025-08-29T12:00:00.000000Z

Summary based on the YouTube transcript and episode description.

YC’s Ankit Gupta compares the architectures of GPT OSS, Qwen 3, and DeepSeek V3 and finds that they reach similar benchmark results through surprisingly different engineering choices.

GPT OSS is OpenAI’s first open-weights model since GPT-2 in 2019. It comes in 120B- and 20B-parameter MoE sizes.
DeepSeek V3 activates only 37B of its 671B parameters per token. V3.1 adds a hybrid thinking mode and two-phase long-context training.
Qwen 3 was trained on 36 trillion tokens, twice as many as Qwen 2.5. That total includes trillions of synthetic tokens generated by earlier Qwen models.
Qwen’s reasoning RL stage used only about 4,000 query-verifier pairs. This suggests that strong results at this stage need far less data than expected.
The three models reach 128K context by different routes. GPT OSS builds it in during pre-training, DeepSeek stages it via fine-tuning, and Qwen applies YaRN scaling at inference time without additional training.
DeepSeek V3 uses MLA (multi-head latent attention). MLA compresses keys and values into a low-dimensional latent space to shrink the KV cache, and it outperforms GQA on both memory footprint and modeling quality at scale.
Dataset engineering is likely the core moat. Labs disclose architecture details but keep their data opaque, so the models are hard to replicate despite open weights.
Gupta’s meta-observation: the top models draw on broadly the same toolkit, yet they reach similar benchmark scores through very different choices. No first-principles explanation says why any one method wins.
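
Sparse activation is what lets DeepSeek V3 run only about 37/671 ≈ 5.5% of its parameters per token. GPT OSS uses the same mixture-of-experts (MoE) idea. A router scores the experts for each token, and only the top-k experts execute. The toy sketch below shows top-k routing in plain Python. The following are illustrative assumptions and not DeepSeek’s or OpenAI’s actual gating: the softmax gate, the eight scalar "experts", k = 2, and the hypothetical names `top_k_route` and `moe_forward`.

```python
import math

def softmax(xs):
    # Numerically stable softmax over router scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(scores, k):
    """Hypothetical gate: pick the k highest-probability experts and
    renormalize their weights so they sum to 1."""
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

def moe_forward(x, experts, router_scores, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    routes = top_k_route(router_scores, k)
    out = 0.0
    for i, w in routes:
        out += w * experts[i](x)
    return out, [i for i, _ in routes]

# Toy experts: expert a just multiplies its input by a (illustrative only).
experts = [lambda x, a=a: a * x for a in range(8)]
router_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
y, used = moe_forward(1.0, experts, router_scores, k=2)
print(used)  # → [4, 1]
```

Only the two selected experts do any work. At scale, the compute per token is set by k and the expert size, not by the total parameter count.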
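
A query-verifier pair couples a prompt with an automatic check of the model’s answer. The RL reward can then be computed without a human grader. The summary does not say how Qwen’s verifiers work, so the sketch below rests on illustrative assumptions:

- The task is math-style, and the final answer appears in a `\boxed{...}` span.
- The verifier is a whitespace-insensitive exact match.
- The reward is binary.

The `QueryVerifierPair` type and the helper functions are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class QueryVerifierPair:
    query: str
    verify: Callable[[str], bool]  # True if a model response is judged correct

def extract_boxed(response: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} span (no nested braces)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1].strip() if matches else None

def exact_answer_verifier(reference: str) -> Callable[[str], bool]:
    # Hypothetical rule-based verifier: whitespace-insensitive exact match.
    def verify(response: str) -> bool:
        answer = extract_boxed(response)
        return answer is not None and answer.replace(" ", "") == reference.replace(" ", "")
    return verify

def reward(pair: QueryVerifierPair, response: str) -> float:
    # Binary reward: 1.0 if the verifier accepts the response, else 0.0.
    return 1.0 if pair.verify(response) else 0.0

pair = QueryVerifierPair("What is 12 * 7?", exact_answer_verifier("84"))
print(reward(pair, "12 * 7 = 84, so the answer is \\boxed{84}."))  # → 1.0
print(reward(pair, "I think it is \\boxed{74}."))                   # → 0.0
```

Each pair is cheap to store and can be reused across many sampled responses. This is one plausible reason a few thousand pairs could go a long way.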
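
YaRN extends a RoPE model’s context by rescaling the rotary frequencies unevenly:

- Dimensions whose wavelength is long relative to the original window are interpolated by the scale factor s.
- Short-wavelength dimensions are left unchanged.
- A linear ramp between the ramp bounds α and β blends the two regimes.

The sketch follows YaRN’s "NTK-by-parts" formulation. It uses the bounds α = 1 and β = 32, which the YaRN authors suggest for Llama-family models, and their attention-temperature rule sqrt(1/t) = 0.1 ln s + 1. The following are assumed illustration values, not Qwen 3’s actual configuration: head dimension 128, base 10000, a 32,768-token native window, and s = 4.

```python
import math

def yarn_frequencies(head_dim, base, original_ctx, scale, alpha=1.0, beta=32.0):
    """Per-pair RoPE frequencies after NTK-by-parts (YaRN) interpolation."""
    freqs = []
    for d in range(head_dim // 2):
        theta = base ** (-2 * d / head_dim)       # original rotary frequency
        wavelength = 2 * math.pi / theta
        r = original_ctx / wavelength             # full rotations within the native window
        if r < alpha:
            gamma = 0.0                           # long wavelength: fully interpolate
        elif r > beta:
            gamma = 1.0                           # short wavelength: leave unchanged
        else:
            gamma = (r - alpha) / (beta - alpha)  # linear ramp between the two
        freqs.append((1 - gamma) * theta / scale + gamma * theta)
    return freqs

def yarn_attention_scale(scale):
    """YaRN's sqrt(1/t) = 0.1 * ln(s) + 1, applied to both queries and keys."""
    return 0.1 * math.log(scale) + 1.0

# Assumed settings: a 32K native window stretched 4x to roughly 128K.
orig = yarn_frequencies(128, 10000.0, 32768, scale=1.0)
new = yarn_frequencies(128, 10000.0, 32768, scale=4.0)
print(new[0] == orig[0])                  # → True  (highest-frequency pair untouched)
print(new[-1] / orig[-1])                 # → 0.25  (lowest-frequency pair divided by s)
print(round(yarn_attention_scale(4.0), 4))  # → 1.1386
```

Only the rotary frequencies and a constant attention scale change. The adjustment can therefore be applied to already-trained weights at load time, which fits the inference-time route attributed to Qwen above.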
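
Back-of-the-envelope arithmetic checks the memory side of the MLA comparison. The per-token, per-layer cache for each scheme is:

- Standard multi-head attention caches a key and a value for every head.
- GQA caches them once per KV group.
- MLA caches one compressed latent vector plus a small decoupled RoPE key shared across heads.

The following dimensions are assumptions chosen to resemble DeepSeek V3’s published configuration: 61 layers, 128 heads of size 128, a 512-dim latent, and a 64-dim RoPE key. The 8-group GQA baseline is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AttnConfig:
    n_layers: int
    n_heads: int
    head_dim: int

def mha_kv_elems(cfg: AttnConfig) -> int:
    # One key and one value vector per head, per layer.
    return cfg.n_layers * 2 * cfg.n_heads * cfg.head_dim

def gqa_kv_elems(cfg: AttnConfig, n_kv_groups: int) -> int:
    # Heads in a group share one key/value pair, so only n_kv_groups are cached.
    return cfg.n_layers * 2 * n_kv_groups * cfg.head_dim

def mla_kv_elems(cfg: AttnConfig, latent_dim: int, rope_dim: int) -> int:
    # One compressed latent (keys and values are up-projected from it at
    # attention time) plus one decoupled RoPE key shared by all heads.
    return cfg.n_layers * (latent_dim + rope_dim)

def to_kib(elems: int, bytes_per_elem: int = 2) -> float:
    # Per-token cache size, assuming a 16-bit (2-byte) cache dtype.
    return elems * bytes_per_elem / 1024

cfg = AttnConfig(n_layers=61, n_heads=128, head_dim=128)  # assumed sizes
for name, elems in [
    ("MHA", mha_kv_elems(cfg)),
    ("GQA (8 groups)", gqa_kv_elems(cfg, n_kv_groups=8)),
    ("MLA", mla_kv_elems(cfg, latent_dim=512, rope_dim=64)),
]:
    print(f"{name:15s} {elems:>9,d} elems/token  {to_kib(elems):8.1f} KiB/token")
```

Under these assumed sizes, MLA caches 576 elements per layer per token. That compares with 2,048 for 8-group GQA and 32,768 for full MHA. The calculation covers only the memory half of the claim. The modeling-quality comparison is an empirical result that arithmetic like this cannot reproduce.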
