Open Models at Google DeepMind — Cassidy Hardin, Google DeepMind


Summary based on the YouTube transcript and episode description.

Cassidy Hardin (Google DeepMind) details Gemma 4’s architecture: four model sizes, a mixture-of-experts (MoE) debut, and on-device multimodal audio support, all under Apache 2.0.

  • Gemma 4’s 26B MoE activates only 3.8B parameters per forward pass using 8 of 128 experts.
  • Gemma 4 31B dense ranked #3 on the LM Arena global leaderboard, outperforming models 20x its size.
  • Both 31B and 26B rank in the top 6 of all open-source models on LM Arena.
  • The E2B/E4B models (effective 2B/4B parameters) use per-layer embeddings (PLE) stored in flash memory rather than VRAM, enabling phone and laptop inference.
  • All Gemma 4 models ship under the Apache 2.0 license, replacing the prior, more restrictive license.
  • The 31B model supports a 256k context length with native function calling, thinking, and structured JSON output.
  • Audio support (a 35M-parameter Conformer encoder with a mel-spectrogram tokenizer) was added to E2B and E4B for on-device speech and translation.
  • Variable aspect-ratio and variable-resolution vision encoding replaces Gemma 3’s pan-and-scan multi-image workaround.
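The sparse-activation claim in the first bullet (8 of 128 experts active) corresponds to standard top-k expert routing. The sketch below is a generic illustration of that routing step under assumed shapes, not Gemma's actual implementation; only the 128-expert / top-8 numbers come from the episode summary.

```python
# Generic top-k MoE router sketch: pick 8 of 128 experts per token.
# The 128/8 figures are from the episode summary; everything else
# (hidden size, router weights) is an illustrative assumption.
import numpy as np

NUM_EXPERTS = 128
TOP_K = 8

def route(token_hidden, router_weights):
    """Return the indices and softmax gate weights of the top-k experts."""
    logits = token_hidden @ router_weights        # shape: (NUM_EXPERTS,)
    top_k = np.argsort(logits)[-TOP_K:]           # indices of the 8 chosen experts
    # Softmax over only the selected experts' logits.
    gates = np.exp(logits[top_k] - logits[top_k].max())
    gates /= gates.sum()
    return top_k, gates

rng = np.random.default_rng(0)
hidden = rng.standard_normal(64)                  # one token's hidden state
router = rng.standard_normal((64, NUM_EXPERTS))   # router projection
experts, gates = route(hidden, router)
print(len(experts), gates.sum())                  # 8 experts; gates sum to 1
```

Because only the chosen experts' feed-forward weights run per token, the active parameter count (3.8B of 26B here) is far below the total.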
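The per-layer-embeddings bullet describes keeping large embedding tables in flash instead of accelerator memory. A minimal way to picture that idea is a memory-mapped table where only the rows a token actually needs get paged into RAM; the file name, vocabulary size, and dimensions below are made-up illustration values, not Gemma's.

```python
# Sketch of the "embeddings in flash, not VRAM" idea via memory-mapping.
# All sizes and the file path are illustrative assumptions.
import numpy as np, os, tempfile

vocab, dim = 1000, 64
path = os.path.join(tempfile.mkdtemp(), "ple_layer0.npy")
np.save(path, np.random.default_rng(1)
        .standard_normal((vocab, dim)).astype(np.float32))

# mmap_mode="r" keeps the table on disk; pages load lazily on access.
table = np.load(path, mmap_mode="r")

token_ids = [3, 42, 7]
rows = table[token_ids]        # only these rows are copied into RAM
print(rows.shape)              # (3, 64)
```

The trade-off is latency per lookup in exchange for a much smaller resident memory footprint, which is what makes phone and laptop inference feasible for these models.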

2026-04-27 · Watch on YouTube