Google just casually disrupted the open-source AI narrative…

· video · Source ↗

Summary based on the YouTube transcript and episode description.

Fireship explains how Google’s Gemma 4 achieves frontier-class intelligence in a model small enough to run on a single RTX 4090.

  • Gemma 4 is released under Apache 2.0 — genuinely free, not research-only or restricted for large commercial users like Meta’s Llama license.
  • The 27B-parameter Gemma 4 runs locally at ~10 tok/s on one RTX 4090 with a ~20 GB download; the comparable Kimi K2.5 needs 600+ GB and multiple H100s.
  • Key architectural innovation: per-layer embeddings give each transformer layer its own token embedding, so less redundant information has to be carried through the entire stack.
  • Google simultaneously published TurboQuant, a quantization technique that maps weights to polar coordinates and applies Johnson–Lindenstrauss transforms, compressing them down to single sign bits while approximately preserving pairwise distances.
  • The real AI bottleneck is memory bandwidth, not compute — Gemma 4 attacks VRAM read cost, not raw parameter count.
  • Gemma 4 scores in the same benchmark range as Kimi K2.5 Thinking despite being roughly 20x smaller in download size.
  • Google is the first FAANG company to release a competitive LLM under a truly open-source license with no commercial restrictions.
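The video doesn’t spell out TurboQuant’s exact algorithm, but the idea it cites — a random Johnson–Lindenstrauss-style projection followed by keeping only the sign bit of each projected coordinate — can be sketched generically. Signs of random projections preserve angles between vectors as Hamming distances between bit strings (the SimHash property). All names below are illustrative, not Google’s API:

```python
import random

def jl_sign_quantize(vec, num_bits, seed=0):
    """Project vec onto num_bits random Gaussian directions; keep only the signs."""
    rng = random.Random(seed)
    bits = []
    for _ in range(num_bits):
        # One random direction per output bit; the dot product is the projection.
        proj = sum(rng.gauss(0, 1) * x for x in vec)
        bits.append(1 if proj >= 0 else 0)
    return bits

def hamming_fraction(a, b):
    """Fraction of differing bits; approximates angle(a, b) / pi."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Identical vectors -> identical codes; orthogonal vectors -> ~half the bits differ.
q_x = jl_sign_quantize([1.0, 0.0], 256)
q_y = jl_sign_quantize([0.0, 1.0], 256)
print(hamming_fraction(q_x, q_x))  # 0.0
print(hamming_fraction(q_x, q_y))  # close to 0.5
```

The payoff is that distance comparisons can run on 1-bit codes instead of 16-bit floats, which is exactly the kind of trade the video describes: shrink what must be read, keep what matters geometrically.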
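The memory-bandwidth point can be made concrete with back-of-envelope arithmetic: during autoregressive decoding, each generated token must stream (roughly) all of the weights from VRAM once, so bandwidth divided by model size caps tokens per second. The figures below (≈1008 GB/s for an RTX 4090, ~20 GB for the Gemma 4 download) are public specs and the video’s numbers, not measurements:

```python
def decode_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Theoretical ceiling on decode speed set by memory bandwidth alone."""
    return bandwidth_gb_s / weights_gb

# RTX 4090 at ~1008 GB/s reading a ~20 GB model on every token:
print(decode_tokens_per_sec(1008, 20))  # 50.4 tok/s ceiling
```

The observed ~10 tok/s sits well under this ceiling (kernel overheads, KV-cache reads, and imperfect bandwidth utilization eat the rest), which is why shrinking the bytes read per token — via quantization — pays off more directly than adding FLOPs.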

2026-04-08 · Watch on YouTube