Google just casually disrupted the open-source AI narrative…
Fireship explains how Google’s Gemma 4 achieves frontier-class intelligence in a model small enough to run on a single RTX 4090.
- Gemma 4 is released under Apache 2.0 — genuinely free, not research-only or revenue-restricted like Meta’s Llama license.
- The 27B-parameter Gemma 4 runs locally at ~10 tok/s on a single RTX 4090 from a 20 GB download; a comparable Kimi K2.5 needs 600+ GB of weights and multiple H100s.
- Key architectural innovation: per-layer embeddings give each transformer layer its own token lookup, so layer-specific token information no longer has to be carried through every preceding layer's residual stream.
- Google simultaneously published TurboQuant, a quantization technique that converts weights to polar coordinates and applies Johnson-Lindenstrauss transforms, compressing each value to a single sign bit while approximately preserving distances.
- The real AI bottleneck is memory bandwidth, not compute — Gemma 4 attacks the cost of streaming weights from VRAM, not raw parameter count.
- Gemma 4 scores in the same benchmark range as Kimi K2.5 Thinking despite being roughly 20x smaller in download size.
- Google is the first FAANG company to release a competitive LLM under a truly open-source license with no commercial restrictions.
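The per-layer-embedding idea above can be sketched in a toy forward pass. This is a hypothetical wiring for illustration only — all sizes, names (`per_layer_emb`, `per_layer_proj`), and the `tanh` stand-in for the real transformer block are assumptions, not Gemma's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

VOCAB, D_MODEL, N_LAYERS, D_PLE = 100, 64, 4, 16  # toy sizes, illustrative only

# Standard shared input embedding, as in an ordinary transformer.
shared_emb = rng.standard_normal((VOCAB, D_MODEL))

# Per-layer embeddings: each layer gets its own small token table, so a layer
# can look up token-specific information directly instead of relying on it
# being carried through every earlier layer's residual stream.
per_layer_emb = rng.standard_normal((N_LAYERS, VOCAB, D_PLE))
per_layer_proj = rng.standard_normal((N_LAYERS, D_PLE, D_MODEL)) * 0.01

def forward(token_id: int) -> np.ndarray:
    h = shared_emb[token_id]
    for layer in range(N_LAYERS):
        # Inject this layer's own view of the token (hypothetical placement).
        h = h + per_layer_emb[layer, token_id] @ per_layer_proj[layer]
        h = np.tanh(h)  # stand-in for the real attention/MLP block
    return h

out = forward(token_id=42)
print(out.shape)
```

A practical side effect: only one row per table is needed per token, so the per-layer tables can live in slower memory and be fetched on demand rather than held in VRAM.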
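The rotate-then-keep-the-sign trick behind sign-bit quantization can be demonstrated in a few lines. This is a generic sketch of the Johnson-Lindenstrauss-style idea (random rotation, then 1 bit per coordinate, with angles recovered via the hyperplane-LSH estimate), not Google's actual TurboQuant implementation, and it omits the polar-coordinate step the video mentions:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 1024

# Random orthogonal rotation (QR of a Gaussian matrix) — rotations preserve
# dot products exactly, and they spread information evenly across coordinates.
rotation, _ = np.linalg.qr(rng.standard_normal((DIM, DIM)))

def sign_quantize(w: np.ndarray) -> np.ndarray:
    """Rotate a vector, then keep only the sign of each coordinate: 1 bit/dim."""
    return np.sign(rotation @ w)

def approx_dot(bits_a, bits_b, norm_a, norm_b):
    # For random rotations, P(signs match) = 1 - angle/pi, so the fraction of
    # matching bits lets us estimate the angle, and hence the dot product.
    matches = np.mean(bits_a == bits_b)
    return norm_a * norm_b * np.cos(np.pi * (1.0 - matches))

a = rng.standard_normal(DIM)
b = a + 0.3 * rng.standard_normal(DIM)  # correlated with a

est = approx_dot(sign_quantize(a), sign_quantize(b),
                 np.linalg.norm(a), np.linalg.norm(b))
print(f"true dot {a @ b:.1f}, 1-bit estimate {est:.1f}")
```

Despite storing 32x less per weight, the estimate lands within a few percent of the true dot product, which is why distance-preserving 1-bit codes are viable for inference.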
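The memory-bandwidth point can be made concrete with back-of-envelope arithmetic: during decoding, every generated token must stream essentially all weights from VRAM once, so throughput is bounded by bandwidth divided by model size. The bandwidth figure below is the RTX 4090's spec-sheet number; the 20 GB size and ~10 tok/s are the video's figures:

```python
# Decode throughput ceiling = memory bandwidth / bytes read per token.
RTX_4090_BANDWIDTH_GBPS = 1008  # GB/s, spec-sheet value
GEMMA_WEIGHTS_GB = 20           # download size quoted in the video

ceiling_tok_s = RTX_4090_BANDWIDTH_GBPS / GEMMA_WEIGHTS_GB
print(f"theoretical ceiling: {ceiling_tok_s:.0f} tok/s")

# The observed ~10 tok/s sits well under this ~50 tok/s ceiling: KV-cache
# reads, kernel overhead, and non-ideal access patterns eat the rest.
# The key point stands: halving bytes-in-VRAM roughly doubles the ceiling,
# which is why quantization beats adding FLOPs.
```

This is why the section frames the bottleneck as VRAM read cost: a smaller or more aggressively quantized model raises the ceiling directly, while more compute does not.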
2026-04-08 · Watch on YouTube