Building Generative Image & Video Models at Scale - Sander Dieleman (Veo and Nano Banana)
Sander Dieleman (Google DeepMind, Veo/Nano Banana) explains every layer of large-scale diffusion model training, from latent compression to guidance and distillation.
- 30 seconds of 1080p video at 30fps is several gigabytes per training example; latent compression reduces tensor size by up to two orders of magnitude, making training feasible.
- Diffusion models are still smaller than frontier LLMs partly because classifier-free guidance lets them punch well above their parameter count in output quality.
- Dieleman frames diffusion as spectral autoregression: adding noise removes high frequencies first, so denoising naturally generates images coarse-to-fine, low-to-high frequency.
- Guidance amplifies the delta between conditional and unconditional predictions at each sampling step; turning it off would expose how weak current models' raw, unguided outputs actually are.
- Time spent improving data curation often outperforms tweaking the model or optimizer — still underrated and mostly unpublished because it is competitive secret sauce.
- Distillation in the diffusion context means fewer sampling steps (consistency models), not a smaller model — one-step consistency sampling rarely works well in practice.
- Model style and aesthetic opinions come mostly from post-training (RLHF/DPO), not guidance; guidance artifacts like oversaturation signal a guidance scale set too high.
- Google uses JAX with TPUs for model sharding; JAX was designed from the start to minimize chip-to-chip communication automatically.
2026-04-21 · Watch on YouTube