The Rise of Generative Media: fal's Bet on Video, Infrastructure, and Speed


Summary based on the YouTube transcript and episode description.

fal founders Gorkem Yurtseven, Burkay Gur, and Batuhan Taskaya explain why running 600 video models simultaneously is a harder problem than LLM inference — and why top models turn over every 30 days.

  • A 5-second 24fps video takes ~10,000x the compute of a 200-token LLM prompt; 4K adds another 10x on top.
  • The top 5 video models on fal have a half-life of roughly 30 days; the leaderboard largely turns over within a month.
  • fal’s top 100 customers use 14 different models simultaneously, often chained in multi-step workflows.
  • Video inference is compute-bound (saturating GPU flops), while LLM inference is memory-bandwidth-bound — requiring entirely different kernel optimization strategies.
  • fal runs across 35 heterogeneous data centers with a custom orchestrator and CDN; hyperscalers charge 2–3x more and lack video inference expertise.
  • Batuhan Taskaya became one of the youngest Python core maintainers at 14; fal’s tracing compiler finds common execution patterns and swaps in templated semi-generic kernels at runtime.
  • Jeffrey Katzenberg (ex-DreamWorks CEO) told fal’s generative media conference that AI video is following the same arc as early CGI — initial revolt, then inevitable adoption.
  • Individual creators are spending up to $500K on fal; customers include Canva, Adobe, and Adaptive Security (Brian Long), which generates personalized security training videos on the fly.
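The ~10,000x compute gap cited above is an order-of-magnitude claim; a back-of-envelope sketch makes the arithmetic concrete. All model sizes, token counts, and step counts below are illustrative assumptions, not fal's actual numbers:

```python
# Back-of-envelope FLOPs comparison: short LLM prompt vs. 5-second video.
# Every parameter here is an assumption chosen for illustration only.

def llm_flops(params: float, tokens: int) -> float:
    """~2 FLOPs per parameter per token for transformer inference."""
    return 2 * params * tokens

def video_diffusion_flops(params: float, latent_tokens: int, steps: int) -> float:
    """A diffusion transformer runs the full model over the whole latent
    sequence once per denoising step."""
    return 2 * params * latent_tokens * steps

llm = llm_flops(params=7e9, tokens=200)                # ~2.8e12 FLOPs
video = video_diffusion_flops(
    params=7e9,            # assumed video model size
    latent_tokens=100_000, # 5 s @ 24 fps after latent compression (assumed)
    steps=30,              # assumed denoising steps
)
print(f"ratio = {video / llm:,.0f}x")  # prints "ratio = 15,000x"
```

Under these assumptions the ratio lands around 10^4, consistent with the figure quoted in the episode; a 4K output multiplies the latent token count again, which is where the extra 10x comes from.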
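The compute-bound vs. memory-bandwidth-bound distinction can be made concrete with a roofline-style arithmetic-intensity estimate. The GPU figures below are published H100 SXM specs; the per-workload token counts are illustrative assumptions:

```python
# Roofline-style check: a kernel is compute-bound when its arithmetic
# intensity (FLOPs per byte moved from memory) exceeds the GPU's
# machine balance (peak FLOPs divided by memory bandwidth).

PEAK_FLOPS = 989e12    # H100 SXM dense BF16 tensor-core peak, FLOPs/s
MEM_BW = 3.35e12       # H100 SXM HBM3 bandwidth, bytes/s
BALANCE = PEAK_FLOPS / MEM_BW  # ~295 FLOPs/byte

def intensity(tokens_per_weight_load: int) -> float:
    """2 FLOPs per bf16 weight (2 bytes) for each token that reuses it."""
    return 2 * tokens_per_weight_load / 2

# Batch-1 LLM decode: each generated token streams all weights once,
# so every weight byte supports very little arithmetic.
llm_decode = intensity(tokens_per_weight_load=1)        # 1 FLOP/byte

# Video diffusion step: one weight load is amortized across an assumed
# ~100k latent tokens processed in parallel.
video_step = intensity(tokens_per_weight_load=100_000)  # 1e5 FLOPs/byte

print(llm_decode < BALANCE)  # True  -> memory-bandwidth-bound
print(video_step > BALANCE)  # True  -> compute-bound, saturates the flops
```

This is why the two workloads reward different kernel work: LLM decode kernels chase bytes (weight quantization, KV-cache layout), while video kernels chase flops (tensor-core utilization, fused attention).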

2025-12-10 · Watch on YouTube