The Rise of Generative Media: fal's Bet on Video, Infrastructure, and Speed
fal founders Gorkem Yurtseven, Burkay Gur, and Batuhan Taskaya explain why running 600 video models simultaneously is a harder problem than LLM inference — and why top models turn over every 30 days.
- A 5-second, 24 fps video takes ~10,000x the compute of a 200-token LLM prompt, and 4K adds another 10x on top (back-of-envelope sketch after this list).
- The top 5 video models on fal have a half-life of 30 days: the leaderboard fully turns over that fast.
- fal’s top 100 customers use 14 different models simultaneously, often chaining them into multi-step workflows (see the pipeline sketch after this list).
- Video inference is compute-bound (it saturates GPU FLOPs), while LLM inference is memory-bandwidth-bound, and the two call for entirely different kernel-optimization strategies (roofline sketch after this list).
- fal runs across 35 heterogeneous data centers with a custom orchestrator and CDN; hyperscalers charge 2–3x more and lack video inference expertise.
- Batuhan Taskaya became one of the youngest Python core maintainers at 14; fal’s tracing compiler finds common execution patterns and swaps in templated, semi-generic kernels at runtime (toy trace-and-swap sketch after this list).
- Jeffrey Katzenberg (ex-DreamWorks CEO) told fal’s generative media conference that AI video is following the same arc as early CGI — initial revolt, then inevitable adoption.
- Individual creators are spending up to $500K on fal; customers include Canva, Adobe, and Adaptive Security (Brian Long), which generates personalized security training videos on the fly.
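
To see where a figure on the order of 10,000x can come from, here is a back-of-envelope FLOP comparison. Every model size, token count, and step count below is an illustrative assumption, not a number from the episode.

```python
# Back-of-envelope FLOPs: 5 s of 24 fps video vs. a 200-token LLM call.
# All sizes here are illustrative assumptions, not fal's numbers.

LLM_PARAMS = 70e9         # assumed dense LLM parameter count
LLM_TOKENS = 200
llm_flops = 2 * LLM_PARAMS * LLM_TOKENS       # ~2 FLOPs per param per token

DIT_PARAMS = 10e9         # assumed video diffusion-transformer size
FRAMES = 5 * 24           # 5 seconds at 24 fps
TOKENS_PER_FRAME = 2_000  # assumed latent patches per frame
DENOISE_STEPS = 50        # assumed sampler steps
video_tokens = FRAMES * TOKENS_PER_FRAME
video_flops = 2 * DIT_PARAMS * video_tokens * DENOISE_STEPS
# Ignores attention's O(tokens^2) term, which only widens the gap.

print(f"LLM:   {llm_flops:.1e} FLOPs")
print(f"video: {video_flops:.1e} FLOPs")
print(f"ratio: {video_flops / llm_flops:,.0f}x")  # ~8,600x, i.e. order 1e4
```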
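
The multi-step workflows in the third bullet usually look like chained API calls, with one model's output feeding the next model's input. A minimal text-to-image-to-video sketch using fal's Python client (`pip install fal-client`); the model IDs and response fields below are assumptions to verify against fal's model pages:

```python
# Two-step chain on fal: generate a keyframe image, then animate it.
# Model IDs and response shapes are illustrative assumptions.
import fal_client

# Step 1: text -> image keyframe.
image_result = fal_client.subscribe(
    "fal-ai/flux/dev",  # assumed text-to-image endpoint
    arguments={"prompt": "a lighthouse at dusk, cinematic lighting"},
)
image_url = image_result["images"][0]["url"]  # assumed response shape

# Step 2: image -> video, conditioned on the keyframe.
video_result = fal_client.subscribe(
    "fal-ai/kling-video/v1/standard/image-to-video",  # assumed endpoint
    arguments={
        "prompt": "slow push-in, waves rolling",
        "image_url": image_url,
    },
)
print(video_result["video"]["url"])  # assumed response shape
```

Chains like this are one reason per-request latency matters on every hop: each stage blocks the next.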
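
The compute-bound vs. bandwidth-bound split can be made concrete with a roofline-style check: an op is bandwidth-bound when its arithmetic intensity (FLOPs per byte moved) sits below the hardware ridge point. The peak numbers below are approximate H100 SXM specs; the model sizes are assumptions, and KV-cache and activation traffic are ignored.

```python
# Roofline-style arithmetic-intensity check (illustrative numbers).
PEAK_FLOPS = 989e12  # ~H100 SXM dense BF16 peak, FLOP/s
PEAK_BW = 3.35e12    # ~H100 SXM HBM3 bandwidth, bytes/s
ridge = PEAK_FLOPS / PEAK_BW  # ~295 FLOPs/byte; below this, bandwidth-bound

# LLM decode at batch 1: every generated token re-reads all weights once.
llm_params = 70e9
llm_ai = (2 * llm_params) / (llm_params * 2)  # bf16 weights -> ~1 FLOP/byte

# One video-diffusion step: a single weight read is amortized over ~100k
# latent tokens, so intensity scales with the token count.
dit_params = 10e9
latent_tokens = 100_000
dit_ai = (2 * dit_params * latent_tokens) / (dit_params * 2)

for name, ai in [("LLM decode", llm_ai), ("DiT step", dit_ai)]:
    verdict = "compute-bound" if ai > ridge else "bandwidth-bound"
    print(f"{name}: {ai:,.0f} FLOPs/byte -> {verdict} (ridge ~{ridge:.0f})")
```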
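
Finally, a toy illustration of the trace-and-swap idea mentioned in the compiler bullet: record the op sequence a model actually executes and, when a trailing window matches a known template often enough, route it to a pre-built fused kernel. This is a hypothetical sketch, not fal's compiler; all pattern and kernel names are invented.

```python
# Toy runtime trace-and-swap dispatcher (hypothetical, not fal's compiler).
from collections import Counter

# Op-sequence templates -> names of pre-built fused kernels (all invented).
FUSED_TEMPLATES = {
    ("matmul", "bias_add", "gelu"): "fused_mlp_kernel",
    ("rmsnorm", "matmul", "softmax", "matmul"): "fused_attention_kernel",
}

class Tracer:
    def __init__(self):
        self.trace = []                # full op sequence seen so far
        self.pattern_hits = Counter()  # how often each template matched

    def record(self, op_name: str) -> None:
        """Log one executed op and check trailing windows against templates."""
        self.trace.append(op_name)
        for pattern in FUSED_TEMPLATES:
            if tuple(self.trace[-len(pattern):]) == pattern:
                self.pattern_hits[pattern] += 1

    def hot_swaps(self, threshold: int = 10) -> dict:
        """Templates seen often enough to justify swapping in a fused kernel."""
        return {p: FUSED_TEMPLATES[p]
                for p, n in self.pattern_hits.items() if n >= threshold}

# Usage: feed the tracer during warm-up runs, then rewrite the hot paths.
tracer = Tracer()
for _ in range(12):
    for op in ("matmul", "bias_add", "gelu"):
        tracer.record(op)
print(tracer.hot_swaps())  # {('matmul', 'bias_add', 'gelu'): 'fused_mlp_kernel'}
```

A real version would also trace tensor shapes and dtypes, since the "templated, semi-generic" part is precisely about specializing one kernel skeleton per shape class.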
2025-12-10 · Watch on YouTube