Chelsea Finn: Building Robots That Can Do Anything
Chelsea Finn explains how Physical Intelligence trained robots to fold laundry, tidy unseen homes, and follow open-ended prompts using a pre-train/fine-tune recipe borrowed from LLMs.
- Physical Intelligence’s breakthrough: pre-training on all available robot data, then fine-tuning on a small, curated, high-quality dataset (a recipe borrowed from LLM training) unlocked reliable laundry folding after 2-3 months of 0% success rates; see the two-stage training sketch after this list.
- The laundry-folding robot cut its time for five items from 20 minutes to 9 after switching to a 3B-parameter vision-language model (PaliGemma) pre-trained across all robot tasks, roughly 10x larger than the prior 100-300M-parameter models.
- Mobile manipulation data was only 2.4% of the pre-training mix, yet the model generalized to tidying unseen Airbnb kitchens and bedrooms with ~80% task success.
- Diverse environments in the training data closed the generalization gap almost entirely: once enough distinct locations were included, performance in novel homes matched performance in seen homes.
- Early models ignored language instructions 80% of the time; stopping gradients from the randomly initialized diffusion action head from flowing into the VLM backbone preserved its language-following, flipping the rate to 80% compliance (see the stop-gradient sketch below).
- Synthetic prompt relabeling, in which a VLM generates the hypothetical human prompts that could have elicited existing robot episodes, enabled open-ended instruction following (e.g., ‘make me a vegan sandwich, no pickles’) without costly human-robot interaction data collection; a relabeling sketch follows this list.
- Frontier models (GPT/Claude-class) used as high-level planners scored substantially lower than Physical Intelligence’s trained high-level policy on task progress, due to weak visual grounding in physical contexts.
- Finn argues real robot data is irreplaceable for generalization; RL on live robot attempts is the robotics analog to synthetic data in LLM post-training.
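A minimal sketch of the two-stage pre-train/fine-tune recipe, assuming a generic behavior-cloning setup in PyTorch; the datasets, model, and hyperparameters below are illustrative stand-ins, not Physical Intelligence’s actual code:

```python
# Sketch of the recipe: pre-train on everything, then fine-tune on a small
# curated subset. All names and numbers here are hypothetical.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def make_dataset(n, obs_dim=64, act_dim=8):
    # Stand-in for real robot episodes: (observation, action) pairs.
    return TensorDataset(torch.randn(n, obs_dim), torch.randn(n, act_dim))

policy = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 8))

def train(policy, dataset, epochs, lr):
    opt = torch.optim.AdamW(policy.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    for _ in range(epochs):
        for obs, act in loader:
            loss = nn.functional.mse_loss(policy(obs), act)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Stage 1: pre-train on all robot data, including mediocre demonstrations.
train(policy, make_dataset(100_000), epochs=1, lr=3e-4)
# Stage 2: fine-tune on a small, curated, high-quality subset at a lower LR.
train(policy, make_dataset(2_000), epochs=10, lr=1e-5)
```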
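A sketch of the stop-gradient trick from the language-following bullet, assuming a PyTorch-style vision-language-action policy; the module names and shapes are hypothetical, not the actual codebase:

```python
# Block gradients from the randomly initialized action head so they cannot
# corrupt the pre-trained VLM backbone's representations.
import torch
import torch.nn as nn

class VLAPolicy(nn.Module):
    def __init__(self, feat_dim=512, act_dim=8):
        super().__init__()
        self.vlm_backbone = nn.Linear(768, feat_dim)  # stand-in for PaliGemma
        self.action_head = nn.Sequential(             # randomly initialized
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))

    def forward(self, vlm_inputs):
        feats = self.vlm_backbone(vlm_inputs)
        # detach() stops action-loss gradients from flowing into the VLM,
        # preserving its language-following behavior during training.
        return self.action_head(feats.detach())

actions = VLAPolicy()(torch.randn(4, 768))  # example forward pass
```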
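A sketch of synthetic prompt relabeling as described in the bullet above; `ask_vlm` is a hypothetical stub for any captioning-capable VLM, since the actual pipeline is not shown here:

```python
# A VLM inspects an existing robot episode and writes the open-ended prompt
# a person might have given to elicit that behavior. Hypothetical names.
from dataclasses import dataclass, replace

@dataclass
class Episode:
    frames: list       # observations recorded during the episode
    instruction: str   # original templated instruction, e.g. "fold the shirt"

def ask_vlm(frames, question: str) -> str:
    raise NotImplementedError("plug in a captioning-capable VLM here")

def relabel(episode: Episode) -> Episode:
    prompt = ask_vlm(
        episode.frames,
        "Write a natural, open-ended request a person might give a home "
        "robot that this behavior would satisfy, including preferences or "
        "constraints (e.g. 'make me a vegan sandwich, no pickles').",
    )
    # Keep the trajectory unchanged; swap in the synthetic human-style prompt.
    return replace(episode, instruction=prompt)
```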
2025-07-22 · Watch on YouTube