Why Voice Will Be the Fundamental Interface for Tech ft ElevenLabs’ Mati Staniszewski
Watch on YouTube ↗ Summary based on the YouTube transcript and episode description.
ElevenLabs CEO Mati Staniszewski argues that voice will be the default human-computer interface, and that his company may pass the voice Turing test as early as the end of 2025.
- ElevenLabs aims to pass the voice Turing test (voice agents indistinguishable from humans) by the end of 2025 or early 2026.
- ElevenLabs powered Darth Vader in Fortnite via Epic Games, reaching millions of concurrent players in real-time voice conversations.
- Audio models are smaller and cheaper to train than LLMs, letting ElevenLabs out-compete foundation model labs without a compute advantage.
- The key gap in audio AI is high-quality labeled data that captures not just transcripts but also emotion, tone, and non-verbal cues; producing it requires human voice coaches to train labelers.
- The company traces every generated audio clip back to the originating account for provenance, and is working with UC Berkeley on open-source detection models.
- ElevenLabs' co-founders met in Warsaw at 15 and spent 15 years as close friends before founding the company; Mati credits this as a core competitive advantage.
- Cross-lingual real-time voice translation is the use case Mati considers most underhyped; no current device form factor (phone, glasses, headphones) fully enables it yet.
- The enterprise bottleneck for voice agents is not knowledge bases but system integrations (Twilio, SIP trunking, CRM connectors); ElevenLabs embeds engineers on-site to solve these.
2025-07-01