Why Voice Will Be the Fundamental Interface for Tech ft ElevenLabs’ Mati Staniszewski


Summary based on the YouTube transcript and episode description.

ElevenLabs CEO Mati Staniszewski argues that voice will become the default human-computer interface, and that his company may pass the voice Turing test by late 2025 or early 2026.

  • ElevenLabs aims to pass the voice Turing test, building voice agents indistinguishable from humans, by the end of 2025 or early 2026.
  • ElevenLabs powered Darth Vader's voice in Fortnite through a partnership with Epic Games, holding real-time voice conversations with millions of concurrent players.
  • Audio models are smaller and cheaper to train than LLMs, which lets ElevenLabs outcompete foundation model labs without having a compute advantage.
  • Key gap in audio AI: high-quality labeled data that captures not just transcripts but also emotion, tone, and non-verbal cues. Producing it requires human voice coaches to train the labelers.
  • For provenance, the company traces every generated audio clip back to the account that created it, and it is working with UC Berkeley on open-source detection models.
  • ElevenLabs' co-founders met in Warsaw at age 15 and were close friends for 15 years before founding the company; Mati credits this as a core competitive advantage.
  • Mati considers real-time cross-lingual voice translation the most underhyped use case; no current device form factor (phone, glasses, headphones) fully enables it yet.
  • The enterprise bottleneck for voice agents is not knowledge bases but system integrations (Twilio, SIP trunking, CRM connectors); ElevenLabs embeds engineers on-site to solve these.

2025-07-01