From DevOps ‘Heart Attacks’ to AI-Powered Diagnostics With Traversal’s AI Agents
Watch on YouTube ↗ Summary based on the YouTube transcript and episode description.
Traversal co-founders Anish Agarwal and Raj Agrawal explain how their AI agents cut incident root cause analysis from hours to 2-4 minutes with 90%+ accuracy at enterprise scale.
- Traversal achieves 90%+ root cause accuracy in 2-4 minutes for incidents where the answer exists in observability data, versus hours with 30-50 engineers in Slack.
- Accuracy was stuck at 0% when Traversal first deployed at large enterprises; the fix was moving complexity from hardcoded prompts into inference-time compute.
- The causal inference techniques Agarwal used for gene regulatory networks (CRISPR interventions, Broad Institute) map almost identically onto microservice dependency graphs.
- Observability is typically the second-largest software spend after cloud, yet root cause analysis remained fully manual because storage and visualization were all that prior technology allowed.
- Enterprises using AI coding tools (Cursor, Windsurf) face a growing debug crisis: AI-written code passes local unit tests but breaks at system interaction points humans can no longer reason about.
- Traversal adds more value at large enterprises than at Series A startups because mature observability instrumentation exists but no single engineer holds enough context across thousands of microservices.
- Traversal bet in September 2024 that reasoning models would improve and architected its system to let them shine; that bet has paid off significantly.
- Future logs will be written for LLM consumption, not human scrolling, requiring more information density in message fields than current logging conventions allow.
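
To make the last point concrete, here is a minimal sketch of what a more information-dense log line might look like compared with a terse human-oriented one. This is an illustration only, not Traversal's format: the `dense_log_line` helper, the field names, and the sample values (including the deploy ID) are all hypothetical.

```python
import json
from datetime import datetime, timezone


def dense_log_line(service, event, message, **context):
    """Return one JSON log line that carries its own context.

    Every field name here is an illustrative assumption, not a schema
    described in the episode.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "event": event,
        # A full-sentence message instead of a two-word fragment.
        "message": message,
        # Details a human would normally look up elsewhere are inlined,
        # so a model reading this single line has them at hand.
        "context": context,
    }
    return json.dumps(record, sort_keys=True)


# A terse, human-oriented message for comparison.
terse = "payment failed"

# The same event with upstream, status, and deploy details attached.
dense = dense_log_line(
    "checkout",
    "payment_failed",
    "Charge request to payments-api returned 503 after 3 retries.",
    upstream="payments-api",
    status=503,
    retry_count=3,
    deploy_id="hypothetical-deploy-123",  # made-up value for illustration
)
print(dense)
```

The design choice in this sketch is to keep each line self-describing, so a reader (human or model) does not need to join it against other sources to understand what happened.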
2025-06-24