From DevOps ‘Heart Attacks’ to AI-Powered Diagnostics With Traversal’s AI Agents

Summary based on the YouTube transcript and episode description.

Traversal co-founders Anish Agarwal and Raj Agrawal explain how their AI agents cut incident root cause analysis from hours to 2-4 minutes with 90%+ accuracy at enterprise scale.

  • Traversal achieves 90%+ root cause accuracy in 2-4 minutes for incidents where the answer exists in observability data, compared with hours of manual triage by 30-50 engineers in Slack.
  • Accuracy was stuck at 0% when first deployed at large enterprises; the fix was moving complexity from hardcoded prompts into inference-time compute.
  • The causal inference techniques Agarwal used for gene regulatory networks (CRISPR interventions, Broad Institute) map almost identically onto microservice dependency graphs.
  • Observability is typically the second-largest software spend after cloud, yet root cause analysis remained fully manual because prior technology only supported storing and visualizing telemetry.
  • Enterprises using AI coding tools (Cursor, Windsurf) face a growing debugging crisis: AI-written code passes local unit tests but breaks at system interaction points that humans can no longer reason about.
  • Traversal adds more value at large enterprises than at Series A startups because mature observability instrumentation exists but no single engineer holds enough context across thousands of microservices.
  • In September 2024, Traversal bet that reasoning models would keep improving and architected its system to benefit from those gains; that bet has paid off significantly.
  • Future logs will be written for LLM consumption, not human scrolling, requiring more information density in message fields than current logging conventions allow.
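The point about microservice dependency graphs can be made concrete with a minimal sketch. It is not Traversal's method and is far simpler than the interventional causal inference described in the episode. It assumes a hypothetical call graph (`CALLS`) and a set of services already flagged as anomalous, walks from the alerting service toward its dependencies, and reports anomalous services whose own dependencies look healthy, on the assumption that failures propagate from callee to caller.

```python
from collections import deque

# Hypothetical call graph: each service maps to the services it calls.
CALLS: dict[str, list[str]] = {
    "frontend": ["checkout", "catalog"],
    "checkout": ["payments", "inventory"],
    "catalog": ["inventory"],
    "payments": ["payments-db"],
    "inventory": [],
    "payments-db": [],
}


def candidate_root_causes(calls, anomalous, start):
    """Walk dependencies from `start`, following only anomalous services, and
    return the anomalous services none of whose dependencies are anomalous.

    Assumption (hypothetical heuristic): a failure shows up in a service and in
    everything that calls it, so the deepest anomalous service is the suspect.
    """
    seen = set()
    roots = []
    queue = deque([start])
    while queue:
        svc = queue.popleft()
        # Skip services already visited or not flagged as anomalous.
        if svc in seen or svc not in anomalous:
            continue
        seen.add(svc)
        bad_deps = [d for d in calls.get(svc, []) if d in anomalous]
        if not bad_deps:
            # Nothing further down looks broken, so the anomaly likely starts here.
            roots.append(svc)
        queue.extend(bad_deps)
    return sorted(roots)
```

For example, with `frontend`, `checkout`, `payments`, and `payments-db` flagged, `candidate_root_causes(CALLS, anomalous, "frontend")` returns `["payments-db"]`. A production system would also have to cope with noisy anomaly signals and with evidence spread across logs, metrics, and traces, which this sketch ignores.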
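The point about logs written for LLM consumption can be illustrated with a small sketch. The field names and the convention of restating the structured context inside `message` are hypothetical, not an established standard or anything Traversal prescribes; the idea is only that each line carries enough context for a model to interpret it on its own, without a human scrolling through neighboring lines.

```python
import json
from datetime import datetime, timezone


def dense_log(service: str, event: str, **context) -> str:
    """Build one JSON log line whose message restates its key context in text.

    Hypothetical convention: structured fields are kept for querying, and the
    same facts are folded into `message` so the line is self-describing.
    """
    details = ", ".join(f"{k}={v}" for k, v in sorted(context.items()))
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "event": event,
        "message": f"{service}: {event} ({details})",
        **context,  # keep each context value as its own queryable field
    }
    return json.dumps(record)
```

Calling `dense_log("checkout", "payment call failed", dependency="payments", status=503, latency_ms=2140, trace_id="abc123")` yields a line whose `message` reads `checkout: payment call failed (dependency=payments, latency_ms=2140, status=503, trace_id=abc123)`, instead of a terse `payment failed`.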

2025-06-24