CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production

· top-stories ai security tools · Source ↗

Article

TL;DR: Brex open-sourced an LLM-as-judge HTTP proxy to approve or block agent API calls in production.

Key Takeaways

  • Natural language policies auto-generated from traffic; matched human judgment on held-out requests
  • Core flaw: judge only sees HTTP body — credential was already read before outbound request
  • Shared-model vulnerability: if judge and agent are both Claude, injection patterns overlap

Discussion

  • Thread consensus: LLM-as-judge is wrong security primitive — non-deterministic and itself injectable
  • Proper role is audit layer on top of real enforcement, not the enforcement layer itself
  • Defense in depth: use different providers for agent and judge, add kernel-level tool controls

Top comments:

  • [simonw]: JSON escaping claim to prevent prompt injection via policy content is false confidence
  • [roywiggins]: It’s fine until the agent starts prompt-injecting the judge itself
  • [ArielTM]: Same model family for agent and judge means shared injection vulnerability surface
  • [cadamsdotcom]: 99% accuracy is a failing grade for a security control

Discuss on HN