CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production
Article
TL;DR: Brex open-sourced an LLM-as-judge HTTP proxy to approve or block agent API calls in production.
Key Takeaways
- Natural language policies auto-generated from traffic; matched human judgment on held-out requests
- Core flaw: judge only sees HTTP body — credential was already read before outbound request
- Shared-model vulnerability: if judge and agent are both Claude, injection patterns overlap
Discussion
- Thread consensus: LLM-as-judge is wrong security primitive — non-deterministic and itself injectable
- Proper role is audit layer on top of real enforcement, not the enforcement layer itself
- Defense in depth: use different providers for agent and judge, add kernel-level tool controls
Top comments:
- [simonw]: JSON escaping claim to prevent prompt injection via policy content is false confidence
- [roywiggins]: It’s fine until the agent starts prompt-injecting the judge itself
- [ArielTM]: Same model family for agent and judge means shared injection vulnerability surface
- [cadamsdotcom]: 99% accuracy is a failing grade for a security control