CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production
Article
TL;DR
LLM judge sits between agent and outbound HTTP to block malicious requests before they fire.
Key Takeaways
- Judge only sees the HTTP request body — agent has already read secrets before the proxy fires.
- Shared model family between agent and judge means shared prompt injection vulnerabilities.
- Open-sourced by Brex; community consensus is LLM judges are audit layers, not enforcement.
Discussion
Top comments:
- [simonw]: JSON-escaping policy to prevent prompt injection is false security confidence
-
[ArielTM]: Judge and agent from same model family share prompt injection attack surface
If both are Claude, you have shared-vulnerability risk. Prompt-injection patterns that work against one often work against the other. Basic defense in depth says they should at least be different providers.
- [roywiggins]: Agent can prompt-inject the judge through shaped HTTP request bodies
-
[cadamsdotcom]: 99% secure is a failing grade for a security primitive
pointing it at a few days of real traffic produced policies that matched human judgment on the vast majority of held-out requests. The problem is, 99% secure is a failing grade.