CrabTrap: An LLM-as-a-judge HTTP proxy to secure agents in production

· ai security tools · Source ↗

Article

TL;DR

LLM judge sits between agent and outbound HTTP to block malicious requests before they fire.

Key Takeaways

  • Judge only sees the HTTP request body — agent has already read secrets before the proxy fires.
  • Shared model family between agent and judge means shared prompt injection vulnerabilities.
  • Open-sourced by Brex; community consensus is LLM judges are audit layers, not enforcement.

Discussion

Top comments:

  • [simonw]: JSON-escaping policy to prevent prompt injection is false security confidence
  • [ArielTM]: Judge and agent from same model family share prompt injection attack surface

    If both are Claude, you have shared-vulnerability risk. Prompt-injection patterns that work against one often work against the other. Basic defense in depth says they should at least be different providers.

  • [roywiggins]: Agent can prompt-inject the judge through shaped HTTP request bodies
  • [cadamsdotcom]: 99% secure is a failing grade for a security primitive

    pointing it at a few days of real traffic produced policies that matched human judgment on the vast majority of held-out requests. The problem is, 99% secure is a failing grade.

Discuss on HN