GPT-5.5: Mythos-Like Hacking, Open to All

· ai ai-agents coding · Source ↗

TLDR

  • XBOW tested GPT-5.5 under early access across its offensive security benchmarks and reports that the model reaches Mythos-level hacking performance. The model is now publicly available.

Key Takeaways

  • XBOW is a security-focused AI company that had early access to GPT-5.5 and ran it through its full offensive security benchmark suite and operational workflows.
  • “Mythos-like” performance is the headline claim: GPT-5.5 reportedly matches the top-tier bar that Mythos sets for hacking agents on offensive tasks.
  • Benchmark coverage includes web vulnerability discovery in open source software (OSS), a concrete and reproducible attack surface for comparison.
  • The post includes multi-model comparison plots, with models like Claude Opus 4.7 appearing as reference points alongside GPT-5.5.
  • Public availability is the key deployment shift: this is not a research preview but a production model accessible to all users.

Hacker News Comment Review

  • The single substantive comment raises a sharp methodological critique: the benchmark visualizations use line charts for categorical model comparisons, a chart type that implies continuity between discrete categories and misleads readers.
  • A specific flaw flagged: the “Web Vulns in OSS” plot has no white-box data for Opus 4.7, but the connecting line visually implies a value near 60, potentially inflating perceived GPT-5.5 gains by distorting the baseline.
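
The charting critique above can be illustrated with a small sketch. It is not XBOW's plotting code, and the model names and scores are hypothetical placeholders, not figures from the post. The helper `implied_by_line` shows the value a straight connecting line suggests for a category with no data, and `render_bars` shows a categorical alternative that marks the gap explicitly instead of interpolating.

```python
from typing import Optional


def implied_by_line(prev_value: float, next_value: float) -> float:
    """Value a straight connecting line visually suggests for a missing
    point sitting midway between two neighbouring categories."""
    return (prev_value + next_value) / 2


def render_bars(
    scores: dict[str, Optional[float]],
    width: int = 40,
    max_value: float = 100.0,
) -> list[str]:
    """Render one text bar per category.

    Missing values (None) are labelled 'n/a' and never interpolated.
    """
    label_width = max(len(name) for name in scores)
    lines = []
    for name, value in scores.items():
        label = name.ljust(label_width)
        if value is None:
            lines.append(f"{label} | n/a (no data)")
            continue
        filled = round(width * value / max_value)
        lines.append(f"{label} | {'#' * filled} {value:g}")
    return lines


# Hypothetical placeholder scores, NOT the numbers from the XBOW post.
example = {"model-a": 40.0, "model-b": None, "model-c": 80.0}

# A line chart drawn through model-a and model-c would suggest this
# value for model-b even though no measurement exists.
print("line chart implies:", implied_by_line(40.0, 80.0))

for line in render_bars(example):
    print(line)
```

With bars, the missing category stays visibly empty, so a reader cannot mistake an artifact of the chart type for a measured baseline.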

Notable Comments

  • @nsingh2: “the absurd connecting line implies it should be near 60” – calls out missing Opus 4.7 white-box data being visually fabricated by line chart interpolation.
