State of the art on Terminal-Bench 2.0 at 82.7%; SWE-Bench Pro 58.6%; Expert-SWE 73.1%, an internal benchmark of long-horizon tasks with a median estimated human completion time of 20 hours.
Available today in ChatGPT for Plus, Pro, Business, and Enterprise users, and in Codex; API access coming very soon, with additional safety requirements for scale.
GPT-5.5 Pro targets demanding knowledge work: GDPval 84.9%, OSWorld-Verified 78.7%, Tau2-bench Telecom 98.0% without prompt tuning.
Scientific capability gains: leading scores on GeneBench and BixBench; a custom GPT-5.5 harness helped produce a new proof about Ramsey numbers.
Hacker News Comment Review
Skeptics noted the release landed shortly after Claude Opus 4.7, with benchmark selection that happens to favor GPT-5.5; “our smartest model yet” framing drew predictable eye-rolls.
The detail that Codex analyzed its own production traffic and wrote custom heuristics to boost GPU token throughput by 20% attracted more genuine technical interest than the benchmark table.
Practitioners are cautiously optimistic on token efficiency: Opus 4.7’s gains came from using more tokens, not fewer, so GPT-5.5’s opposite trajectory stands out if it holds outside evals.
Notable Comments
@tedsanders: rollout is gradual over many hours, with Pro and Enterprise accounts first, then Plus; some accounts may not have access on launch day.
@astlouis44: shared a playable 3D dungeon arena Codex built from a single prompt using TypeScript and Three.js, with GPT-generated environment textures – one of the more concrete capability demos.