Introducing gpt-oss-safeguard
https://openai.com/index/introducing-gpt-oss-safeguard/
OpenAI releases open-weight safety classifier: 120B and 20B params.
- Apache 2.0, downloadable from Hugging Face.
- Fine-tuned versions of gpt-oss open models.
Chain-of-thought reasoning classifies content against developer-specified policies.
- Policy supplied at inference time, not baked into training.
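Because the policy arrives at inference time, switching or revising a policy is a prompt change, not a retraining run. A minimal sketch of that pattern; the policy wording, labels, and model name here are illustrative assumptions, not OpenAI's official prompt format:

```python
# Sketch: a developer-written policy supplied at inference time.
# POLICY text, the 0/1 labels, and the model name are assumptions.

POLICY = """\
Policy: marketplace fraud
VIOLATES (1): offers of stolen goods, paid fake reviews, phishing links.
SAFE (0): price complaints, genuine negative reviews, refund requests.
Reply with a JSON object: {"label": 0 or 1, "rationale": "..."}"""


def build_messages(policy: str, content: str) -> list[dict]:
    """Pair the policy (system turn) with the content to classify
    (user turn); swapping in a new policy needs no retraining."""
    return [
        {"role": "system", "content": policy},
        {"role": "user", "content": content},
    ]


messages = build_messages(POLICY, "Selling 50 five-star reviews, DM me.")

# With any OpenAI-compatible server hosting the open weights, e.g.
#   client.chat.completions.create(model="gpt-oss-safeguard-20b",
#                                  messages=messages)
# the reply would carry the label plus the model's reasoning.
```

The same `build_messages` call with a different `POLICY` string reclassifies the same content under new rules, which is the property the announcement highlights.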
Beats GPT-5-thinking on internal multi-policy evals.
- Underperforms dedicated classifiers trained on 10K+ labeled samples.
Best fit: emerging harms, nuanced domains, low labeled-data regimes.
- Also preferred when explainability > latency.
- High compute cost limits scale; not a drop-in for bulk content moderation.
ROOST community partnership for open safety model ecosystem.
- Early testers: SafetyKit, Tomoro, Discord specialists.
Type: Link
Added: Apr 21, 2026