Mozilla used Anthropic’s Mythos model plus a custom agent harness to find 271 Firefox security flaws in two months with almost no false positives.
Key Takeaways
The agent harness, not the model alone, drove results: it wraps the LLM, gives it instructions and tools, and runs it in a loop until completion.
Mozilla gave Mythos access to the same tools and Firefox test builds that human developers use, enabling realistic code analysis.
Earlier AI vulnerability detection attempts produced high rates of hallucinated bug reports requiring significant human triage to discard.
Mozilla Distinguished Engineer Brian Grinstead credited two factors: model improvements and Mozilla’s investment in project-specific harness customization.
Mozilla’s CTO has publicly claimed AI means “zero-days are numbered” and “defenders finally have a chance to win, decisively.”
Hacker News Comment Review
Commenters flagged a key definitional gap: the 271 items are bugs or potential vulnerabilities, not confirmed exploits with proof-of-concept; Mozilla’s own severity breakdown splits them into sec-critical, sec-high, and lower tiers.
One commenter with apparent insider context noted Mythos showed cross-domain reasoning, e.g., identifying that a floating-point value sent over IPC could be attacker-modified in non-obvious ways, suggesting genuine depth beyond pattern matching.
Skepticism exists about what distinguishes Mythos from standard frontier models like Opus for security work; Mozilla has not published a clear benchmark comparison.
Notable Comments
@jerrythegerbil: “Mythos didn’t write 271 PoC for vulnera” – flags that bug counts and verified vulnerabilities are not the same metric.
@canucker2016: Coverity already tracks 5000+ outstanding Firefox defects; overlap with Mythos findings is unknown.