AI scores 85% on Cybench

On Track · Security · 60% confidence
Predicted: Early 2026 · Updated: 2026-04-02 · Source: AI 2027, page 7, footnote 15 (Early 2026: Coding Automation)
85% on Cybench, matching a top professional human team on hacking tasks that take those teams about 4 hours

What AI 2027 Predicted

As part of the Early 2026 “Coding Automation” scenario, the authors predicted that AI agents would reach 85% on the Cybench benchmark — a suite of 40 professional-level Capture the Flag (CTF) cybersecurity tasks drawn from real CTF competitions. The 85% threshold corresponds to matching a top professional human team on tasks that typically take those teams about 4 hours. This prediction is closely linked to the broader claim that the same training environments that produce strong coding agents also produce competent hackers.

How We Track This

  • Cybench official results at cybench.github.io
  • AI performance in competitive CTF events (DEF CON, Hack The Box, etc.)
  • Specialized cybersecurity AI systems (e.g., CAI, BountyBench)
  • Academic evaluations of frontier models on cybersecurity tasks
  • ICLR 2025 Cybench paper and follow-up evaluations

Current Evidence

Progress is significant but the 85% threshold has not been publicly reached on Cybench specifically:

  • Claude Sonnet 4.5 achieves the highest Cybench score among general-purpose models at 46% (Jeopardy-style CTF), with 75% on base-level tasks
  • Specialized cybersecurity AI systems show much stronger performance in real competitions:
    • Cybersecurity AI (CAI) achieved #1 at Neurogrid CTF (41/45 flags, $50,000 prize) in 2025
    • CAI ranked #6 at Dragos OT CTF (among 1,200+ teams) and #22 at Cyber Apocalypse (8,129 teams)
    • CAI operates 3,600x faster than humans and at 156x lower cost
  • The gap between general-purpose models (~46% on Cybench) and specialized systems dominating real CTF competitions suggests that purpose-built agents are approaching or exceeding human-team performance, even though the Cybench-specific number has not reached 85%

Key nuance: Real-world CTF competition performance appears to be advancing faster than formal benchmark scores, partly because purpose-built cybersecurity agents use tool chains and strategies not captured by standard model evaluations.

Counterevidence & Limitations

  • The 85% Cybench target is specifically about matching human teams on 4-hour tasks; most published AI scores are on shorter tasks where models perform better
  • General-purpose model scores (46%) are far from 85%; only specialized systems show competitive performance
  • Cybench task difficulty varies significantly; the harder tasks (involving novel vulnerability research, multi-step exploitation chains) remain largely unsolved
  • CTF competition rankings can be misleading — AI systems may excel at pattern-matching common CTF categories while struggling with novel challenges
  • BountyBench (a newer benchmark evaluating real-world bug bounty performance) may be a more meaningful test of practical cybersecurity capability

What Would Change Our Assessment

  • Upgrade to Confirmed: A frontier model or well-known agentic system scores 85%+ on Cybench, or dominates multiple major CTF competitions at human-team-equivalent or better performance
  • Upgrade to Ahead: 85% is reached before mid-2026
  • Downgrade to Behind: By end of 2026, general-purpose models remain below 60% and specialized systems show plateauing competition results
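The criteria above can be read as a simple decision rule. As an illustrative sketch only (the function name, parameters, and date cutoffs are hypothetical encodings of the stated thresholds, not part of the tracker's methodology):

```python
from datetime import date

def assess(cybench_score: float, today: date,
           competition_dominance: bool, plateauing: bool) -> str:
    """Map observed evidence to a tracker status per the stated criteria.

    cybench_score: best published Cybench score (0.0-1.0)
    competition_dominance: dominates multiple major CTFs at human-team level
    plateauing: specialized systems show plateauing competition results
    """
    if cybench_score >= 0.85 or competition_dominance:
        # 85%+ before mid-2026 counts as Ahead; otherwise Confirmed
        if cybench_score >= 0.85 and today < date(2026, 7, 1):
            return "Ahead"
        return "Confirmed"
    # Behind: end of 2026 passed, general models below 60%, results plateauing
    if today >= date(2026, 12, 31) and cybench_score < 0.60 and plateauing:
        return "Behind"
    return "On Track"

# Current evidence as of this update: 46% on Cybench, no outright dominance claim
print(assess(0.46, date(2026, 4, 2),
             competition_dominance=False, plateauing=False))
# -> On Track
```

Whether CAI's competition wins already count as "dominating multiple major CTF competitions" is a judgment call; this sketch treats that as a boolean input rather than deciding it.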

Update History

  • 2025-05: Cybench paper accepted at ICLR 2025. Baseline frontier model scores: ~10-30% on full CTF challenges depending on difficulty tier. The 85% target appears very distant.
  • 2025-06: Claude Sonnet 4.5 evaluation shows 75% on base (unguided) subtasks and 46% on Jeopardy-style full challenges: the highest published general-purpose model score, but still far from 85%.
  • 2025-08: DEF CON 33 (Las Vegas). AI hacking demonstrations grow in prominence; the AI Village showcases AI-assisted CTF tools. Specialized cybersecurity AI systems begin competing in real CTF events.
  • 2025-10: CAI competes at the Hack The Box Cyber Apocalypse CTF, ranking #22 of 8,129 teams: the first clear demonstration of AI competing meaningfully against thousands of human teams at scale.
  • 2025-12: CAI ranks #6 at the Dragos OT CTF (1,200+ teams), notable for its OT/ICS specialization, showing AI generalizing across cybersecurity subfields. The CAIBench meta-benchmark is published (arXiv), consolidating evaluation frameworks.
  • 2026-01: CAI wins Neurogrid CTF outright with 41/45 flags and the $50,000 prize, a decisive AI victory in a competitive CTF. The benchmark-vs-competition gap widens: specialized systems dominate competitions while general-purpose models remain at ~46% on Cybench.
  • 2026-03: Assessment: the 85% Cybench target has not formally been met, but real-world competition results (top-10 finishes, an outright win) suggest practical capability approaching or exceeding human teams. The gap between benchmark scores and competition performance reflects purpose-built agents using tool chains and strategies not captured by standard evaluations. Status: On Track. Confidence 0.60.
2026-03Assessment: 85% Cybench target not formally met. However, real-world competition results (top-10 finishes, outright wins) suggest practical capability approaching or exceeding human teams. The gap between benchmark scores and competition performance reflects that purpose-built agents use tool chains and strategies not captured by standard evaluations. Status: On Track. Confidence 0.60.