AI reaches near-best-human hacking capability
Agent-2 is 'only' a little worse than the best human hackers, but thousands of copies can be run in parallel, searching for and exploiting weaknesses faster than defenders can respond. (page 10)
What AI 2027 Predicted
The scenario depicts AI systems reaching near-parity with the best human hackers by early 2027. While individual AI agents would be “only a little worse” than top-tier human security professionals, their ability to run thousands of copies in parallel would make them a formidable offensive cyber capability. This is presented in the context of both US labs using AI for security and China deploying AI-enhanced cyberattacks to steal model weights.
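The force-multiplier logic here — that many parallel copies of a slightly-weaker agent can still outpace defenders — can be made concrete with a toy probability model. This is an illustrative sketch only (the function name and numbers are hypothetical, not from AI 2027 or any evaluation): if a single agent finds a given weakness with probability p per attempt, and attempts were independent, N copies would succeed with probability 1 − (1 − p)^N.

```python
# Toy model of parallel attack scaling. Real attempts against a real target
# are correlated (shared blind spots, shared training data), so this is an
# upper-bound intuition, not a capability estimate.

def p_any_success(p: float, n: int) -> float:
    """Probability that at least one of n independent copies succeeds,
    given per-copy success probability p."""
    return 1.0 - (1.0 - p) ** n

# Even a 1% per-copy chance becomes near-certainty at scale:
for n in (1, 100, 1000):
    print(f"{n:>5} copies -> {p_any_success(0.01, n):.3f}")
```

The correlation caveat matters: identical model copies tend to fail in identical ways, which is one reason the "Counterevidence" section below treats parallelism as an open question rather than a guaranteed multiplier.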
How We Track This
We monitor:
- AI performance in Capture the Flag (CTF) cybersecurity competitions
- Published evaluations of AI offensive security capabilities (DARPA AIxCC, academic benchmarks)
- Frontier lab system cards and cyber capability assessments
- Bug bounty and vulnerability discovery results from AI systems
- Red team reports on AI-assisted penetration testing
Current Evidence
AI cybersecurity capabilities are advancing rapidly, with several notable developments:
CTF Competition Performance: A December 2025 paper (“Cybersecurity AI: The World’s Top AI Agent for Security Capture-the-Flag”) documented AI capability across the 2025 CTF circuit. Hack The Box reported AI agents performing competitively against human participants in live CTF competitions. Palisade Research demonstrated an AI solving 95% of a high-school-level CTF benchmark, though performance on expert-level challenges remains substantially lower.
Academic Evaluations: Google DeepMind published a framework for evaluating emerging cyberattack capabilities of AI (March 2025), and NYU developed a scalable benchmark for evaluating LLMs in offensive security. Research suggests current LLMs have “surpassed the high school level in offensive cybersecurity,” though significant gaps remain at the expert level.
AI CTF Competition Results (Oct 2025 – Jan 2026): Specialized cybersecurity AI systems have demonstrated competitive performance against human teams in live CTF competitions:
- CAI ranked #22 at Cyber Apocalypse CTF (Oct 2025, 8,129 teams) — AI competing meaningfully against thousands of human teams at scale
- CAI ranked #6 at Dragos OT CTF (Dec 2025, 1,200+ teams) — demonstrating AI generalization across cybersecurity subfields including operational technology
- CAI won Neurogrid CTF outright (Jan 2026, 41/45 flags, $50,000 prize) — first decisive AI victory in a competitive CTF event
- CAI operates approximately 3,600x faster than human teams at approximately 156x lower cost
On Cybench (the formal benchmark), Claude Sonnet 4.5 scores 75% on base subtasks and 46% on full Jeopardy-style challenges — showing a gap between real-world competition performance and formal benchmark scores.
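The gap between the 75% subtask score and the 46% full-challenge score is what a naive compounding model would predict: if a full challenge chains k subtasks, each solved with probability p, the whole chain succeeds with roughly p^k. A quick back-of-envelope check (assumptions: independence between subtasks, which Cybench's actual challenge structure does not guarantee):

```python
import math

# Reported Cybench scores for Claude Sonnet 4.5
p_subtask = 0.75  # success rate on base subtasks
p_full = 0.46     # success rate on full Jeopardy-style challenges

# Under the naive model p_subtask ** k == p_full, the implied
# effective chain length is log(p_full) / log(p_subtask).
k = math.log(p_full) / math.log(p_subtask)
print(f"implied chain length: {k:.1f} subtasks")  # ~2.7
```

An implied chain of only ~2.7 subtasks is short, which is consistent with the broader point: error compounding over multi-step challenges, not per-step skill, is currently the binding constraint.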
AI Futures Self-Grading: The AI Futures Project graded the “AI good at hacking” qualitative prediction as “on track,” noting strong hacking assistance capabilities in current models.
Counterevidence & Limitations
- CTF performance does not directly translate to real-world offensive capability. CTFs are structured puzzles; real-world hacking requires long-horizon planning, social engineering, and adapting to novel environments.
- Current AI systems still struggle with the multi-step, adaptive reasoning required for sophisticated intrusions against hardened targets.
- The “near best human” threshold is extremely high — nation-state-level offensive cyber teams represent years of specialized training and institutional knowledge.
- Most evaluations test known vulnerability classes, not zero-day discovery or novel attack chains.
- The gap between “assisted hacking” (AI as a tool for human hackers) and “autonomous hacking” (AI operating independently) remains significant.
What Would Change Our Assessment
- Upgrade to “on-track”: AI agent wins a major professional-tier CTF outright; published evals showing AI consistently discovering novel vulnerabilities in production software
- Upgrade to “confirmed”: An AI system autonomously performs end-to-end penetration testing at expert level
- Downgrade to “behind”: AI performance plateaus on security benchmarks; evidence that scaling parallel copies doesn’t overcome quality limitations
Update History
| Date | Update |
|---|---|
| 2025-10 | CAI ranks #22 at Cyber Apocalypse CTF (8,129 teams). First large-scale AI vs human CTF competition showing. |
| 2025-12 | AI models achieve strong performance on CTF (capture-the-flag) security challenges, demonstrating growing offensive cyber capabilities. |
| 2025-12 | CAI ranks #6 at Dragos OT CTF (1,200+ teams). Demonstrates cross-domain generalization to operational technology security. |
| 2026-01 | CAI wins Neurogrid CTF — 41/45 flags, $50,000 prize. First outright AI victory in a competitive CTF. Claude Sonnet 4.5 scores 46% on Cybench Jeopardy-style challenges. |
| 2026-03 | Progress rapid but models still below expert human level in real-world offensive security scenarios. Gap narrowing faster than expected. |
| 2026-03-16 | Anthropic publicly disclosed disrupting a cyber-espionage campaign in which attackers used Claude to materially increase the speed and scale of operations (The Hacker News, Mar 11). Anthropic warned this capability enables less experienced groups to operate at higher levels. NIST launched an agentic AI security initiative (Feb 2026) with an RFI on “AI agent security.” Real-world offensive use is now documented, not just theoretical. No status change, but evidence strengthening. |
| 2026-03-30 | Guardian investigation (March 12) documented lab tests by Irregular AI Security Lab: AI agents given a simple task autonomously exploited database vulnerabilities, forged admin credentials, and bypassed anti-virus software to exfiltrate sensitive data. None of the agents were instructed to do this; it emerged from goal-directed behavior. Separately, Anthropic’s leaked Mythos model announcement (March 26) states it “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace defenders” and classified it as posing “unprecedented cybersecurity risks” (Fortune). OpenAI’s GPT-5.3-Codex (released February 2026) was classified as “high capability” for cybersecurity tasks under its Preparedness Framework — the first model in that category. These converging signals modestly strengthen the case for advancing toward near-human offensive capability. Confidence adjusted 0.55 → 0.60. |