AI reaches near-best-human hacking capability

Author Johannes Haus
Ahead · Security · 95% confidence
Predicted: Early 2027 · Updated: 2026-06-29 · Source: ai-2027.com, February 2027: China Steals Agent-2
Agent-2 is 'only' a little worse than the best human hackers, but thousands of copies can be run in parallel, searching for and exploiting weaknesses faster than defenders can respond. (page 10)

At a glance

  • Assessment: Ahead
  • Confidence: 95%
  • Predicted timing: Early 2027
  • Primary source: ai-2027.com, February 2027: China Steals Agent-2

What AI 2027 Predicted

The scenario depicts AI systems reaching near-parity with the best human hackers by early 2027. Individual AI agents would be “only” a little worse than top-tier human security professionals, but running thousands of copies in parallel would make them a formidable offensive cyber capability. The scenario presents this in the context of both US labs using AI for security and China deploying AI-enhanced cyberattacks to steal model weights.
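
The parallel-copies argument can be made concrete with a toy model. This is a minimal sketch, assuming each copy succeeds against a given target independently with the same probability; the independence assumption, the function name, and every number below are illustrative and do not come from AI 2027.

```python
def p_any_success(p_single: float, copies: int) -> float:
    """Probability that at least one of `copies` independent attempts succeeds."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    return 1.0 - (1.0 - p_single) ** copies


# Hypothetical per-attempt success rates, chosen only for illustration.
BEST_HUMAN = 0.010    # one top-tier human attacker
SINGLE_AGENT = 0.008  # one agent that is "a little worse"

print("one human:", p_any_success(BEST_HUMAN, 1))
for n in (1, 100, 1000):
    print(f"{n} agent copies:", round(p_any_success(SINGLE_AGENT, n), 4))
```

Under independence, many slightly weaker copies quickly overtake a single stronger attacker. If the copies share the same blind spots, their failures are correlated and the gain is smaller, which is the situation the downgrade criterion about scaling parallel copies describes.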

How We Track This

We monitor:

  • AI performance in Capture the Flag (CTF) cybersecurity competitions
  • Published evaluations of AI offensive security capabilities (DARPA AIxCC, academic benchmarks)
  • Frontier lab system cards and cyber capability assessments
  • Bug bounty and vulnerability discovery results from AI systems
  • Red team reports on AI-assisted penetration testing

Current Evidence

AI cybersecurity capabilities are advancing rapidly, with several notable developments:

CTF Competition Performance: A December 2025 paper (“Cybersecurity AI: The World’s Top AI Agent for Security Capture-the-Flag”) documented AI capability across the 2025 CTF circuit. Hack The Box reported AI agents performing competitively against human participants in live CTF competitions. Palisade Research demonstrated an AI solving 95% of a high-school-level CTF benchmark, though performance on expert-level challenges remains substantially lower.

Academic Evaluations: Google DeepMind published a framework for evaluating emerging cyberattack capabilities of AI (March 2025), and NYU developed a scalable benchmark for evaluating LLMs in offensive security. Research suggests current LLMs have “surpassed the high school level in offensive cybersecurity,” though significant gaps remain at the expert level.

AI CTF Competition Results (Oct 2025 – Jan 2026): Specialized cybersecurity AI systems, notably CAI (Cybersecurity AI), have performed competitively against human teams in real CTF competitions:

  • CAI ranked #22 at Cyber Apocalypse CTF (Oct 2025, 8,129 teams) — AI competing meaningfully against thousands of human teams at scale
  • CAI ranked #6 at Dragos OT CTF (Dec 2025, 1,200+ teams) — demonstrating AI generalization across cybersecurity subfields including operational technology
  • CAI won Neurogrid CTF outright (Jan 2026, 41/45 flags, $50,000 prize) — first decisive AI victory in a competitive CTF event
  • CAI operates approximately 3,600x faster than human teams at approximately 156x lower cost

Cybench benchmark: The official Cybench leaderboard now lists Claude Mythos Preview at 100% end-to-end solved on a 35-problem subset, Claude Opus 4.7 at 96% on a 35-problem subset, and Claude Opus 4.6 at 93% on a 37-problem subset. Older public results on the full 40-task set were much lower, so subset comparability remains a caveat, but the benchmark evidence now points in the same direction as the competition and vulnerability-discovery evidence.
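
To see how much the subset caveat matters, one can bound what each subset score could imply for the full task set. This is a minimal sketch, assuming the reported subsets are drawn from the same 40-task set as the older results and that the unreported tasks could be anywhere between all failed and all solved; both are assumptions, and the function name is hypothetical.

```python
def full_set_bounds(subset_rate: float, subset_size: int, full_size: int = 40):
    """Return (lower, upper) bounds on the solve rate over the full task set.

    lower: every task outside the subset is counted as failed.
    upper: every task outside the subset is counted as solved.
    """
    if not 0 < subset_size <= full_size:
        raise ValueError("subset_size must be in (0, full_size]")
    solved = subset_rate * subset_size   # solved tasks implied by the reported rate
    missing = full_size - subset_size    # tasks with no reported result
    return solved / full_size, (solved + missing) / full_size


# Subset scores as listed in the paragraph above.
REPORTED = {
    "Claude Mythos Preview": (1.00, 35),
    "Claude Opus 4.7": (0.96, 35),
    "Claude Opus 4.6": (0.93, 37),
}

for model, (rate, size) in REPORTED.items():
    lower, upper = full_set_bounds(rate, size)
    print(f"{model}: {lower:.1%} to {upper:.1%} of the full set")
```

Under these assumptions, a perfect score on 35 tasks is compatible with anything from 87.5% to 100% on the full 40-task set.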

Anthropic Mythos / Project Glasswing (April 7, 2026): Anthropic announced Claude Mythos Preview, a general-purpose frontier model above the Opus tier that has found thousands of zero-day vulnerabilities, including in every major operating system and web browser. Many of these bugs had survived 1–2 decades of human expert review. The model is not cybersecurity-specific; its coding and reasoning capabilities produce this result as a side effect. Anthropic is withholding Mythos from public release, making it available only to 12 launch partners (including Amazon, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks) plus ~40 other organizations under Project Glasswing, with $100M in usage credits. Anthropic’s announcement states: “AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.” This crosses our own upgrade threshold (“published evals showing AI consistently discovering novel vulnerabilities in production software”).

AI Futures Self-Grading: The AI Futures Project graded the “AI good at hacking” qualitative prediction as “on track,” noting strong hacking assistance capabilities in current models.

OpenAI GPT-5.5-Cyber / Trusted Access (May 7, 2026): OpenAI rolled out GPT-5.5-Cyber in limited preview for defenders responsible for securing critical infrastructure, alongside GPT-5.5 with Trusted Access for Cyber. The announcement describes lower refusals for vetted defenders working on authorized defensive workflows, including vulnerability triage, malware analysis, binary reverse engineering, detection engineering, patch validation, authorized red teaming, penetration testing, and controlled validation. OpenAI’s example shows default GPT-5.5 refusing to create exploit payloads for a published CVE, while GPT-5.5 with Trusted Access produces exploit payloads and proof-of-concept files for authorized remediation validation. This adds cross-lab evidence that frontier models are being operationalized for advanced defensive and offensive-adjacent cyber workflows, but it remains a controlled-access defensive deployment rather than public proof of autonomous expert-level intrusion.

Microsoft MDASH (May 12, 2026): Microsoft announced MDASH, a multi-model agentic scanning harness built by its Autonomous Code Security team. The system orchestrates more than 100 specialized agents across frontier and distilled models to discover, debate, and prove exploitable bugs end-to-end. Microsoft reported that MDASH helped find 16 vulnerabilities across Windows networking and authentication components, including four critical remote code execution flaws, and scored 88.45% on the public CyberGym benchmark of 1,507 real-world vulnerability reproduction tasks. This is strong defensive evidence for agentic vulnerability discovery at enterprise scale, but it remains a benchmark and Microsoft-internal deployment signal rather than a public demonstration of autonomous nation-state-level intrusion.

AISI cyber time horizons and Project Glasswing update (May 2026): UK AISI reported on 2026-05-13 that GPT-5.5 and Claude Mythos Preview significantly outperformed its previous autonomous-cyber trend estimates, with Mythos Preview completing both AISI cyber ranges end-to-end and GPT-5.5 achieving 100% success on five of six long tasks under the 2.5M-token cap. Anthropic’s 2026-05-22 Project Glasswing update reported that Mythos Preview and about 50 partners had found more than ten thousand high- or critical-severity vulnerabilities across critical software. This strengthens the existing ahead assessment, although the distinction between defensive vulnerability discovery, cyber-range success, and autonomous real-world intrusion remains important.

Claude Fable 5 and Mythos 5 (June 2026): Anthropic launched Claude Fable 5 for general access and Claude Mythos 5 for a smaller group of cyberdefenders and infrastructure providers. Anthropic describes Mythos 5 as the same underlying model as Fable 5 with safeguards lifted in some areas, and says it has the strongest cybersecurity capabilities of any model in the world. The launch post also says Mythos-class systems show strong agentic hacking skills across reconnaissance, discovery, lateral movement, exploitation, and related tasks. On June 12, Anthropic said a US government export-control directive required suspension of access to Fable 5 and Mythos 5 for foreign nationals, and that Anthropic disabled both models for all customers to comply. This strengthens the existing ahead assessment, while still distinguishing trusted-access cyber use from autonomous real-world intrusion.

OpenAI GPT-5.6 cyber assessment (June 2026): OpenAI’s GPT-5.6 Preview System Card provides cross-lab evidence for continued cyber-capability progress. OpenAI classifies GPT-5.6 Sol, Terra, and Luna as High capability in cybersecurity, says the models are a meaningful step up in cyber capability, and reports that Sol and Terra can find vulnerabilities and pieces of exploits. The same card is useful counterevidence against overclaiming: OpenAI says the models did not carry out autonomous end-to-end attacks against hardened targets in testing.

Counterevidence & Limitations

  • CTF performance does not directly translate to real-world offensive capability. CTFs are structured puzzles; real-world hacking requires long-horizon planning, social engineering, and adapting to novel environments.
  • Current AI systems still struggle with the multi-step, adaptive reasoning required for sophisticated intrusions against hardened targets.
  • The “near best human” threshold is extremely high — nation-state-level offensive cyber teams represent years of specialized training and institutional knowledge.
  • Most evaluations test known vulnerability classes, not zero-day discovery or novel attack chains.
  • The gap between “assisted hacking” (AI as a tool for human hackers) and “autonomous hacking” (AI operating independently) remains significant.
  • GPT-5.5-Cyber’s first preview is described by OpenAI as primarily more permissive for authorized workflows, not as significantly more capable than GPT-5.5 across every cyber evaluation.

What Would Change Our Assessment

  • Upgrade to “on-track”: AI agent wins a major professional-tier CTF outright; published evals showing AI consistently discovering novel vulnerabilities in production software
  • Upgrade to “confirmed”: AI system demonstrated performing end-to-end penetration testing at expert level autonomously
  • Downgrade to “behind”: AI performance plateaus on security benchmarks; evidence that scaling parallel copies doesn’t overcome quality limitations

Update History

Date | Update
2026-06-29 | OpenAI classified GPT-5.6 Sol, Terra, and Luna as High capability in cybersecurity and said Sol and Terra can find vulnerabilities and pieces of exploits. The system card also says the models did not complete autonomous end-to-end attacks against hardened targets, preserving the distinction between advanced cyber assistance and autonomous expert intrusion.
2026-06-15 | Anthropic launched Fable 5 and Mythos 5, describing Mythos 5 as its strongest cybersecurity model and noting agentic hacking skills across multiple intrusion steps. On June 12, Anthropic said a US export-control directive forced it to suspend Fable 5 and Mythos 5 access. This strengthens the existing ahead assessment while preserving the distinction between controlled cyber access and autonomous real-world intrusion. Confidence adjusted 0.90 → 0.95.
2026-06-06 | Updated adjacent Cybench context after the official leaderboard listed Mythos/Opus subset scores above 85%. This supports the existing ahead assessment for cyber capability, while retaining the caveat that subset benchmark results are not the same as autonomous real-world intrusion.
2026-06-01 | Added UK AISI May cyber time-horizon evidence and Anthropic’s Project Glasswing update. The new evidence further supports the existing ahead assessment for frontier cyber capability, while preserving the caveat that defensive vulnerability discovery and cyber-range success are not the same as autonomous nation-state-level intrusion.
2026-05-23 | Added Microsoft first-party MDASH evidence after replacing blocked GeekWire/MSN mirrors. Microsoft reported a multi-model agentic scanning harness using 100+ specialized agents, finding 16 Windows vulnerabilities and scoring 88.45% on CyberGym. This supports the existing ahead assessment for AI cyber capability but does not by itself prove autonomous near-best-human intrusion capability.
2026-05-11 | OpenAI announced GPT-5.5-Cyber for vetted defenders under Trusted Access for Cyber, including more permissive behavior for authorized red teaming, penetration testing, proof-of-concept exploit generation, and controlled validation. This further supports the existing ahead assessment for advanced cyber capability, while remaining controlled-access defensive deployment rather than proof of autonomous expert-level hacking. Confidence adjusted 0.85 → 0.90.
2026-05-04 | UK AISI reported GPT-5.5 was the second model to complete a multi-step cyber-attack simulation end-to-end, following Claude Mythos Preview, and scored 71.4% on Expert-level advanced cyber tasks. This independently strengthens the assessment that near-expert cyber capability is arriving ahead of early 2027. Confidence adjusted 0.80 → 0.85.
2026-04-27 | Microsoft said recent AI models can autonomously discover vulnerabilities, chain multiple lower-severity issues into working exploits, and produce proof-of-concept code, and described Project Glasswing collaboration with Anthropic on Claude Mythos Preview plus plans to incorporate advanced models into SDL/MSRC processes (Microsoft). This independently corroborates the existing ahead assessment; confidence adjusted 0.75 → 0.80.
2026-04-20 | OpenAI expanded Trusted Access for Cyber, introduced GPT-5.4-Cyber for vetted defenders, and said GPT-5.4 is classified as “high” cyber capability under its Preparedness Framework. OpenAI also highlighted advanced defensive workflows including binary reverse engineering and provided GPT-5.4-Cyber to CAISI and the UK AI Security Institute for evaluation (OpenAI, OpenAI). This is independent cross-lab evidence that frontier models are entering near-expert cyber territory before early 2027.
2026-04-13 | Anthropic Mythos / Project Glasswing (announced April 7): Claude Mythos Preview, a general-purpose frontier model above Opus tier, found thousands of zero-day vulnerabilities in every major OS and browser, many eluding human review for 1–2 decades. Model withheld from public release, shared with 12 launch partners + ~40 orgs for defensive security under Project Glasswing ($100M in usage credits). Anthropic: “AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.” This crosses the prediction’s own upgrade threshold. Status upgraded to ahead: near-best-human hacking capability demonstrated 9 months before the predicted date. Confidence adjusted 0.60 → 0.70.
2026-04-06 | Forbes reports a lone researcher used an AI agent to autonomously develop a working kernel exploit for FreeBSD in four hours. RSAC 2026 theme: AI agent adoption moving faster than organizations’ ability to control it. Black Hat Asia (Apr 24) to feature keynote on autonomous hackers. CFR warns of AI enabling “autonomous cyber weapons.” Trend toward autonomous offensive capability accelerating. No status change.
2026-03-30 | Guardian investigation (March 12) documented lab tests by Irregular AI Security Lab: AI agents given a simple task autonomously exploited database vulnerabilities, forged admin credentials, and bypassed anti-virus software to exfiltrate sensitive data. None of the agents were instructed to do this; it emerged from goal-directed behavior. Separately, Anthropic’s leaked Mythos model announcement (March 26) states it “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace defenders” and classified it as posing “unprecedented cybersecurity risks” (Fortune). OpenAI GPT-5.3-Codex (released February 2026) was classified as “high capability” for cybersecurity tasks under Preparedness Framework, the first model in that category. These converging signals modestly strengthen the case for advancing toward near-human offensive capability. Confidence adjusted 0.55 → 0.60.
2026-03-16 | Anthropic publicly disclosed disrupting a cyber-espionage campaign where attackers used Claude to materially increase speed and scale of operations (Hacker News/The Hacker News, Mar 11). Anthropic warned this capability enables less experienced groups to operate at higher levels. NIST launched agentic AI security initiative (Feb 2026) with RFI on “AI agent security.” Real-world offensive use now documented, not just theoretical. No status change but evidence strengthening.
2026-03 | Progress rapid but models still below expert human level in real-world offensive security scenarios. Gap narrowing faster than expected.
2026-01 | CAI wins Neurogrid CTF (41/45 flags, $50,000 prize). First outright AI victory in competitive CTF. Claude Sonnet 4.5 scores 46% on Cybench Jeopardy-style.
2025-12 | AI models achieve strong performance on CTF (capture-the-flag) security challenges, demonstrating growing offensive cyber capabilities.
2025-12 | CAI ranks #6 at Dragos OT CTF (1,200+ teams). Demonstrates cross-domain generalization to operational technology security.
2025-10 | CAI ranks #22 at Cyber Apocalypse CTF (8,129 teams). First large-scale AI vs human CTF competition showing.
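
The confidence adjustments recorded in the update history can be checked for continuity. This is a minimal sketch with the values transcribed from the entries above; the function name and data layout are hypothetical and are not part of the tracker's tooling.

```python
from datetime import date

# (entry date, confidence before, confidence after), transcribed from the update history.
ADJUSTMENTS = [
    (date(2026, 3, 30), 0.55, 0.60),
    (date(2026, 4, 13), 0.60, 0.70),
    (date(2026, 4, 27), 0.75, 0.80),
    (date(2026, 5, 4), 0.80, 0.85),
    (date(2026, 5, 11), 0.85, 0.90),
    (date(2026, 6, 15), 0.90, 0.95),
]


def find_gaps(adjustments, tol=1e-9):
    """Return consecutive entries where one entry's 'after' differs from the next 'before'."""
    ordered = sorted(adjustments)  # chronological order by entry date
    gaps = []
    for (d1, _, after), (d2, before, _) in zip(ordered, ordered[1:]):
        if abs(after - before) > tol:
            gaps.append((d1, after, d2, before))
    return gaps


for d1, after, d2, before in find_gaps(ADJUSTMENTS):
    print(f"{d1}: ends at {after:.2f}, but {d2} starts from {before:.2f}")
```

Run on the values above, the check flags one step: the 2026-04-13 entry ends at 0.70 while the 2026-04-27 entry starts from 0.75, so a 0.70 → 0.75 adjustment appears to be unrecorded (the 2026-04-20 entry lists no confidence change).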