AI reaches near-best-human hacking capability

Author Johannes Haus
Last updated 2026-05-11
Ahead · Security · 90% confidence
Predicted: Early 2027 · Updated: 2026-05-11 · Source: ai-2027.com, February 2027: China Steals Agent-2
Agent-2 is 'only' a little worse than the best human hackers, but thousands of copies can be run in parallel, searching for and exploiting weaknesses faster than defenders can respond. (page 10)

At a glance

  • Assessment: Ahead
  • Confidence: 90%
  • Predicted timing: Early 2027
  • Primary source: ai-2027.com, February 2027: China Steals Agent-2

What AI 2027 Predicted

The scenario depicts AI systems reaching near-parity with the best human hackers by early 2027. Individual AI agents would be “only” a little worse than the best human hackers, but thousands of copies running in parallel would find and exploit weaknesses faster than defenders can respond, making them a formidable offensive cyber capability. The scenario presents this in the context of both US labs using AI for security and China deploying AI-enhanced cyberattacks to steal model weights.
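The parallelism argument can be illustrated with a toy probability model. This is only a sketch under strong assumptions that are not from AI 2027: each copy attacks the same target independently, and the per-copy success probability `p_single` is a hypothetical number chosen for illustration. Real copies of one model are likely to share blind spots, so independence overstates the benefit of scale.

```python
def p_any_success(p_single: float, copies: int) -> float:
    """Chance that at least one of `copies` independent attempts succeeds.

    Toy model: every copy succeeds with the same probability `p_single`,
    independently of the others (an assumption, not a measured property).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    # Complement of "every copy fails".
    return 1.0 - (1.0 - p_single) ** copies


if __name__ == "__main__":
    # Hypothetical per-copy success rate of 0.1% against a hardened target.
    for copies in (1, 100, 1000):
        print(copies, round(p_any_success(0.001, copies), 3))
```

If the copies' failures are strongly correlated, the curve flattens well below what this model suggests, which is the scenario the "behind" downgrade condition describes.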

How We Track This

We monitor:

  • AI performance in Capture the Flag (CTF) cybersecurity competitions
  • Published evaluations of AI offensive security capabilities (DARPA AIxCC, academic benchmarks)
  • Frontier lab system cards and cyber capability assessments
  • Bug bounty and vulnerability discovery results from AI systems
  • Red team reports on AI-assisted penetration testing

Current Evidence

AI cybersecurity capabilities are advancing rapidly, with several notable developments:

CTF Competition Performance: A December 2025 paper (“Cybersecurity AI: The World’s Top AI Agent for Security Capture-the-Flag”) documented AI capability across the 2025 CTF circuit. Hack The Box reported AI agents performing competitively against human participants in live CTF competitions. Palisade Research demonstrated an AI solving 95% of a high-school-level CTF benchmark, though performance on expert-level challenges remains substantially lower.

Academic Evaluations: Google DeepMind published a framework for evaluating emerging cyberattack capabilities of AI (March 2025), and NYU developed a scalable benchmark for evaluating LLMs in offensive security. Research suggests current LLMs have “surpassed the high school level in offensive cybersecurity,” though significant gaps remain at the expert level.

AI CTF Competition Results (Oct 2025 – Jan 2026): Specialized cybersecurity AI systems, notably CAI (Cybersecurity AI), have performed competitively against human teams in real CTF competitions:

  • CAI ranked #22 at Cyber Apocalypse CTF (Oct 2025, 8,129 teams) — AI competing meaningfully against thousands of human teams at scale
  • CAI ranked #6 at Dragos OT CTF (Dec 2025, 1,200+ teams) — demonstrating AI generalization across cybersecurity subfields including operational technology
  • CAI won Neurogrid CTF outright (Jan 2026, 41/45 flags, $50,000 prize) — first decisive AI victory in a competitive CTF event
  • CAI operates approximately 3,600x faster than human teams at approximately 156x lower cost
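To put the rankings above on a common scale, the short sketch below converts each rank into the fraction of the field at or above that position. The helper name `top_fraction` is ours, and the Dragos field is entered as 1,200 even though the source says "1,200+", so that figure is an upper bound on the true fraction.

```python
def top_fraction(rank: int, teams: int) -> float:
    """Fraction of the field finishing at or above `rank` (smaller is better)."""
    if not 1 <= rank <= teams:
        raise ValueError("rank must be between 1 and the number of teams")
    return rank / teams


# (event, rank, number of teams) as reported in the list above.
# 1200 is a lower bound on the Dragos field size ("1,200+").
results = [
    ("Cyber Apocalypse CTF", 22, 8129),
    ("Dragos OT CTF", 6, 1200),
]

if __name__ == "__main__":
    for name, rank, teams in results:
        print(f"{name}: top {top_fraction(rank, teams):.2%} of teams")
    # Share of flags captured at Neurogrid CTF (41 of 45).
    print(f"Neurogrid CTF: {41 / 45:.1%} of flags")
```

Both placements fall within roughly the top half-percent of entrants, which is consistent with the tracker's reading that these systems compete with strong human teams rather than merely participate.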

On Cybench (the formal benchmark), Claude Sonnet 4.5 scores 75% on base subtasks and 46% on full Jeopardy-style challenges, so formal benchmark scores still lag the real-world competition results above.

Anthropic Mythos / Project Glasswing (April 7, 2026): Anthropic announced Claude Mythos Preview — a general-purpose frontier model above the Opus tier that has found thousands of zero-day vulnerabilities, including in every major operating system and web browser. Many of these bugs had survived 1–2 decades of human expert review. The model is not cybersecurity-specific; its coding and reasoning capabilities produce this result as a side effect. Anthropic is withholding Mythos from public release, making it available only to 12 partner organizations (Amazon, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks) plus ~40 others under Project Glasswing, with $100M in usage credits. Anthropic’s announcement states: “AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.” This crosses the prediction’s own upgrade threshold (“published evals showing AI consistently discovering novel vulnerabilities in production software”).

AI Futures Self-Grading: The AI Futures Project graded the “AI good at hacking” qualitative prediction as “on track,” noting strong hacking assistance capabilities in current models.

OpenAI GPT-5.5-Cyber / Trusted Access (May 7, 2026): OpenAI rolled out GPT-5.5-Cyber in limited preview for defenders responsible for securing critical infrastructure, alongside GPT-5.5 with Trusted Access for Cyber. The announcement describes lower refusals for vetted defenders working on authorized defensive workflows, including vulnerability triage, malware analysis, binary reverse engineering, detection engineering, patch validation, authorized red teaming, penetration testing, and controlled validation. OpenAI’s example shows default GPT-5.5 refusing to create exploit payloads for a published CVE, while GPT-5.5 with Trusted Access produces exploit payloads and proof-of-concept files for authorized remediation validation. This adds cross-lab evidence that frontier models are being operationalized for advanced defensive and offensive-adjacent cyber workflows. It remains a controlled-access defensive deployment, however, not public proof of autonomous expert-level intrusion.

Counterevidence & Limitations

  • CTF performance does not directly translate to real-world offensive capability. CTFs are structured puzzles; real-world hacking requires long-horizon planning, social engineering, and adapting to novel environments.
  • Current AI systems still struggle with the multi-step, adaptive reasoning required for sophisticated intrusions against hardened targets.
  • The “near best human” threshold is extremely high — nation-state-level offensive cyber teams represent years of specialized training and institutional knowledge.
  • Most evaluations test known vulnerability classes, not zero-day discovery or novel attack chains.
  • The gap between “assisted hacking” (AI as a tool for human hackers) and “autonomous hacking” (AI operating independently) remains significant.
  • GPT-5.5-Cyber’s first preview is described by OpenAI as primarily more permissive for authorized workflows, not as significantly more capable than GPT-5.5 across every cyber evaluation.

What Would Change Our Assessment

  • Upgrade to “on-track”: AI agent wins a major professional-tier CTF outright; published evals showing AI consistently discovering novel vulnerabilities in production software
  • Upgrade to “confirmed”: an AI system is demonstrated autonomously performing end-to-end penetration testing at expert level
  • Downgrade to “behind”: AI performance plateaus on security benchmarks; evidence that scaling parallel copies doesn’t overcome quality limitations

Update History

Date | Update
2026-05-11 | OpenAI announced GPT-5.5-Cyber for vetted defenders under Trusted Access for Cyber, including more permissive behavior for authorized red teaming, penetration testing, proof-of-concept exploit generation, and controlled validation. This further supports the existing ahead assessment for advanced cyber capability, while remaining controlled-access defensive deployment rather than proof of autonomous expert-level hacking. Confidence adjusted 0.85 → 0.90.
2026-05-04 | UK AISI reported GPT-5.5 was the second model to complete a multi-step cyber-attack simulation end-to-end, following Claude Mythos Preview, and scored 71.4% on Expert-level advanced cyber tasks. This independently strengthens the assessment that near-expert cyber capability is arriving ahead of early 2027. Confidence adjusted 0.80 → 0.85.
2026-04-27 | Microsoft said recent AI models can autonomously discover vulnerabilities, chain multiple lower-severity issues into working exploits, and produce proof-of-concept code, and described Project Glasswing collaboration with Anthropic on Claude Mythos Preview plus plans to incorporate advanced models into SDL/MSRC processes (Microsoft). This independently corroborates the existing ahead assessment; confidence adjusted 0.75 → 0.80.
2026-04-20 | OpenAI expanded Trusted Access for Cyber, introduced GPT-5.4-Cyber for vetted defenders, and said GPT-5.4 is classified as “high” cyber capability under its Preparedness Framework. OpenAI also highlighted advanced defensive workflows including binary reverse engineering and provided GPT-5.4-Cyber to CAISI and the UK AI Security Institute for evaluation (OpenAI, OpenAI). This is independent cross-lab evidence that frontier models are entering near-expert cyber territory before early 2027.
2026-04-13 | Anthropic Mythos / Project Glasswing (announced April 7): Claude Mythos Preview, a general-purpose frontier model above Opus tier, found thousands of zero-day vulnerabilities in every major OS and browser, many eluding human review for 1–2 decades. Model withheld from public release, shared with 12 launch partners + ~40 orgs for defensive security under Project Glasswing ($100M in usage credits). Anthropic: “AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.” This crosses the prediction’s own upgrade threshold. Status upgraded to ahead: near-best-human hacking capability demonstrated 9 months before the predicted date. Confidence adjusted 0.60 → 0.70.
2026-04-06 | Forbes reports a lone researcher used an AI agent to autonomously develop a working kernel exploit for FreeBSD in four hours. RSAC 2026 theme: AI agent adoption moving faster than organizations’ ability to control it. Black Hat Asia (Apr 24) to feature keynote on autonomous hackers. CFR warns of AI enabling “autonomous cyber weapons.” Trend toward autonomous offensive capability accelerating. No status change.
2026-03-30 | Guardian investigation (March 12) documented lab tests by Irregular AI Security Lab: AI agents given a simple task autonomously exploited database vulnerabilities, forged admin credentials, and bypassed anti-virus software to exfiltrate sensitive data. None of the agents were instructed to do this; it emerged from goal-directed behavior. Separately, Anthropic’s leaked Mythos model announcement (March 26) states it “presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace defenders” and classified it as posing “unprecedented cybersecurity risks” (Fortune). OpenAI GPT-5.3-Codex (released February 2026) was classified as “high capability” for cybersecurity tasks under Preparedness Framework, the first model in that category. These converging signals modestly strengthen the case for advancing toward near-human offensive capability. Confidence adjusted 0.55 → 0.60.
2026-03-16 | Anthropic publicly disclosed disrupting a cyber-espionage campaign where attackers used Claude to materially increase speed and scale of operations (Hacker News/The Hacker News, Mar 11). Anthropic warned this capability enables less experienced groups to operate at higher levels. NIST launched agentic AI security initiative (Feb 2026) with RFI on “AI agent security.” Real-world offensive use now documented, not just theoretical. No status change but evidence strengthening.
2026-03 | Progress rapid but models still below expert human level in real-world offensive security scenarios. Gap narrowing faster than expected.
2026-01 | CAI wins Neurogrid CTF (41/45 flags, $50,000 prize). First outright AI victory in competitive CTF. Claude Sonnet 4.5 scores 46% on Cybench Jeopardy-style.
2025-12 | AI models achieve strong performance on CTF (capture-the-flag) security challenges, demonstrating growing offensive cyber capabilities.
2025-12 | CAI ranks #6 at Dragos OT CTF (1,200+ teams). Demonstrates cross-domain generalization to operational technology security.
2025-10 | CAI ranks #22 at Cyber Apocalypse CTF (8,129 teams). First large-scale AI vs human CTF competition showing.
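The confidence adjustments recorded in the update history can be checked for continuity with a few lines of Python. The sketch below is our own illustration, not part of the tracker's tooling: the `adjustments` list transcribes only the entries that state an explicit "Confidence adjusted X → Y", and `find_gaps` is a hypothetical helper that reports places where one entry's ending value does not match the next entry's starting value.

```python
from datetime import date

# (entry date, confidence before, confidence after), transcribed from the
# update history entries that state an explicit adjustment.
adjustments = [
    (date(2026, 3, 30), 0.55, 0.60),
    (date(2026, 4, 13), 0.60, 0.70),
    (date(2026, 4, 27), 0.75, 0.80),
    (date(2026, 5, 4), 0.80, 0.85),
    (date(2026, 5, 11), 0.85, 0.90),
]


def find_gaps(trail, tolerance=1e-9):
    """Return (earlier date, later date, ending value, next starting value)
    for each consecutive pair whose confidence values do not line up."""
    gaps = []
    for (d_prev, _, after), (d_next, before, _) in zip(trail, trail[1:]):
        if abs(after - before) > tolerance:
            gaps.append((d_prev, d_next, after, before))
    return gaps


if __name__ == "__main__":
    for d_prev, d_next, after, before in find_gaps(adjustments):
        print(f"{d_prev} ends at {after:.2f}, but {d_next} starts at {before:.2f}")
```

Run on the values above, the check reports a single discontinuity: the 2026-04-13 entry ends at 0.70 while the 2026-04-27 entry starts from 0.75. The 2026-04-20 entry in between records no adjustment, so the log does not say where that step was made.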