Superhuman coder emerges
A superhuman coder (SC): an AI system that can do any coding task that the best engineer at an AGI company can do.
What AI 2027 Predicted
The scenario predicts the emergence of a “superhuman coder”: an AI system that surpasses the best human programmers on essentially any coding task, in both quality and speed. This is a key milestone on the path toward broader superintelligence.
How We Track This
We monitor:
- SWE-bench Verified and SWE-bench Pro scores
- Terminal-Bench results
- Real-world coding competitions (Codeforces, etc.) — AI vs human rankings
- Enterprise reports on code quality from AI vs human developers
- Novel system-level projects completed entirely by AI
Current Evidence
Coding AI is advancing rapidly, but “superhuman” remains distant. Claude Opus 4.6 (Thinking) leads SWE-bench Verified at 79.2%, with GPT-5.4 at 77.2% (vals.ai). On the harder SWE-bench Pro, which reflects real-world complexity, the best scores are only 23.3% (GPT-5) and 23.1% (Claude Opus 4.1), per Scale Labs. On Terminal-Bench 2.0, GPT-5.3 Codex scores 65% and Opus 4.6 scores 63%. Sixteen Claude Opus 4.6 agents wrote a C compiler from scratch, and Claude Code went from zero to the #1 coding tool in eight months. Still, the gap between “very useful coding assistant” and “superhuman coder” (any task, faster and cheaper than the best human) remains large.
Sources:
- SWE-bench Verified Results — Vals.ai
- SWE-Bench Pro Leaderboard — Scale Labs
- Inside OpenAI’s Race to Catch Up to Claude Code — WIRED
- Grading AI 2027’s 2025 Predictions — AI Futures Project
Counterevidence & Limitations
- SWE-bench Pro results (~23%) show that real-world coding is far harder than benchmarks suggest
- “Superhuman” is a high bar: surpassing the best humans on every task is qualitatively different from being a useful assistant
- Current tools require significant human guidance for complex projects
- The March 2027 predicted date may be too aggressive by 6–18 months
What Would Change Our Assessment
- Upgrade to “emerging”: SWE-bench Pro scores above 50%; AI consistently winning coding competitions
- Upgrade to “on-track”: SWE-bench Pro above 70%; credible reports of AI completing complex projects without human guidance
- Maintain at “not-yet-testable”: Prediction date hasn’t arrived yet
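The criteria above can be sketched as a simple decision rule. This is a hypothetical illustration (the function name and inputs are invented for this sketch, not part of any actual tracker), mapping the tracked signals to a status under the stated thresholds:

```python
def assessment_status(swe_bench_pro: float,
                      wins_coding_competitions: bool = False,
                      autonomous_complex_projects: bool = False) -> str:
    """Map tracked signals to an assessment status.

    Thresholds mirror the criteria listed above: SWE-bench Pro above 70
    plus credible reports of autonomous complex projects -> "on-track";
    above 50 plus consistent competition wins -> "emerging"; otherwise
    the prediction remains "not-yet-testable".
    """
    if swe_bench_pro > 70 and autonomous_complex_projects:
        return "on-track"
    if swe_bench_pro > 50 and wins_coding_competitions:
        return "emerging"
    return "not-yet-testable"

# With the current best SWE-bench Pro score of ~23.3%:
print(assessment_status(23.3))  # not-yet-testable
```

Under current evidence (best SWE-bench Pro score around 23%), the rule returns "not-yet-testable", matching the assessment above.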
Update History
| Date | Update |
|---|---|
| 2025-09 | Gemini 2.5 Deep Think achieves gold-medal performance at 2025 ICPC World Finals (Sep 17), solving 10/12 problems including one no human team solved. Strongest “superhuman” coding signal to date, though competitive algorithmic programming differs from real-world software engineering. |
| 2025-11 | Claude Opus 4.5 reportedly outperforms every human candidate on Anthropic’s internal engineering assessments. Gemini 3 scores 37.4% on Humanity’s Last Exam (world record). The gap is narrowing visibly, but the “superhuman coder” milestone remains contested. |
| 2025-12 | AI Futures Project places median for “Superhuman Coder” at December 2031 — vs AI 2027 scenario’s January 2027. |
| 2026-01 | Kokotajlo personal median for full coding automation: December 2030. |
| 2026-03 | Prediction timeframe not yet reached (March 2027). AI coding capabilities advancing rapidly — SWE-bench scores improving, autonomous coding agents shipping — but superhuman performance across all coding tasks remains distant. |