AI 2027 vs Reality
The Big Picture
The AI 2027 scenario predicted a specific, aggressive path from today’s AI to superintelligence by 2027. Eleven months after publication, where do things actually stand?
The short answer: Reality is tracking at roughly 70% of the scenario’s predicted pace. The direction is right. The speed is somewhat slower. And the implications depend on whether that gap stays constant or narrows.
The Speed Ratio
The AI Futures Project — the team behind AI 2027 — graded their own 2025 predictions in February 2026. Their assessment:
- Aggregate pace: 58-66% of predicted speed on quantitative metrics
- Qualitative predictions: mostly on pace
- Individual prediction aggregate: mean 75%, median 84%
Our independent tracking of 48 predictions aligns with a roughly 0.70× speed ratio — meaning things are happening, but about 30% slower than the scenario depicted.
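The gap between a mean of 75% and a median of 84% tells its own story: a few badly lagging predictions drag the average down while the typical prediction sits closer to pace. A minimal sketch of that aggregation, using eight illustrative per-prediction ratios (the real tracker covers 48; these numbers are hypothetical):

```python
from statistics import mean, median

# Hypothetical per-prediction speed ratios (actual pace / predicted pace).
# The real dataset has 48 predictions; these 8 values are illustrative only.
ratios = [1.2, 1.0, 0.9, 0.85, 0.8, 0.6, 0.4, 0.25]

print(f"mean:   {mean(ratios):.2f}")   # a few slow outliers pull the mean down
print(f"median: {median(ratios):.2f}")  # the typical prediction sits higher
```

The same skew shows up in the AI Futures Project's own grading, where the median (84%) sits well above the mean (75%).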
That’s not a failure. A scenario that’s directionally correct and 70% on pace is unusually strong for a detailed multi-year forecast. But it does shift the timeline.
What’s Tracking Ahead
A few areas are moving faster than AI 2027 predicted:
- Agent capability improvement — METR time horizons are doubling every 3-4 months, compared to the predicted 7-month doubling. This is the single most important metric for the scenario’s core thesis (AI accelerating AI research), and it’s ahead of schedule.
- Labor market impact — Concern about AI job displacement emerged earlier and more visibly than predicted.
- Lab competition — The gap between top US labs (0-2 months) is even smaller than the predicted 3-9 months. The race is tighter.
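Doubling-time differences compound quickly, which is why the METR number matters so much. A minimal sketch of exponential growth in time horizons, assuming a 12-month window and taking 3.5 months as the midpoint of the observed 3-4 month range (both assumptions are ours, not from the scenario):

```python
# Time-horizon growth under exponential doubling: factor = 2 ** (months / doubling_time).
# Doubling times from the text: ~7 months predicted, 3-4 months observed (3.5 assumed here).
def growth_factor(months: float, doubling_months: float) -> float:
    return 2 ** (months / doubling_months)

predicted = growth_factor(12, 7.0)   # the scenario's predicted pace
observed = growth_factor(12, 3.5)    # midpoint of the observed 3-4 month range

print(f"12-month growth, predicted: {predicted:.1f}x")
print(f"12-month growth, observed:  {observed:.1f}x")
```

Over a single year, halving the doubling time roughly squares the growth factor, so even a modest edge on this one metric dominates the others.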
What’s Roughly On Track
The bulk of qualitative predictions are tracking as described:
- Infrastructure investment scale and pace
- Agent emergence and the “useful but unreliable” dynamic
- Coding agent adoption and value creation
- AI-for-AI research focus at major labs
- Continuous training paradigm shift
- Export control impact on Chinese AI
- DOD engagement with AI labs
- Public skepticism persisting despite rapid progress
What’s Behind
Several quantitative predictions are lagging:
- Compute scaling — No confirmed training run substantially larger than GPT-4.5. The 10²⁸ FLOP milestone is behind schedule.
- Benchmark targets — SWE-bench Verified at 74.5% vs predicted 85%. Meaningful progress, but behind the curve.
- Financial milestones — Valuations and market performance behind the aggressive predictions.
- China gap — Chinese labs appear further behind than predicted, partly due to export controls being more effective than expected.
Category-by-Category Breakdown
Model Capability (6 predictions)
Mixed. Qualitative trends (continuous training, cheaper models) confirmed. Quantitative milestones (FLOP targets, benchmark scores) behind. The scenario may have overweighted compute scaling relative to algorithmic and architectural progress.
Agent Autonomy (6 predictions)
Mostly confirmed. Agent emergence, pricing, long-horizon task struggles, personal assistant marketing — all accurate. METR time horizons actually ahead of prediction. This category is the strongest validation of the scenario.
Coding (5 predictions)
Strong. Coding agents providing real value, Claude Code revenue significant. SWE-bench slightly behind target but still impressive absolute progress. The coding transformation narrative is playing out almost exactly as described.
Governance (5 predictions)
Confirmed. DOD contracting, academic/media skepticism, early capability secrecy trends — all on track. Later predictions about nationalization debates and anti-AI protests are still emerging.
Security (5 predictions)
Mixed, mostly emerging. Bioweapon assistance capabilities on track (Anthropic ASL-3 upgrade). Model theft, cyberwarfare, and security infrastructure predictions are not yet fully testable but showing early signals.
Geopolitics (7 predictions)
Partially confirmed. Export control impact confirmed. China compute constraints confirmed. The scenario’s prediction that China would centralize into a single mega-datacenter is harder to verify but directionally plausible. Predictions about China’s model gap appear off in the other direction: Chinese labs look further behind than the scenario expected.
Economic Impact (10 predictions)
Strongest category. Infrastructure investment, capex trajectory, datacenter buildouts, labor market disruption — most confirmed or on track. Financial valuations are the main area behind.
Takeoff (4 predictions)
Not yet testable. These predict AI automating AI research, with multipliers reaching 1.5× to 4×. Early signals exist (AI-assisted coding, research acceleration), but the dramatic takeoff dynamics target late 2026 through 2027.
The Adjusted Timeline
If progress continues at 70% of the depicted rate, what does that mean for the scenario’s dramatic predictions?
The AI Futures team estimated:
- Without additional slowdowns: Takeoff shifts from late 2027 → mid-2029
- With compute/labor growth constraints: Takeoff shifts to mid-2028 to mid-2030
- Lead author Daniel Kokotajlo’s updated median for full coding automation: 2029
- Eli Lifland’s updated median for full coding automation: early 2030s
In other words: even the authors themselves now expect the critical milestones 1-3 years later than their original scenario depicted. But they still expect them.
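The adjusted-timeline arithmetic amounts to dividing a milestone's predicted distance by the speed ratio. A minimal sketch, assuming (for illustration only, since the text gives no exact month counts) a milestone roughly 32 months out at publication:

```python
# If reality runs at `speed_ratio` of the predicted pace, a milestone
# predicted to be M months away arrives after roughly M / speed_ratio months.
def adjusted_months(predicted_months: float, speed_ratio: float) -> float:
    return predicted_months / speed_ratio

# Illustrative: ~32 months to a "late 2027" milestone, at the 0.70x ratio.
shift = adjusted_months(32, 0.70) - 32
print(f"arrival slips by about {shift:.0f} months")
```

This simple stretch model is also why the gap between scenario and reality matters so much going forward: if the ratio stays near 0.70, distant milestones slip by years, not months, while a narrowing ratio would pull them back in.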
What This Means
Three ways to read the evidence:
The “Vindication” Reading
AI 2027 is the most accurate detailed AI forecast ever published. Its qualitative picture of 2025 was almost perfectly right. The 30% speed lag is within the range of uncertainty for any forecast, and some metrics (agent capabilities) are actually ahead. The takeoff is coming — just a bit later.
The “Overconfident” Reading
The quantitative predictions are behind because the scenario overstated how fast compute would scale. The qualitative predictions being right is less impressive because many of those trends were already visible in early 2025. The takeoff thesis remains unproven, and 2029-2030 is far enough away that a lot could change.
The “Both True” Reading
The scenario correctly identified the dynamics and direction of AI progress. It was too aggressive on timelines but too conservative on some capability metrics. The honest assessment is: this is the best public forecast we have, it’s roughly on track, and the remaining uncertainty is genuinely large.
We think the third reading is closest to the truth. The scenario deserves to be taken seriously and tracked rigorously — which is exactly what we’re doing.