AI 2027 Predictions Explained
How Do You Turn a Story Into Predictions?
AI 2027 isn’t a list of predictions — it’s a narrative scenario. It tells a story, month by month, about how AI development unfolds. To track it, we need to extract discrete, testable claims from that narrative.
This is harder than it sounds, and we want to be transparent about how we do it.
Extraction Process
We read the full scenario and identify specific factual claims — things that are either true or false, happening or not happening. Each claim needs to be:
- Specific enough to evaluate — “AI gets better” doesn’t count. “SWE-bench-Verified reaches 85% by mid-2025” does.
- Attributable to a timeframe — The scenario is chronological, so most claims have implicit or explicit timing.
- Distinct from other claims — We avoid double-counting by separating overlapping predictions into clear, independent units.
Some claims are quantitative (benchmark scores, revenue figures, compute targets) and relatively easy to evaluate. Others are qualitative (institutional dynamics, public sentiment, strategic decisions) and require more judgment.
We currently track 48 predictions across 8 categories. This isn’t exhaustive — the scenario contains hundreds of implicit claims — but it covers the most important and trackable ones.
What We Don’t Track
Some elements of the scenario are inherently untestable or not yet meaningful to track:
- Hypothetical decisions (e.g., “OpenBrain decides to continue racing”) — these are scenario branches, not predictions
- Extremely vague claims — narrative color that doesn’t make a testable assertion
- Claims about internal AI lab dynamics — often impossible to verify from outside
Our Status Taxonomy
Every prediction gets one of six statuses. Here’s what each means and how we assign it:
Confirmed
What it means: The predicted event or trend has clearly materialized, within roughly the predicted timeframe.
Example: “Unreliable but useful AI agents emerge” — ChatGPT agent launched July 2025, matching the scenario’s mid-2025 prediction almost exactly.
What would change it: Nothing, short of evidence that we misidentified the event. Confirmed predictions stay confirmed.
Ahead
What it means: The prediction is happening faster than the scenario expected.
Example: “METR time horizon doubles every 4 months” — Actual doubling rate is approximately every 3 months since late 2024, faster than predicted.
What would change it: A plateau or slowdown that brings the pace back in line with or behind the prediction.
On Track
What it means: The prediction is progressing roughly as expected. Not yet fully confirmed, but trajectory aligns.
Example: “Global AI capex reaches $1 trillion cumulative” — Investment trajectory is consistent with hitting this milestone.
What would change it: Confirmation (upgrade to Confirmed) or evidence of a slowdown (downgrade to Behind).
Behind
What it means: The predicted event is happening more slowly than the scenario expected, or hasn’t happened by the predicted timeframe.
Example: “SWE-bench-Verified score reaches 85%” — Predicted by mid-2025, actual best score was 74.5%. Meaningful progress, but behind.
What would change it: A rapid catch-up that puts it back on the predicted trajectory, or further delay that makes “Behind” increasingly clear.
Emerging
What it means: We see early signals that point toward this prediction, but it’s too soon to score definitively. The evidence is suggestive, not conclusive.
Example: “Anti-AI protests gain significant political influence” — Some organized opposition exists, but nothing at the scale the scenario describes.
What would change it: Stronger evidence (upgrade to On Track or Confirmed) or evidence the trend is stalling (downgrade to Behind).
Not Yet Testable
What it means: The prediction’s timeframe hasn’t arrived yet. We can’t meaningfully evaluate it.
Example: “AI R&D progress multiplier reaches 4×” — This targets late 2027, which hasn’t happened yet.
What would change it: Time passing and evidence arriving. These will eventually move to other statuses.
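The transition rules sketched in the "what would change it" notes above can be modeled as a small state machine. This is an illustrative sketch, not the tracker's actual implementation; the `Status` enum, `TRANSITIONS` table, and `can_transition` helper are all hypothetical names, and the allowed moves are inferred from the descriptions above.

```python
from enum import Enum

class Status(Enum):
    CONFIRMED = "confirmed"
    AHEAD = "ahead"
    ON_TRACK = "on track"
    BEHIND = "behind"
    EMERGING = "emerging"
    NOT_YET_TESTABLE = "not yet testable"

# Allowed status changes, following the "what would change it" notes above.
# Illustrative only -- the tracker's real rules may allow other moves.
TRANSITIONS = {
    Status.CONFIRMED: set(),                             # confirmed stays confirmed
    Status.AHEAD: {Status.ON_TRACK, Status.BEHIND},      # plateau or slowdown
    Status.ON_TRACK: {Status.CONFIRMED, Status.BEHIND},  # confirmation or slowdown
    Status.BEHIND: {Status.ON_TRACK},                    # rapid catch-up
    Status.EMERGING: {Status.ON_TRACK, Status.CONFIRMED, Status.BEHIND},
    Status.NOT_YET_TESTABLE: set(Status) - {Status.NOT_YET_TESTABLE},
}

def can_transition(current: Status, new: Status) -> bool:
    """True if the taxonomy allows moving from `current` to `new`."""
    return new in TRANSITIONS[current]
```

One design note: encoding "Confirmed predictions stay confirmed" as an empty transition set makes that rule mechanically checkable rather than a matter of reviewer discipline.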
How to Read a Prediction Page
Each of our 48 prediction pages follows a consistent structure. Here’s what you’ll find:
Header
- Title — a clear label for the prediction
- Status badge — current assessment with color coding
- Category — which domain (Model Capability, Coding, Geopolitics, etc.)
- Confidence score — how certain we are about our status assessment
- Predicted date — when the scenario expects this to happen
- Last updated — when we last reviewed this prediction
Body Sections
- What AI 2027 Predicted — The original claim from the scenario, in context. We quote or closely paraphrase the source material.
- How We Track This — Our operationalization: what real-world indicators do we monitor? What benchmarks, data sources, or events count as evidence?
- Current Evidence — Sourced evidence supporting our current status assessment. Every claim here should have a link or citation.
- Counterevidence & Limitations — What argues against our assessment? What are we uncertain about? This section exists to keep us honest.
- What Would Change Our Assessment — Explicit criteria for upgrading or downgrading the status. This makes our reasoning auditable.
- Update History — A changelog of status changes with dates and reasoning.
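Taken together, the header fields and body sections describe a record with a fixed shape. A minimal sketch of that shape, assuming a Python dataclass (the `PredictionPage` name and field names are hypothetical, not the site's actual data model):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class PredictionPage:
    # Header
    title: str             # clear label for the prediction
    status: str            # one of the six statuses in the taxonomy
    category: str          # e.g. "Model Capability", "Coding", "Geopolitics"
    confidence: float      # 0.0-1.0 certainty in the status assessment
    predicted_date: str    # when the scenario expects this, e.g. "mid-2025"
    last_updated: date
    # Body sections
    what_was_predicted: str = ""
    how_we_track: str = ""
    current_evidence: list[str] = field(default_factory=list)  # sourced claims
    counterevidence: str = ""
    change_criteria: str = ""
    update_history: list[str] = field(default_factory=list)    # dated changelog entries
```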
What Confidence Scores Mean
Each prediction has a confidence score from 0 to 1 that represents how certain we are about our status assessment — not how likely the prediction is to ultimately come true.
- 0.90-1.00 — Very high confidence in our assessment. Clear evidence, minimal ambiguity.
- 0.70-0.89 — High confidence. Strong evidence, but some room for interpretation.
- 0.50-0.69 — Moderate confidence. Evidence points one way but isn’t conclusive.
- 0.30-0.49 — Low confidence. Genuine uncertainty about how to read the evidence.
- Below 0.30 — Very uncertain. We’re making a judgment call with limited information.
Important distinction: A “confirmed” prediction with 0.85 confidence means we’re quite sure it’s confirmed, but there’s some ambiguity. It does not mean there’s an 85% chance the prediction is true — we already believe it’s true, we’re just not perfectly certain about our interpretation.
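The banding above is a straightforward threshold mapping. A minimal sketch, assuming a hypothetical helper named `confidence_band` (not part of the tracker itself):

```python
def confidence_band(score: float) -> str:
    """Map a 0-1 confidence score to the band labels described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if score >= 0.90:
        return "very high"
    if score >= 0.70:
        return "high"
    if score >= 0.50:
        return "moderate"
    if score >= 0.30:
        return "low"
    return "very uncertain"
```

For example, the 0.85-confidence "confirmed" prediction discussed above would land in the "high" band.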
Why This Matters
Tracking predictions from a narrative scenario is inherently messier than scoring a prediction market or a list of concrete bets. We’re making interpretive choices at every step:
- Which claims count as “predictions”?
- How do we operationalize qualitative claims?
- What counts as sufficient evidence?
- When does “on track” become “behind”?
We try to be transparent about these choices. Our methodology page goes deeper into the principles we follow. And every prediction page shows its evidence and reasoning, so you can disagree with our assessments if you see the evidence differently.
The goal isn’t to “grade” AI 2027 as right or wrong. It’s to build a structured, honest, ongoing record of how the most detailed AI forecast compares with reality — useful whether the scenario turns out to be prescient or overblown.