METR time horizon doubles every 4 months
METR time horizons doubled every 7 months from 2019–2024 and every 4 months from 2024 onward (Appendix G, page 51). The acceleration from 7-month to 4-month doubling is a key claim of the scenario.
What AI 2027 Predicted
The scenario cites METR’s time horizon benchmark as a key metric for measuring agent capability. It predicts the time horizon (how long an AI agent can work autonomously on a task) doubles approximately every 4 months. This exponential improvement is central to the scenario’s timeline for reaching human-level autonomy.
How We Track This
We monitor:
- Official METR time horizon publications and updates
- Community estimates for new models (EA Forum, etc.)
- The ratio between predicted and actual doubling rates
- AI Futures Project’s own grading against this metric
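The third item, the ratio between predicted and actual doubling rates, can be sketched in a few lines. This is a minimal sketch under an assumed definition (predicted doubling time over observed doubling time), not necessarily AI Futures Project's exact grading methodology:

```python
def pace_ratio(predicted_doubling_months: float, observed_doubling_months: float) -> float:
    """Ratio > 1 means capabilities are doubling faster than predicted.

    Assumed definition: predicted doubling time / observed doubling time.
    """
    return predicted_doubling_months / observed_doubling_months

# Figures from this document: prediction is a 4-month doubling time.
print(pace_ratio(4.0, 3.0))  # 2024+ window (~3-month doubling): faster than predicted
print(pace_ratio(4.0, 4.3))  # 2023+ window (~4.3-month doubling): slightly slower
```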
Current Evidence
METR's TH1.1 update (Jan 2026) provides refined data with 228 tasks (up from 170). Key findings:
- Doubling times by window: 188 days (6.3 months) all-time, 129 days (4.3 months) from 2023 onward, and just 89 days (3 months) from 2024 onward.
- The previous "~4.7 month" estimate was based on an earlier calculation method; the 4.3-month figure from TH1.1's 2023+ window is the current best estimate.
- Claude Opus 4.6 achieved a 50% time horizon of 719 minutes (~12 hours) and an 80% horizon of 70 minutes. EA Forum analysis estimates Opus 4.6 could reach 8–12 hours on official METR measurement.
- The predicted 4-month doubling rate is consistent with the 2023+ data. AI Futures' own grading places the pace at 1.04× their corrected trajectory, though the rate varies significantly depending on the time window selected.
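The doubling-rate arithmetic behind these figures can be checked with a short sketch: infer a doubling time from two (date, horizon) measurements, then extrapolate. The anchor points below are illustrative (GPT-5's ~137-minute horizon from the Aug 2025 entry, Opus 4.6's 719 minutes with a hypothetical measurement date), and constant exponential growth is assumed throughout; this is not METR's code:

```python
from datetime import date
from math import log2

def doubling_time_days(d0: date, h0: float, d1: date, h1: float) -> float:
    """Days per doubling implied by two (date, horizon-in-minutes) points."""
    return (d1 - d0).days / log2(h1 / h0)

def project(horizon: float, days_ahead: float, doubling_days: float) -> float:
    """Horizon after days_ahead, assuming a constant doubling time."""
    return horizon * 2 ** (days_ahead / doubling_days)

# Illustrative anchors; the Opus 4.6 measurement date is an assumption.
dt = doubling_time_days(date(2025, 8, 7), 137, date(2026, 1, 15), 719)
print(f"implied doubling time: {dt:.0f} days")
print(f"projected horizon in 6 months: {project(719, 182, dt) / 60:.0f} hours")
```

With a two-point fit like this, small errors in either measurement shift the implied rate substantially, which is one reason the windowed estimates above disagree.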
Sources:
- METR Time Horizon 1.1 Update (Jan 2026)
- METR Time Horizons Dashboard & Raw Data
- Estimating METR Time Horizons for Opus 4.6 & GPT 5.3 — EA Forum
- Grading AI 2027’s 2025 Predictions — AI Futures Project
Counterevidence & Limitations
- Different time windows give different doubling rates (all-time: 6.3 months; 2024+: 3 months)
- METR’s task set is synthetic and may not reflect real-world agent performance
- The 80% and 50% horizons tell different stories — which one to track matters
- Progress may plateau as agents encounter harder tasks requiring different capabilities
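The first limitation, window sensitivity, can be illustrated with a least-squares fit of log2(horizon) against time over different windows. The data points below are illustrative round numbers loosely following this document's narrative, not METR's published series:

```python
from math import log2

# (months since 2019-01, 50% time horizon in minutes) -- illustrative only
points = [
    (0, 0.1), (24, 0.7), (48, 5.0), (60, 30.0),
    (72, 137.0), (78, 300.0), (84, 719.0),
]

def doubling_months(data):
    """Least-squares slope of log2(horizon) vs. months -> months per doubling."""
    n = len(data)
    xs = [x for x, _ in data]
    ys = [log2(y) for _, y in data]
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return 1 / slope

print(f"all points:      {doubling_months(points):.1f} months/doubling")
print(f"last 3 points:   {doubling_months(points[-3:]):.1f} months/doubling")
```

The fitted rate over the full series differs from the rate over the recent tail, mirroring the all-time vs. 2024+ discrepancy in the real data.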
What Would Change Our Assessment
- Maintain at “ahead”: Doubling rate continues at or below 4 months
- Downgrade to “on-track”: Doubling rate settles above 4 months consistently
- Downgrade to “behind”: Clear evidence of plateau in time horizon growth
Update History
| Date | Update |
|---|---|
| 2025-07 | METR publishes domain time-horizon analysis (July 14): approximately 7-month doubling time across software engineering, ML, and cybersecurity domains. Frontier models (Claude 3.7) at ~50-minute 50% horizon in early 2025. The 7-month historical average is slower than AI 2027’s predicted 4-month acceleration, but the extrapolation still arrives at approximately early 2027 — consistent with the essay’s timeline. |
| 2025-08 | GPT-5 launches with METR pre-deployment evaluation disclosing a 50% time horizon of approximately 2 hours 17 minutes on software engineering tasks (August 7). METR’s August 12 research update confirms the ~7-month doubling trend across domains for the 2019-2024 historical period, while noting recent acceleration toward 4-month doubling. |
| 2025-11 | GPT-5.1-Codex-Max METR evaluation (November 19) shows 50% time horizon of approximately 2 hours 42 minutes — up from 2h17m in August. Share of success on hardest AI R&D-relevant tasks jumps from 2% to 8% (4x improvement). Doubling trend continues. |
| 2025-12 | AI Futures Project's Dec 2025 model update assessed the METR coding time horizon as tracking at approximately 1.04× their central AI-2027-speed trajectory. The historical 7-month doubling rate, confirmed across domains, showed signs of acceleration in the most recent period. |
| 2026-01 | METR released Time Horizon 1.1: expanded from 170 to 228 tasks, long tasks (8h+) doubled from 14 to 31. Key finding: in 2024–2025, coding time horizons doubled every approximately 4 months — matching AI 2027’s predicted acceleration. Status upgraded to on-track. |
| 2026-03 | Actual doubling pace (~3 months) is faster than the predicted 4 months. AI agent capabilities are advancing ahead of the scenario's timeline. |
| 2026-03-23 | Updated METR data confirms Claude Opus 4.6 at a 14.5-hour 50% time horizon, up from ~12 hours after a bug fix (Wikipedia, OfficeChai). METR also published a study (Mar 10) finding that roughly half of SWE-bench-passing PRs would not be merged by maintainers, suggesting benchmark scores overstate real-world agent capability (METR). Doubling pace remains ahead of prediction. No status change. |
| 2026-03-30 | Ajeya Cotra (METR researcher, formerly of Open Philanthropy) published an analysis (March 3) noting that Opus 4.6 at ~12 hours had already passed her January forecast for year-end 2026, and projecting a 100+ hour time horizon by year-end at the current pace (planned-obsolescence.org). She notes the benchmark suite is "nearly saturated" for short tasks, creating noise in longer-horizon estimates. The confidence interval is wide (5.3 to 66 hours), but of the 19 tasks estimated at 8+ hours, Opus 4.6 solved 14 at least once. GPT-5.4 released March 5 with Thinking and Pro variants; METR measurement pending. The 4-month doubling rate appears to be accelerating further, not stabilizing. |