METR time horizon doubles every 4 months

Ahead · Agent Autonomy · 80% confidence
Predicted: ~4-month doubling from 2024+ · Adjusted: ~4.3-month doubling (METR 1.1, 2023+ data) · Updated: 2026-04-02 · Source: ai-2027.com, Appendix G (page 51)
METR time horizons doubled every 7 months from 2019 to 2024 and every 4 months from 2024 onward (Appendix G, page 51). The acceleration from a 7-month to a 4-month doubling period is the key claim tracked here.

What AI 2027 Predicted

The scenario cites METR’s time horizon benchmark as a key metric for measuring agent capability. It predicts the time horizon (how long an AI agent can work autonomously on a task) doubles approximately every 4 months. This exponential improvement is central to the scenario’s timeline for reaching human-level autonomy.
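The scenario's growth claim can be written as H(t) = H0 · 2^(t/T) with T ≈ 4 months. A minimal sketch of that compounding, where the 60-minute starting horizon is an illustrative assumption rather than a METR figure:

```python
# Exponential time-horizon model: H(t) = H0 * 2**(t / T),
# where T is the doubling period in months.
# The 60-minute starting horizon is an illustrative assumption,
# not a METR measurement.

def horizon_minutes(h0: float, months: float, doubling_months: float = 4.0) -> float:
    """Horizon after `months` of growth, starting from `h0` minutes."""
    return h0 * 2 ** (months / doubling_months)

# Three doublings in a year at a 4-month pace:
print(horizon_minutes(60, 12))  # 480.0 minutes (8 hours)
```

At this pace a one-hour agent becomes an eight-hour agent within a year, which is why the 7-month vs. 4-month distinction dominates the timeline.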

How We Track This

We monitor:

  • Official METR time horizon publications and updates
  • Community estimates for new models (EA Forum, etc.)
  • The ratio between predicted and actual doubling rates
  • AI Futures Project’s own grading against this metric

Current Evidence

The METR TH1.1 update (Jan 2026) refines the dataset to 228 tasks (up from 170). Key findings: the all-time doubling period is 188 days (6.3 months), but from 2023 onward it is 129 days (4.3 months), and from 2024 onward just 89 days (3 months). The previous “~4.7 month” estimate came from an earlier calculation method; the 4.3-month figure from TH1.1’s 2023+ window is the current best estimate.

Claude Opus 4.6 achieved a 50% time horizon of 719 minutes (~12 hours) and an 80% horizon of 70 minutes. EA Forum analysis estimates Opus 4.6 could reach 8–12 hours on an official METR measurement. The predicted 4-month doubling rate is consistent with the 2023+ data, and AI Futures’ own grading places the pace at 1.04× their corrected trajectory, though the implied rate varies significantly with the time window selected.
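The doubling period implied by any two measurements can be back-computed directly. A sketch under assumed dates: the ~50-minute Claude 3.7 figure and the ~719-minute Opus 4.6 figure come from the evidence above, but the exact measurement dates below are approximations, not official METR timestamps:

```python
import math
from datetime import date

def doubling_time_months(d0: date, h0: float, d1: date, h1: float) -> float:
    """Doubling period (in months) implied by two (date, horizon) points,
    assuming pure exponential growth between them."""
    months = (d1 - d0).days / 30.44  # average month length
    return months / math.log2(h1 / h0)

# ~50-minute horizon in early 2025 (Claude 3.7) vs. ~719 minutes for
# Opus 4.6; both dates are illustrative assumptions.
print(round(doubling_time_months(date(2025, 2, 1), 50,
                                 date(2026, 3, 1), 719), 1))  # 3.4
```

These assumed endpoints land near a 3.4-month doubling period, in line with the 2023+ estimate; shifting either date by a month moves the result noticeably, which is why the choice of time window matters so much.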

Counterevidence & Limitations

  • Different time windows give different doubling rates (all-time: 6.3 months; 2024+: 3 months)
  • METR’s task set is synthetic and may not reflect real-world agent performance
  • The 80% and 50% horizons tell different stories — which one to track matters
  • Progress may plateau as agents encounter harder tasks requiring different capabilities

What Would Change Our Assessment

  • Maintain at “ahead”: Doubling rate continues at or below 4 months
  • Downgrade to “on-track”: Doubling rate settles above 4 months consistently
  • Downgrade to “behind”: Clear evidence of plateau in time horizon growth

Update History

  • 2025-07: METR publishes domain time-horizon analysis (July 14): approximately 7-month doubling time across software engineering, ML, and cybersecurity domains. Frontier models (Claude 3.7) at ~50-minute 50% horizon in early 2025. The 7-month historical average is slower than AI 2027’s predicted 4-month acceleration, but the extrapolation still arrives at approximately early 2027, consistent with the essay’s timeline.
  • 2025-08: GPT-5 launches with METR pre-deployment evaluation disclosing a 50% time horizon of approximately 2 hours 17 minutes on software engineering tasks (August 7). METR’s August 12 research update confirms the ~7-month doubling trend across domains for the 2019-2024 historical period, while noting recent acceleration toward 4-month doubling.
  • 2025-11: GPT-5.1-Codex-Max METR evaluation (November 19) shows a 50% time horizon of approximately 2 hours 42 minutes, up from 2h17m in August. Share of successes on the hardest AI R&D-relevant tasks jumps from 2% to 8% (a 4x improvement). The doubling trend continues.
  • 2025-12: AI Futures Project Dec 2025 model update assessed the METR coding time horizon as tracking at approximately 1.04× their central AI-2027-speed trajectory. The historical 7-month doubling rate, confirmed across domains, showed signs of acceleration in the most recent period.
  • 2026-01: METR released Time Horizon 1.1: expanded from 170 to 228 tasks; long tasks (8h+) doubled from 14 to 31. Key finding: in 2024–2025, coding time horizons doubled approximately every 4 months, matching AI 2027’s predicted acceleration. Status upgraded to on-track.
  • 2026-03: Actual doubling pace (~3 months) is faster than the predicted 4 months. AI agent capabilities are advancing ahead of the scenario’s timeline.
  • 2026-03-23: Updated METR data confirms Claude Opus 4.6 at a 14.5-hour 50% time horizon, up from ~12 hours after a bug fix (Wikipedia, OfficeChai). METR also published a study (Mar 10) finding that roughly half of SWE-bench-passing PRs would not be merged by maintainers, suggesting benchmark scores overstate real-world agent capability (METR). The doubling pace remains ahead of prediction. No status change.
  • 2026-03-30: Ajeya Cotra (METR researcher, formerly of Open Philanthropy) published analysis (March 3) noting that Opus 4.6 at ~12 hours was already past her January forecast for year-end 2026, and projecting a 100+ hour time horizon by year-end at the current pace (planned-obsolescence.org). She notes the benchmark suite is “nearly saturated” for short tasks, creating noise in longer-horizon estimates. The confidence interval is wide (5.3 to 66 hours), but 19 tasks were estimated at 8+ hours, and Opus 4.6 solved 14 of them at least once. GPT-5.4 released March 5 with Thinking and Pro variants; METR measurement pending. The 4-month doubling rate appears to be accelerating further, not stabilizing.
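Cotra's 100+ hour projection follows from simple compounding. A quick check, assuming roughly ten months from early March to year-end and the current ~3-month doubling pace (both rounded figures, not hers exactly):

```python
# Sanity check on the year-end extrapolation: a ~12-hour horizon in
# early March 2026, doubling every ~3 months, projected to year-end.
# The 10-month window and 3-month pace are rounded assumptions.
months_remaining = 10
doublings = months_remaining / 3          # ~3.3 doublings
projected_hours = 12 * 2 ** doublings
print(round(projected_hours))             # ~121 hours, i.e. 100+
```

Roughly three doublings from 12 hours already exceeds 100 hours, so the projection is not sensitive to the exact window chosen.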