METR time horizon doubles every 4 months

Ahead · Agent Autonomy · 80% confidence
Predicted: ~4-month doubling from 2024+ · Adjusted: ~4.3-month doubling (METR 1.1, 2023+ data) · Updated: 2026-04-02 · Source: ai-2027.com, Appendix G (page 51)
METR time horizons doubled every 7 months from 2019 to 2024 and every 4 months from 2024 onward (Appendix G, page 51). The acceleration from a 7-month to a 4-month doubling period is the key claim tracked here.

What AI 2027 Predicted

The scenario cites METR’s time horizon benchmark as a key metric for measuring agent capability. It predicts the time horizon (how long an AI agent can work autonomously on a task) doubles approximately every 4 months. This exponential improvement is central to the scenario’s timeline for reaching human-level autonomy.
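The scenario's growth claim can be written as H(t) = H0 · 2^(t/T) with T ≈ 4 months. A minimal sketch of that compounding, where the 60-minute starting horizon is an illustrative assumption rather than a METR figure:

```python
# Exponential time-horizon model: H(t) = H0 * 2**(t / T),
# where T is the doubling period in months.
# The 60-minute starting horizon is an illustrative assumption,
# not a METR measurement.

def horizon_minutes(h0: float, months: float, doubling_months: float = 4.0) -> float:
    """Horizon after `months` of growth, starting from `h0` minutes."""
    return h0 * 2 ** (months / doubling_months)

# Three doublings in a year at a 4-month pace:
print(horizon_minutes(60, 12))  # 480.0 minutes (8 hours)
```

At this pace a one-hour agent becomes an eight-hour agent within a year, which is why the 7-month vs. 4-month distinction dominates the timeline.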

How We Track This

We monitor:

  • Official METR time horizon publications and updates
  • Community estimates for new models (EA Forum, etc.)
  • The ratio between predicted and actual doubling rates
  • AI Futures Project’s own grading against this metric

Current Evidence

The METR TH1.1 update (Jan 2026) refines the dataset to 228 tasks (up from 170). Key findings: the all-time doubling period is 188 days (6.3 months), but from 2023 onward it is 129 days (4.3 months), and from 2024 onward just 89 days (3 months). The previous “~4.7 month” estimate came from an earlier calculation method; the 4.3-month figure from TH1.1’s 2023+ window is the current best estimate.

Claude Opus 4.6 achieved a 50% time horizon of 719 minutes (~12 hours) and an 80% horizon of 70 minutes. EA Forum analysis estimates Opus 4.6 could reach 8–12 hours on an official METR measurement. The predicted 4-month doubling rate is consistent with the 2023+ data, and AI Futures’ own grading places the pace at 1.04× their corrected trajectory, though the implied rate varies significantly with the time window selected.
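The doubling period implied by any two measurements can be back-computed directly. A sketch under assumed dates: the ~50-minute Claude 3.7 figure and the ~719-minute Opus 4.6 figure come from the evidence above, but the exact measurement dates below are approximations, not official METR timestamps:

```python
import math
from datetime import date

def doubling_time_months(d0: date, h0: float, d1: date, h1: float) -> float:
    """Doubling period (in months) implied by two (date, horizon) points,
    assuming pure exponential growth between them."""
    months = (d1 - d0).days / 30.44  # average month length
    return months / math.log2(h1 / h0)

# ~50-minute horizon in early 2025 (Claude 3.7) vs. ~719 minutes for
# Opus 4.6; both dates are illustrative assumptions.
print(round(doubling_time_months(date(2025, 2, 1), 50,
                                 date(2026, 3, 1), 719), 1))  # 3.4
```

These assumed endpoints land near a 3.4-month doubling period, in line with the 2023+ estimate; shifting either date by a month moves the result noticeably, which is why the choice of time window matters so much.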

Counterevidence & Limitations

  • Different time windows give different doubling rates (all-time: 6.3 months; 2024+: 3 months)
  • METR’s task set is synthetic and may not reflect real-world agent performance
  • The 80% and 50% horizons tell different stories — which one to track matters
  • Progress may plateau as agents encounter harder tasks requiring different capabilities

What Would Change Our Assessment

  • Maintain at “ahead”: Doubling rate continues at or below 4 months
  • Downgrade to “on-track”: Doubling rate settles above 4 months consistently
  • Downgrade to “behind”: Clear evidence of plateau in time horizon growth

Update History

  • 2025-07: METR publishes domain time-horizon analysis (July 14): approximately 7-month doubling time across software engineering, ML, and cybersecurity domains. Frontier models (Claude 3.7) at ~50-minute 50% horizon in early 2025. The 7-month historical average is slower than AI 2027’s predicted 4-month acceleration, but the extrapolation still arrives at approximately early 2027, consistent with the essay’s timeline.
  • 2025-08: GPT-5 launches with METR pre-deployment evaluation disclosing a 50% time horizon of approximately 2 hours 17 minutes on software engineering tasks (August 7). METR’s August 12 research update confirms the ~7-month doubling trend across domains for the 2019-2024 historical period, while noting recent acceleration toward 4-month doubling.
  • 2025-11: GPT-5.1-Codex-Max METR evaluation (November 19) shows a 50% time horizon of approximately 2 hours 42 minutes, up from 2h17m in August. Share of successes on the hardest AI R&D-relevant tasks jumps from 2% to 8% (a 4x improvement). The doubling trend continues.
  • 2025-12: AI Futures Project Dec 2025 model update assessed the METR coding time horizon as tracking at approximately 1.04× their central AI-2027-speed trajectory. The historical 7-month doubling rate, confirmed across domains, showed signs of acceleration in the most recent period.
  • 2026-01: METR released Time Horizon 1.1: expanded from 170 to 228 tasks; long tasks (8h+) doubled from 14 to 31. Key finding: in 2024–2025, coding time horizons doubled approximately every 4 months, matching AI 2027’s predicted acceleration. Status upgraded to on-track.
  • 2026-03: Actual doubling pace (~3 months) is faster than the predicted 4 months. AI agent capabilities are advancing ahead of the scenario’s timeline.
  • 2026-03-23: Updated METR data confirms Claude Opus 4.6 at a 14.5-hour 50% time horizon, up from ~12 hours after a bug fix (Wikipedia, OfficeChai). METR also published a study (Mar 10) finding that roughly half of SWE-bench-passing PRs would not be merged by maintainers, suggesting benchmark scores overstate real-world agent capability (METR). The doubling pace remains ahead of prediction. No status change.
  • 2026-03-30: Ajeya Cotra (METR researcher, formerly of Open Philanthropy) published analysis (March 3) noting that Opus 4.6 at ~12 hours was already past her January forecast for year-end 2026, and projecting a 100+ hour time horizon by year-end at the current pace (planned-obsolescence.org). She notes the benchmark suite is “nearly saturated” for short tasks, creating noise in longer-horizon estimates. The confidence interval is wide (5.3 to 66 hours), but 19 tasks were estimated at 8+ hours, and Opus 4.6 solved 14 of them at least once. GPT-5.4 released March 5 with Thinking and Pro variants; METR measurement pending. The 4-month doubling rate appears to be accelerating further, not stabilizing.
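Cotra's 100+ hour projection follows from simple compounding. A quick check, assuming roughly ten months from early March to year-end and the current ~3-month doubling pace (both rounded figures, not hers exactly):

```python
# Sanity check on the year-end extrapolation: a ~12-hour horizon in
# early March 2026, doubling every ~3 months, projected to year-end.
# The 10-month window and 3-month pace are rounded assumptions.
months_remaining = 10
doublings = months_remaining / 3          # ~3.3 doublings
projected_hours = 12 * 2 ** doublings
print(round(projected_hours))             # ~121 hours, i.e. 100+
```

Roughly three doublings from 12 hours already exceeds 100 hours, so the projection is not sensitive to the exact window chosen.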