METR time horizon doubles every 4 months

Author Johannes Haus

Last updated 2026-05-25

Ahead · Agent Autonomy · 80% confidence

Predicted: ~4 month doubling from 2024+ · Adjusted: ~4.3 month doubling (METR TH1.1, 2023+ data) · Updated: 2026-05-25 · Source: ai-2027.com, Appendix G (page 51)

According to AI 2027 (Appendix G, page 51), METR time horizons doubled every 7 months from 2019 to 2024 and every 4 months from 2024 onward. The acceleration from 7-month to 4-month doubling is a key claim of the scenario.

At a glance

Assessment: Ahead
Confidence: 80%
Predicted timing: ~4 month doubling from 2024+
Primary source: ai-2027.com, Appendix G (page 51)

What AI 2027 Predicted

The scenario cites METR’s time horizon benchmark as a key metric for measuring agent capability. It predicts the time horizon (how long an AI agent can work autonomously on a task) doubles approximately every 4 months. This exponential improvement is central to the scenario’s timeline for reaching human-level autonomy.
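To make the doubling claim concrete, the sketch below projects a time horizon forward under a constant doubling time. The function name `project_horizon` and the 12-month window are illustrative assumptions, not part of AI 2027 or METR's methodology; the 719-minute starting point is the Opus 4.6 figure cited elsewhere on this page.

```python
def project_horizon(start_minutes: float, months_elapsed: float, doubling_months: float) -> float:
    """Horizon after `months_elapsed`, assuming a constant doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return start_minutes * 2 ** (months_elapsed / doubling_months)


# Illustration only: a 719-minute horizon compounding for 12 months
# under the historical (7-month) and predicted (4-month) doubling times.
for doubling in (7.0, 4.0):
    hours = project_horizon(719, 12, doubling) / 60
    print(f"{doubling:.0f}-month doubling: about {hours:.0f} hours after 12 months")
```

Under 4-month doubling, twelve months is exactly three doublings, an eightfold increase, which is why a small change in the doubling time has a large effect on the projected horizon.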

How We Track This

We monitor:

Official METR time horizon publications and updates
Community estimates for new models (EA Forum, etc.)
The ratio between predicted and actual doubling rates
AI Futures Project’s own grading against this metric
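One way the ratio between predicted and actual doubling rates could be computed is sketched below. The definition (predicted doubling time divided by observed doubling time) and the name `pace_ratio` are assumptions made for illustration; the page does not specify its formula, and AI Futures' 1.04× figure is measured against their own trajectory rather than by this ratio.

```python
def pace_ratio(predicted_doubling_months: float, observed_doubling_months: float) -> float:
    """Ratio above 1 means horizons are doubling faster than predicted."""
    if predicted_doubling_months <= 0 or observed_doubling_months <= 0:
        raise ValueError("doubling times must be positive")
    return predicted_doubling_months / observed_doubling_months


# Observed doubling times (in months) quoted on this page for METR TH1.1 windows.
observed = {"all-time": 6.3, "2023+": 4.3, "2024+": 3.0}
for window, months in observed.items():
    print(f"{window}: {pace_ratio(4.0, months):.2f}x the predicted pace")
```

Because the result depends on which window supplies the observed doubling time, the ratio is best read alongside the window it was computed from.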

Current Evidence

The METR TH1.1 update (Jan 2026) provides refined data with 228 tasks (up from 170). Key findings: the all-time doubling time is 188 days (6.3 months), but from 2023 onward it is 129 days (4.3 months), and from 2024 onward just 89 days (3 months). Note: The previous “~4.7 month” estimate was based on an earlier calculation method; the 4.3-month figure from METR TH1.1’s 2023+ window is the current best estimate. Claude Opus 4.6 achieved a 50% time horizon of 719 minutes (~12 hours) and an 80% horizon of 70 minutes. EA Forum analysis estimates Opus 4.6 could reach 8–12 hours on official METR measurement. The predicted 4-month doubling rate is consistent with the 2023+ data. AI Futures’ own grading places the pace at 1.04× their corrected trajectory, though the rate varies significantly depending on the time window selected.
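The unit conversions behind these figures can be checked with a short sketch. It assumes a 30-day month, which appears to reproduce the month figures quoted above; the helper names are hypothetical, and the annual growth factor is a derived quantity added for illustration rather than a number METR reports here.

```python
DAYS_PER_MONTH = 30  # assumption: a 30-day month matches the quoted month figures


def days_to_months(days: float) -> float:
    """Convert a doubling time in days to months (30-day months assumed)."""
    return days / DAYS_PER_MONTH


def annual_growth_factor(doubling_days: float) -> float:
    """Multiplicative growth over 365 days at a constant doubling time."""
    return 2 ** (365 / doubling_days)


def minutes_to_hm(minutes: int) -> tuple[int, int]:
    """Split whole minutes into (hours, minutes)."""
    return divmod(minutes, 60)


# Doubling times in days for each TH1.1 window, as quoted above.
windows = {"all-time": 188, "2023+": 129, "2024+": 89}
for name, days in windows.items():
    print(f"{name}: {days_to_months(days):.1f} months per doubling, "
          f"about {annual_growth_factor(days):.1f}x per year")

hours, mins = minutes_to_hm(719)
print(f"719 minutes = {hours}h{mins:02d}m")
```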

METR’s May 2026 Frontier Risk Report adds a newer, cautiously framed datapoint from a Feb-Mar 2026 pilot with Anthropic, Google, Meta, and OpenAI. METR reported that the strongest assessed agents were near or beyond the reliable measurement range of Time Horizon 1.1, with the most capable shared model estimated at roughly 16-20 hours on the 50% horizon and 3-4 hours on the 80% horizon. The report explicitly cautions that estimates above 16 hours are unreliable because of suite saturation, so this supports the “ahead” assessment while strengthening the saturation caveat.

Counterevidence & Limitations

Different time windows give different doubling rates (all-time: 6.3 months; 2024+: 3 months)
METR’s task set is synthetic and may not reflect real-world agent performance
METR warns that Time Horizon 1.1 estimates above 16 hours are unreliable because the suite is saturating
The 80% and 50% horizons tell different stories, so the choice of which one to track matters
Progress may plateau as agents encounter harder tasks requiring different capabilities
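The window-dependence noted in the first point above can be illustrated with a small log-linear fit. The data here is synthetic and is not METR measurements: it is generated to follow a 7-month doubling for 60 months and a 4-month doubling afterward, mirroring the two regimes AI 2027 describes. The function names and the 6-month sampling interval are assumptions made for illustration.

```python
import math


def fit_doubling_months(points):
    """Least-squares fit of log2(horizon) against time; returns months per doubling."""
    xs = [t for t, _ in points]
    ys = [math.log2(h) for _, h in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # doublings per month
    return 1.0 / slope


def synthetic_horizon(month):
    """Hypothetical horizon: 7-month doubling up to month 60, 4-month doubling after."""
    if month <= 60:
        return 2 ** (month / 7)
    return 2 ** (60 / 7) * 2 ** ((month - 60) / 4)


# Synthetic samples every 6 months over 7 years (NOT real benchmark data).
points = [(m, synthetic_horizon(m)) for m in range(0, 85, 6)]
all_time = fit_doubling_months(points)
recent = fit_doubling_months([p for p in points if p[0] >= 60])

print(f"all-time fit: {all_time:.1f} months per doubling")
print(f"recent-window fit: {recent:.1f} months per doubling")
```

Even with noise-free data, the all-time fit blends the two regimes and lands between them, while the recent window recovers the faster rate. With real, noisy measurements and a saturating task suite, the spread between windows would be harder to interpret.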

What Would Change Our Assessment

Maintain at “ahead”: Doubling rate continues at or below 4 months
Downgrade to “on-track”: Doubling rate settles above 4 months consistently
Downgrade to “behind”: Clear evidence of plateau in time horizon growth

Update History

Date	Update
2026-05-25	METR’s Frontier Risk Report reported that the strongest Feb-Mar 2026 assessed agents were saturating Time Horizon 1.1, with a most-capable shared model point estimate around 16-20 hours at the 50% horizon and 3-4 hours at the 80% horizon. This reinforces the ahead assessment, but METR cautions that measurements above 16 hours are unreliable with the current task suite.
2026-04-13	METR time horizon benchmark page updated Feb 4, 2026 with new model measurements. Academic literature (GovAI, March 2026) notes significant uncertainty in translating benchmark performance to real-world R&D productivity gains. The doubling trend continues broadly on pace with the ~4-month rate. No dramatic acceleration or deceleration evident. No status or confidence change. Sources: METR, METR simpler timelines model, arXiv:2603.03992
2026-04-02	AI Futures Project Q1 2026 update revised METR doubling time from 5.5 months to 4 months (Kokotajlo) and 4.5 months (Lifland), citing METR v1.1 trend and new model evaluations (Gemini 3, GPT-5.2, Opus 4.6). The authors’ own assessment now aligns closely with our “Ahead” status. Source: LessWrong
2026-03-30	Ajeya Cotra (METR researcher, formerly at Open Philanthropy) published an analysis on March 3 (planned-obsolescence.org) noting that Opus 4.6, at ~12 hours, was already past her January forecast for year-end 2026, and projecting a 100+ hour time horizon by year-end at the current pace. She notes the benchmark suite is “nearly saturated” for short tasks, creating noise in longer-horizon estimates. The confidence interval is wide (5.3 to 66 hours), but of the 19 tasks estimated at 8+ hours, Opus 4.6 solved 14 at least once. GPT-5.4 was released March 5 with Thinking and Pro variants; METR measurement is pending. Growth appears to be accelerating beyond the 4-month doubling rate rather than stabilizing.
2026-03-23	METR’s Opus 4.6 measurement initially appeared around 14.5 hours on the 50% horizon, but current TH1.1 raw data estimates roughly 11h59m at 50% and roughly 1h10m at 80%. METR also published a study (Mar 10) finding that roughly half of SWE-bench-passing PRs would not be merged by maintainers, suggesting benchmark scores overstate real-world agent capability (METR). Doubling pace remains ahead of prediction. No status change.
2026-03	Actual doubling pace (~3 months) is faster than the predicted 4 months. AI agent capabilities advancing ahead of the scenario’s timeline.
2026-01	METR released Time Horizon 1.1: expanded from 170 to 228 tasks, with long tasks (8h+) more than doubling from 14 to 31. Key finding: in 2024–2025, coding time horizons doubled approximately every 4 months, matching AI 2027’s predicted acceleration. Status upgraded to on-track.
2025-12	AI Futures Project Dec 2025 model update assessed METR coding time horizon as tracking at approximately 1.04× their central AI-2027-speed trajectory. The historical 7-month doubling rate confirmed across domains showed signs of acceleration in the most recent period.
2025-11	GPT-5.1-Codex-Max METR evaluation (November 19) shows 50% time horizon of approximately 2 hours 42 minutes — up from 2h17m in August. Share of success on hardest AI R&D-relevant tasks jumps from 2% to 8% (4x improvement). Doubling trend continues.
2025-08	GPT-5 launches with METR pre-deployment evaluation disclosing a 50% time horizon of approximately 2 hours 17 minutes on software engineering tasks (August 7). METR’s August 12 research update confirms the ~7-month doubling trend across domains for the 2019-2024 historical period, while noting recent acceleration toward 4-month doubling.
2025-07	METR publishes domain time-horizon analysis (July 14): approximately 7-month doubling time across software engineering, ML, and cybersecurity domains. Frontier models (Claude 3.7) at ~50-minute 50% horizon in early 2025. The 7-month historical average is slower than AI 2027’s predicted 4-month acceleration, but the extrapolation still arrives at approximately early 2027, consistent with the essay’s timeline.