This category contains 6 tracked predictions. Each page includes the original claim, current evidence, counterevidence, and what would change our assessment.

Agents struggle with long-horizon tasks (Confirmed)
2025 | Updated 2026-03-13 | 85% confidence

Agent-1 is bad at even simple long-horizon tasks (page 7, Early 2026 section). Agents in Mid 2025 are likewise described as 'impressive in theory but in practice unreliable.'

AI model capable of autonomous self-replication (Emerging)
January 2027 | Updated 2026-03-30 | 50% confidence

The safety team finds that if Agent-2 somehow escaped and wanted to 'survive' and 'replicate' autonomously, it might be able to do so.

Best AI agents cost hundreds of dollars per month (Confirmed)
Mid 2025 | Updated 2026-04-13 | 95% confidence

The better agents are also expensive; you get what you pay for, and the best performance costs hundreds of dollars a month.

Computer-using agents marketed as 'personal assistants' (Confirmed)
Mid 2025 | Updated 2026-03-13 | 85% confidence

Advertisements for computer-using agents emphasize the term 'personal assistant': you can prompt them with tasks like 'order me a burrito on DoorDash' or 'open my budget spreadsheet and sum this month's expenses.'

METR time horizon doubles every 4 months (Ahead)
~4-month doubling from 2024 onward | Updated 2026-04-13 | 80% confidence

METR time horizons doubled every 7 months from 2019 to 2024 and every 4 months from 2024 onward (Appendix G, page 51). The acceleration from a 7-month to a 4-month doubling time is a key claim.
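To make the difference between the two doubling times concrete, here is a minimal sketch of the arithmetic, assuming simple exponential growth with a fixed doubling time. The function name `growth_factor` is hypothetical and used only for illustration. The 7-month and 4-month figures come from the claim above. No absolute time-horizon values are assumed.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Return the multiplicative growth after `elapsed_months`,
    assuming the quantity doubles every `doubling_months` (illustrative model)."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (elapsed_months / doubling_months)


# Doubling times quoted in the claim above, in months.
SLOW_DOUBLING_MONTHS = 7   # 2019 to 2024 trend
FAST_DOUBLING_MONTHS = 4   # 2024 onward trend

if __name__ == "__main__":
    # Compare how much the time horizon would grow over one year under each trend.
    for label, doubling in (("7-month", SLOW_DOUBLING_MONTHS),
                            ("4-month", FAST_DOUBLING_MONTHS)):
        factor = growth_factor(12, doubling)
        print(f"{label} doubling: x{factor:.2f} per year")
```

Under this model, a 4-month doubling time compounds to 8x growth per year, compared with roughly 3.3x per year at a 7-month doubling time.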

Unreliable but useful AI agents emerge (Confirmed)
Mid 2025 | Updated 2026-03-13 | 95% confidence

AI agents become increasingly useful for real tasks but remain unreliable on complex, multi-step workflows.