Frontier model trained at 10²⁷ FLOP (Agent-0, completes May 2025)
OpenBrain's latest public model—Agent-0—was trained with 10²⁷ FLOP.
What AI 2027 Predicted
The scenario describes “Agent-0,” a frontier model trained with 10²⁷ FLOP (one-tenth of the headline 10²⁸ run, but still a roughly 50× jump from GPT-4’s ~2×10²⁵ FLOP). This model represents the public-facing output of the first wave of next-generation training infrastructure, released in late 2025. Agent-0 is described as impressive in capability but still limited compared to what follows.
How We Track This
We monitor:
- Epoch AI’s estimates of training compute for frontier models (GPT-5, Claude 4, Gemini Ultra 2)
- Lab disclosures about training scale and infrastructure
- Third-party analysis of cluster sizes and training durations (see the estimation sketch after this list)
- Hardware deployment timelines (Blackwell clusters, custom ASICs)
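Those third-party estimates are typically back-of-the-envelope products of chip count, per-chip throughput, utilization, and wall-clock training time. Below is a minimal sketch of that arithmetic; every number in it is an illustrative assumption, not a figure from Epoch AI or any lab.

```python
def estimate_training_flop(num_chips: int, peak_flops_per_chip: float,
                           utilization: float, training_days: float) -> float:
    """Rough training compute: chips x peak throughput x utilization x wall-clock time."""
    seconds = training_days * 24 * 3600
    return num_chips * peak_flops_per_chip * utilization * seconds

# Hypothetical cluster: 25,000 H100-class chips (~1e15 dense FLOP/s each),
# 40% utilization, 90-day run. Values chosen for illustration only.
flop = estimate_training_flop(25_000, 1e15, 0.40, 90)
print(f"{flop:.1e} FLOP")  # ~7.8e25, i.e. just under 10^26
```

Under those illustrative assumptions, a 10²⁷ run would need roughly 13× more chip-time, which is why cluster size and training duration are the observables most worth tracking.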
Current Evidence
Epoch AI estimated GPT-5 pretraining compute at approximately 3×10²⁵ FLOP, in the same order of magnitude as GPT-4 and well short of the 10²⁷ the scenario envisioned. Epoch’s analysis notes that GPT-5 actually used less training compute than GPT-4.5, with OpenAI apparently prioritizing efficiency and post-training over raw scale. Over 30 models have now been trained above the 10²⁵ FLOP threshold, suggesting this scale has become commoditized rather than frontier-pushing.
The gap between actual compute (~3×10²⁵) and predicted compute (10²⁷) is roughly 30×, which is substantial. However, there is significant uncertainty in these estimates, and post-training compute (RL, RLHF) may add meaningfully to total training FLOP in ways that aren’t well-characterized.
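Putting those same figures into linear and log terms makes the shortfall concrete (the values are the estimates cited above and the scenario’s headline number):

```python
import math

predicted = 1e27       # Agent-0 training compute in the AI 2027 scenario
gpt5_estimate = 3e25   # Epoch AI's rough GPT-5 pretraining estimate
gpt4_estimate = 2e25   # widely cited GPT-4 estimate

print(predicted / gpt5_estimate)              # ~33x short of the scenario
print(math.log10(predicted / gpt5_estimate))  # ~1.5 orders of magnitude
print(predicted / gpt4_estimate)              # the scenario implied a ~50x jump over GPT-4
```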
Sources:
- Notes on GPT-5 training compute — Epoch AI
- Why GPT-5 used less training compute than GPT-4.5 — Epoch AI
- Over 30 AI models trained at GPT-4 scale — Epoch AI
- Grading AI 2027’s 2025 Predictions — AI Futures Project
Counterevidence & Limitations
- Labs are increasingly opaque about training compute, and Epoch’s estimates carry wide uncertainty ranges
- The shift toward inference-time compute and RL post-training may mean raw pretraining FLOP is the wrong metric; total effective compute could be substantially higher
- GPT-5 achieved major capability gains despite seemingly modest compute scaling, suggesting algorithmic efficiency improvements partially substituted for raw scale
- Some labs may have completed larger training runs that haven’t been publicly characterized
What Would Change Our Assessment
- Upgrade to “on-track”: Credible evidence that a 2025 model used ≥10²⁶·⁵ total training compute (pretraining + post-training combined; see the conversion sketch after this list)
- Upgrade to “confirmed”: A confirmed training run at or near 10²⁷ FLOP
- Downgrade to “behind”: If Epoch revises estimates downward or the largest 2025 runs are confirmed well below 10²⁶·⁵
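To make the half-order-of-magnitude threshold concrete, here is a minimal conversion of 10²⁶·⁵ into linear terms, compared against Epoch AI’s current GPT-5 estimate:

```python
threshold = 10 ** 26.5   # the "on-track" threshold from the criteria above
gpt5_estimate = 3e25     # Epoch AI's rough GPT-5 pretraining estimate

print(f"{threshold:.2e} FLOP")    # ~3.16e26 FLOP
print(threshold / gpt5_estimate)  # ~10.5x above the current GPT-5 estimate
```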
Update History
| Date | Update |
|---|---|
| 2026-03 | GPT-5 estimated at ~3×10²⁵ FLOP pretraining, well below 10²⁷ target. Significant uncertainty remains around total compute including post-training and inference-time scaling. |