Frontier model trained at 10²⁷ FLOP (Agent-0, completes May 2025)

Emerging · Model Capability · 45% confidence
Predicted: May 2025 · Updated: 2026-03-13 · Source: ai-2027.com, Late 2025: The World's Most Expensive AI
OpenBrain's latest public model—Agent-0—was trained with 10²⁷ FLOP.

What AI 2027 Predicted

The scenario describes “Agent-0,” a frontier model trained with 10²⁷ FLOP (one-tenth of the headline 10²⁸ run, but still a massive jump from GPT-4’s ~2×10²⁵ FLOP). This model represents the public-facing output of the first wave of next-generation training infrastructure, released in late 2025. Agent-0 is described as impressive in capability but still limited compared to what follows.

How We Track This

We monitor:

  • Epoch AI’s estimates of training compute for frontier models (GPT-5, Claude 4, Gemini Ultra 2)
  • Lab disclosures about training scale and infrastructure
  • Third-party analysis of cluster sizes and training durations
  • Hardware deployment timelines (Blackwell clusters, custom ASICs)

Current Evidence

Epoch AI estimated GPT-5 pretraining compute at approximately 3×10²⁵ FLOP — roughly in the same order of magnitude as GPT-4, not the 10²⁷ the scenario envisioned. Epoch’s analysis notes that GPT-5 actually used less training compute than GPT-4.5, with OpenAI apparently prioritizing efficiency and post-training over raw scale. Over 30 models have now been trained above the 10²⁵ FLOP threshold, suggesting this scale has become commoditized rather than frontier-pushing.

The gap between actual compute (~3×10²⁵ FLOP) and predicted compute (10²⁷ FLOP) is roughly 30×, or about 1.5 orders of magnitude. However, these estimates carry significant uncertainty, and post-training compute (RL, RLHF) may add meaningfully to total training FLOP in ways that aren’t well characterized.
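The gap arithmetic above can be checked directly; a minimal sketch, using the point estimates from this section (both figures carry wide uncertainty, so the ratio is indicative only):

```python
import math

# Point estimates from this section; both carry wide uncertainty.
predicted_flop = 1e27   # AI 2027's Agent-0 training run
estimated_flop = 3e25   # Epoch AI's GPT-5 pretraining estimate

gap = predicted_flop / estimated_flop
gap_oom = math.log10(gap)  # the same gap in orders of magnitude

print(f"gap: {gap:.1f}x ({gap_oom:.2f} OOM)")  # gap: 33.3x (1.52 OOM)
```

Expressing the gap in orders of magnitude is useful because training-compute estimates are typically only reliable to within a factor of a few.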


Counterevidence & Limitations

  • Labs are increasingly opaque about training compute, and Epoch’s estimates carry wide uncertainty ranges
  • The shift toward inference-time compute and RL post-training may mean raw pretraining FLOP is the wrong metric; total effective compute could be substantially higher
  • GPT-5 achieved major capability gains despite seemingly modest compute scaling, suggesting algorithmic efficiency improvements partially substituted for raw scale
  • Some labs may have completed larger training runs that haven’t been publicly characterized

What Would Change Our Assessment

  • Upgrade to “on-track”: Credible evidence that a 2025 model used ≥10²⁶·⁵ FLOP of total training compute (pretraining + post-training combined)
  • Upgrade to “confirmed”: A confirmed training run at or near 10²⁷ FLOP
  • Downgrade to “behind”: Epoch revising its estimates downward, or confirmation that the largest 2025 runs fall well below 10²⁶·⁵ FLOP
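The criteria above amount to comparing an estimated total training compute against two thresholds. A minimal sketch of that mapping (the function name and status labels are illustrative, and treating “at or near 10²⁷” as within 2× is our assumption, not part of the criteria):

```python
def assess(total_flop: float) -> str:
    """Map an estimated total training compute (FLOP) to a tracker status.

    Thresholds follow the assessment criteria: >= ~10^26.5 FLOP for
    "on-track", at or near 10^27 FLOP (assumed here: within 2x) for
    "confirmed", anything well below 10^26.5 for "behind".
    """
    if total_flop >= 1e27 / 2:     # at or near 10^27 FLOP (assumed: within 2x)
        return "confirmed"
    if total_flop >= 10 ** 26.5:   # >= ~3.16e26 FLOP
        return "on-track"
    return "behind"

print(assess(3e25))  # GPT-5 estimate -> behind
```

On these thresholds, the current GPT-5 estimate (~3×10²⁵ FLOP) sits well below the 10²⁶·⁵ line, consistent with the current assessment.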

Update History

| Date | Update |
| --- | --- |
| 2026-03 | GPT-5 estimated at ~3×10²⁵ FLOP pretraining, well below the 10²⁷ target. Significant uncertainty remains around total compute including post-training and inference-time scaling. |