Leading Chinese AI lab ~6 months behind US frontier

Author: Johannes Haus

Last updated: 2026-06-22

Behind · Geopolitics · 70% confidence

Predicted: Mid 2026 · Updated: 2026-06-22 · Source: ai-2027.com, Mid 2026: China Wakes Up

“They are about six months behind the best OpenBrain models.”

At a glance

Assessment: Behind
Confidence: 70%
Predicted timing: Mid 2026
Primary source: ai-2027.com, Mid 2026: China Wakes Up

What AI 2027 Predicted

The scenario places the leading Chinese AI lab (called “DeepCent,” a fictionalized DeepSeek-like entity) roughly six months behind the best US models by mid-2026. The gap reflects China’s compute constraints, and DeepCent keeps it to about six months through algorithmic innovation and efficiency gains.

How We Track This

We monitor:

Benchmark performance comparisons between Chinese and US frontier models
Release date gaps between comparable capability tiers
Assessments from AI researchers and industry leaders on the US-China gap
Arena rankings (LMSYS Chatbot Arena, now LMArena) comparing Chinese and US models
ARC-AGI and other novel benchmarks that test different capability dimensions
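One way to turn the “release date gap” metric into a number is to find the earliest date at which the best US model matched a Chinese model’s score on the same benchmark, then count the months since. The sketch below is a minimal illustration under stated assumptions, not this tracker’s actual method: it assumes a hand-curated list of (date, score) snapshots for the US frontier on one benchmark, and the function names and the scores in `HYPOTHETICAL_HISTORY` are made up for illustration, not real benchmark results.

```python
from datetime import date
from typing import List, Optional, Tuple

# One snapshot of the best US frontier score on a single benchmark at a given date.
Snapshot = Tuple[date, float]


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`, ignoring the day of month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def capability_lag_months(
    challenger_score: float,
    challenger_date: date,
    frontier_history: List[Snapshot],
) -> Optional[int]:
    """Months since the US frontier first reached the challenger's score.

    Returns None if no frontier snapshot on or before `challenger_date`
    reaches that score, i.e. the challenger is level with or ahead of the
    recorded frontier on this benchmark.
    """
    for when, score in sorted(frontier_history):  # oldest snapshot first
        if when <= challenger_date and score >= challenger_score:
            return months_between(when, challenger_date)
    return None


# HYPOTHETICAL frontier snapshots for illustration only; not real benchmark data.
HYPOTHETICAL_HISTORY: List[Snapshot] = [
    (date(2025, 3, 1), 5.0),
    (date(2025, 7, 1), 15.0),
    (date(2025, 12, 1), 30.0),
]

if __name__ == "__main__":
    # A challenger scoring just under 12% in March 2026 is first matched by
    # the July 2025 snapshot, so the computed lag is 8 months.
    print(capability_lag_months(11.5, date(2026, 3, 1), HYPOTHETICAL_HISTORY))
```

Because snapshots are sparse, the result is a lower bound at month granularity: the frontier may have crossed the challenger’s score before the first snapshot that records it. That is why a comparison like the ARC-AGI 2 one on this page is best stated as “8+ months.” The same `months_between` arithmetic gives the 16-month V3-to-V4 interval cited in the update history.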

Current Evidence

Demis Hassabis’s Assessment: In January 2026, Google DeepMind CEO Demis Hassabis warned that China is “just months” behind the US in AI capabilities. This is consistent with AI 2027’s six-month figure and, if “months” means fewer than six, suggests the gap may be narrower than predicted.

Is China Catching the US in AI? Hassabis Says Gap is “Months” — VERTU

Lee Kai-fu’s Assessment: In March 2025, 01.AI founder Lee Kai-fu stated that DeepSeek had “narrowed the AI development gap with the United States to just three months in some areas” thanks to more efficient chip usage and algorithms. This predates the late-2025 and 2026 waves of US frontier model releases.

DeepSeek Narrows China-US AI Gap to Three Months — Reuters

ARC-AGI 2 Benchmark (March 2026): On the novel ARC-AGI 2 benchmark, Chinese models (Kimi, MiniMax, DeepSeek) scored below 12%, which was “lesser than US frontier labs’ scores from July 2025.” Since US labs had already exceeded these scores by July 2025, the lag on this benchmark is at least eight months (July 2025 to March 2026). This challenges the “only 3 months” narrative.

Chinese Models Score Lower Than 12% on ARC-AGI 2 — OfficeChai

DeepSeek R1 Market Impact: After R1’s January 2025 release, Nvidia lost about $589B in market value in a single trading day (January 27, a 17% share-price drop) as markets reassessed whether the US compute hardware advantage was as decisive as assumed. The reaction suggests investors judged the model gap to be narrower than the chip gap would imply, although market sentiment is only indirect evidence of capability.

AI Futures Self-Grading: The AI Futures Project noted that the gap between top US labs was 0–2 months (closer than AI 2027’s predicted 3–9 months), but did not specifically grade the US-China gap. Tighter competition among US labs may accelerate the US frontier, enlarging the gap China must close.

Model Release Cadence: The November–December 2025 wave of US frontier model releases (GPT-5.2, Claude 4.5, Gemini 3, Grok 4.1) represented a dense cluster of capability advances, potentially widening the gap temporarily.

Z.ai GLM-5.2: Z.ai released GLM-5.2 in June 2026 as an open-source model aimed at coding, long-context, and long-horizon tasks. Z.ai reported a Terminal-Bench score of 81.0, up from 63.5 for GLM-5.1. This is evidence against a stable six-month US lead on coding and agentic workflows, but it does not directly compare GLM-5.2 with the best US closed models.

Counterevidence & Limitations

The gap varies enormously by task. On coding and math, DeepSeek models have been surprisingly competitive. On novel reasoning (ARC-AGI 2), the gap appears larger.
“6 months behind” is a single-number summary of a complex, multi-dimensional comparison. Some capabilities may be at parity while others lag by more than a year.
Chinese labs have demonstrated remarkable efficiency (DeepSeek-V2’s MoE innovations, R1’s reasoning approach) that could allow rapid catch-up when new paradigms emerge.
Many Chinese models (DeepSeek, Qwen) are released with open weights, so part of the measured “gap” may reflect release strategy rather than actual capability.
The assessment depends on whether the gap is measured best-model-versus-best-model or against the overall frontier of what has been demonstrated.
Vendor-reported benchmark gains for GLM-5.2 need independent replication and are not a clean six-month lag measurement.

What Would Change Our Assessment

Upgrade to “on-track”: Chinese models consistently benchmarking ~6 months behind US frontier across major evaluations by mid-2026
Upgrade to “ahead” (gap smaller than predicted): Evidence that gaps of three months or less become the norm across benchmarks
Maintain “behind” (gap larger than predicted): If the ARC-AGI 2 pattern holds — Chinese models consistently matching US performance from 8+ months prior on novel benchmarks

Update History

Date	Update
2026-06-22	Z.ai released GLM-5.2 and reported large gains on coding and long-horizon task benchmarks, including 81.0 on Terminal-Bench versus 63.5 for GLM-5.1. This adds counterevidence to a stable six-month US model lead, while still requiring independent benchmark comparisons against US frontier systems. Confidence adjusted 0.65 -> 0.70.
2026-04-13	DeepSeek V4 delayed twice, now expected late April 2026. Reported to be the first frontier model built on Chinese domestic chips (Huawei Ascend 950PR), per Reuters (Apr 4) and AFP/HKFP reporting (HKFP, findskill.ai). Specs: ~1T params (MoE, ~37B active), 1M context, multimodal. Training cost reportedly ~$5.2M. However, the 16-month interval between V3 (Dec 2024) and V4 (late April 2026), versus US labs’ 2-4 month frontier release cadence, suggests the gap may be widening to 8-12 months rather than the predicted 6. The transition to Huawei chips required “substantial re-engineering” per Counterpoint Research, contributing to the delays. Prediction remains behind: the gap may be larger than the 6 months forecast.
2026-04-06	DeepSeek V4 confirmed to run on Huawei chips — Reuters and The Information report (Apr 3) that the new model will operate on Huawei-designed chips, with Alibaba, ByteDance, and Tencent placing bulk orders for hundreds of thousands of Huawei AI chips ahead of launch (Reuters, The Information). This is the most concrete evidence yet of China building a self-sufficient AI hardware stack independent of Nvidia. Separately, MiniMax released M2.5 (80.2% SWE-bench) and M2.7 (56.22% on SWE-Pro, matching GPT-5.3-Codex) — Chinese models competitive on coding benchmarks. However, US models continue advancing: Gemini 3.1 Pro leads 13 of 16 benchmarks, Anthropic Mythos in testing. Net assessment: gap narrowing on standard benchmarks but may be widening on novel evaluations (ARC-AGI 2). No status change.
2026-03-30	DeepSeek V4 Lite appeared on the DeepSeek web interface on March 9, 2026, with improved coding performance and a knowledge cutoff of May 2025 — but the full V4 remains unreleased despite multiple predicted launch windows passing (PromptZone). The full model (1T parameters, MoE, Huawei Ascend training) was expected in February but is still delayed. Leaked benchmarks (unverified) claim 80%+ SWE-bench and 90% HumanEval. Meanwhile, a mystery model “Hunter Alpha” was found in testing and initially attributed to DeepSeek but revealed to be Xiaomi’s model (Reuters). With US models (GPT-5.4, Anthropic Mythos in testing) advancing while China’s next major release slips, the gap assessment remains uncertain but may be widening slightly.
2026-03-16	DeepSeek withheld early access to V4 from US chipmakers (Nvidia, AMD) while granting Huawei exclusive early access (Reuters, Feb 26). This signals growing US-China decoupling in the AI model ecosystem. V4 benchmarks are not yet public, but the preferential treatment of domestic hardware suggests DeepSeek is positioning V4 around the Chinese chip stack. Meanwhile US frontier models (GPT-5.4, Opus 4.6, Gemini 3.1 Pro) pushed benchmarks further, so the gap assessment depends on V4 performance data. No status change pending V4 benchmarks.
2026-03	Gap appears variable: 3 months on standard benchmarks, 8+ months on novel evaluations like ARC-AGI 2. The predicted ‘6 months’ may overstate Chinese capabilities on harder evaluations.
2025-12	DeepSeek-V3 and Qwen models competitive on standard benchmarks, narrowing measured gap to ~3 months on some tasks.
2025-09	DeepSeek reports an R1 training cost of $294,000 (for the reasoning stage built on the V3 base model), a dramatic demonstration that China can train competitive models without US-level compute. Alibaba claims its Qwen3-Max (1T+ parameters, released September 24) outperforms Claude and DeepSeek-V3.1 on certain agentic benchmarks. China’s frontier model development appears resilient under compute constraints.
2025-08	DeepSeek releases an upgraded V3.1 optimized for domestic chips using the UE8M0 FP8 precision format, signaling active adaptation to export control constraints. Combined with the June 2025 METR finding of a ~6-month capability gap, China’s strategy appears to be efficiency-focused rather than compute-scaling-focused.
2025-06	METR publishes an evaluation of mid-2025 DeepSeek and Qwen models (June 27): “autonomous capabilities of mid-2025 DeepSeek models similar to capabilities of frontier models from late 2024.” METR data indicates an approximately 6-month gap on agentic tasks: real, but not as large as some export-control-based forecasts assumed.
2025-04	Alibaba releases Qwen3 (April 28); its flagship 235B-A22B MoE model posts benchmark scores competitive with some closed frontier models and is released under Apache 2.0. The release suggests China’s open-source AI ecosystem is not standing still under compute constraints.