Coding fully automated but research taste still requires humans
Now that coding has been fully automated... research taste has proven difficult to train due to longer feedback loops and less data availability.
What AI 2027 Predicted
The scenario describes a milestone around early-to-mid 2027 where coding has been “fully automated” — AI can independently write, debug, and deploy production-quality software. However, the scenario distinguishes this from full AI research automation: “research taste” — the ability to identify promising research directions, evaluate novelty, and prioritize among open problems — remains a human bottleneck. This is attributed to longer feedback loops and less training data for research judgment compared to code correctness.
This prediction contains two separable claims: (1) coding reaches full automation, and (2) research direction-setting resists automation even after coding is solved.
How We Track This
We monitor:
- AI performance on end-to-end software engineering tasks (not just bug fixes)
- METR time horizons for autonomous coding
- Lab reports on AI contributions to research beyond implementation
- Papers and benchmarks measuring “research taste” or scientific creativity
- AI Futures Project’s R&D multiplier estimates
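One illustrative way to operationalize these signals is a simple coding-vs-research gap metric. This is a sketch only: the benchmark names are real, but the scores, normalization, and threshold logic below are hypothetical placeholders, not data from any actual tracker.

```python
from dataclasses import dataclass

# Hypothetical tracker for the signals listed above.
# Scores are illustrative placeholders normalized to 0-1,
# where higher means "more automated".

@dataclass
class Signal:
    name: str
    domain: str   # "coding" or "research"
    score: float  # normalized 0-1

def automation_gap(signals: list[Signal]) -> float:
    """Mean coding score minus mean research score.

    A positive gap is consistent with the prediction's asymmetry:
    coding automates ahead of research taste.
    """
    def mean(domain: str) -> float:
        vals = [s.score for s in signals if s.domain == domain]
        return sum(vals) / len(vals)
    return mean("coding") - mean("research")

signals = [
    Signal("SWE-bench Verified", "coding", 0.80),
    Signal("METR time horizon (normalized)", "coding", 0.60),
    Signal("RE-Bench", "research", 0.50),
]
print(round(automation_gap(signals), 2))  # → 0.2
```

Collapsing several noisy benchmarks into one gap number loses nuance, but it makes the "coding ahead, research behind" claim checkable against each update.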
Current Evidence
On coding automation: Progress is substantial but well short of “full automation.” Claude Code is generating $500M+ run-rate revenue, and coding agents are increasingly autonomous, taking instructions via Slack and making substantial code changes independently. However, the AI Futures grading (Feb 2026) pegged Daniel Kokotajlo’s median for full coding automation at 2029, and Eli Lifland’s in the early 2030s. Current agents handle tasks on the scale of hours; the scenario requires weeks to months of autonomous work.
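The hours-to-months distance can be made concrete in doublings, assuming a METR-style exponential trend in autonomous time horizons. All numbers below (the current horizon, the target, and the doubling time) are assumptions for the sketch, not METR measurements:

```python
import math

# Back-of-envelope: how far is "hours" from "weeks-to-months" of
# autonomous work under an assumed exponential time-horizon trend?

current_horizon_h = 15.0      # assumed current autonomous-task horizon (hours)
target_horizon_h = 4 * 40.0   # ~one month of full-time work (160 hours)
doubling_time_months = 7.0    # assumed doubling time for the trend

doublings_needed = math.log2(target_horizon_h / current_horizon_h)
months_needed = doublings_needed * doubling_time_months

print(f"{doublings_needed:.1f} doublings, ~{months_needed:.0f} months")
# → 3.4 doublings, ~24 months
```

Under these assumptions the gap closes in roughly two years; a slower doubling time or a multi-month target horizon pushes the date out correspondingly.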
On research taste lagging: Early evidence supports the asymmetry the scenario describes. The AI Futures grading noted AI is helping substantially with coding but “not as much with other parts of AI research.” AI R&D uplift estimates were revised downward from initial predictions. The distinction between implementation skill (where AI excels) and research judgment (where humans still lead) is already visible in practice.
The January 2026 paper “Internal Deployment Gaps in AI Regulation” (arXiv:2601.08005) documents how AI labs deploy capable systems internally for R&D acceleration, and finds that the most impactful contributions remain in code implementation rather than research direction.
Anthropic Internal Study (Aug 2025): Anthropic surveyed 132 engineers and analyzed 200,000 internal Claude Code transcripts. Key finding: engineers can only “fully delegate” 0-20% of their work to AI. While coding tasks are increasingly automated (67% more merged PRs), the tasks that remain human are higher-complexity (average task complexity increased from 3.2 to 3.8). Engineers report becoming more “full-stack” — suggesting AI handles the implementation while humans focus on design, planning, and judgment. This is exactly the “coding automated but taste lags” dynamic the prediction describes.
METR Evidence (Feb 2026): Developers now refuse to work without AI tools (per METR’s study redesign announcement), strong evidence of adoption, though adoption is not the same as full automation. METR’s time-horizon data shows a gap between coding tasks (well automated) and research engineering tasks (RE-Bench scores lag behind). The gap between what AI can code and what AI can research is the closest available operational measure of “taste lagging.”
Sources:
- Grading AI 2027’s 2025 Predictions — AI Futures Project
- Internal Deployment Gaps in AI Regulation — arXiv
Counterevidence & Limitations
- The boundary between “coding” and “research” is blurry — much of ML research is essentially writing and testing code
- Some argue AlphaProof-style breakthroughs show AI can already exhibit research taste in narrow domains (mathematics)
- Full coding automation may arrive gradually rather than as a discrete event, making the prediction harder to adjudicate
- “Research taste” is inherently difficult to measure, making this prediction partially unfalsifiable
- Labs have strong incentives not to disclose how much research direction is AI-driven vs. human-driven
What Would Change Our Assessment
- Upgrade to “emerging”: Clear evidence that coding agents can autonomously complete multi-week projects without human oversight
- Upgrade to “on-track”: AI demonstrably handling full software engineering workflows while lab leaders publicly note that research prioritization remains human-driven
- Downgrade confidence: AI systems demonstrating strong research-direction capabilities (e.g., proposing novel architectures that prove successful), undermining the “taste lags” half of the prediction
Update History
| Date | Update |
|---|---|
| 2025-08 | Anthropic internal study: engineers can only “fully delegate” 0-20% of work. Task complexity rising (3.2→3.8). Coding increasingly automated; judgment and design remain human. |
| 2025-11 | Claude Code at $1B ARR confirms massive coding automation adoption. SWE-bench crosses 80%. But RE-Bench (research engineering) scores remain 0.5-0.8 — coding ahead, research taste behind. |
| 2026-02 | METR notes developers refuse to work without AI (coding automated). METR time horizons: Opus 4.6 at 14.5h for general tasks, but RE-Bench-specific research engineering scores still lag. Gap narrows but persists. |
| 2026-03 | Prediction timeframe not yet reached. Coding automation progressing rapidly but far from complete. Early signals support the predicted asymmetry — routine coding tasks automate faster than research taste and architectural judgment. |