9 kinds of hard-to-verify tasks
New taxonomy reveals why AI verification fails, from $1M research comparisons to dangerous nanobot tests.
AI researcher Cleo Nardo has published a framework that challenges how the field thinks about verifying AI outputs. Rather than treating 'hard-to-verify tasks' as a single category, Nardo identifies 9 distinct types where verification fails for different reasons. These range from practical constraints, like verification costing $100 to $1,000 per comparison for sparse autoencoder (SAE) experiments or millions of dollars for research agenda evaluations, to fundamental limitations where the needed information isn't physically recoverable or where verification would destroy the system being tested.
The framework includes categories where verification requires expensive human time (such as getting Terry Tao's judgment), lacks NP-ish structure, or runs into ethical and legal constraints (such as accessing private medical records). A task has NP-ish structure when a proposed solution can be checked far more cheaply than it can be produced; a chess move in a complex middlegame lacks this, since judging whether the move is best is roughly as hard as finding it. Most critically, Nardo identifies dangerous verification scenarios where testing the AI's output could itself cause catastrophe, such as running a nanobot factory that might produce paperclips instead of curing Alzheimer's. This taxonomy shows why blanket claims about 'hard-to-verify tasks' often mislead: different categories require fundamentally different oversight strategies.
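To make that last point concrete, here is a minimal Python sketch encoding the failure modes named in this summary. The enum labels and the strategy mapping are illustrative paraphrases, not Nardo's terminology or code, and the list covers only the modes mentioned above rather than the full set of nine.

```python
from enum import Enum, auto

class VerificationFailure(Enum):
    """Failure modes mentioned in this summary (paraphrased; not Nardo's exact labels)."""
    COSTLY_EXPERIMENTS = auto()    # e.g. $100-$1,000 per SAE-experiment comparison
    COSTLY_HUMAN_TIME = auto()     # e.g. needing Terry Tao's judgment
    NO_NP_STRUCTURE = auto()       # e.g. best move in a complex chess middlegame
    ETHICAL_LEGAL_LIMITS = auto()  # e.g. accessing private medical records
    UNRECOVERABLE_INFO = auto()    # the needed information isn't physically recoverable
    DESTRUCTIVE_TEST = auto()      # verifying would destroy the system under test
    DANGEROUS_TO_RUN = auto()      # e.g. running the nanobot factory
    NO_GROUND_TRUTH = auto()       # e.g. population axiology

# Hypothetical illustration of the article's point: each failure mode calls for a
# different oversight response, so no single verification strategy covers them all.
oversight_strategy = {
    VerificationFailure.COSTLY_EXPERIMENTS: "spot-check a random subsample",
    VerificationFailure.NO_NP_STRUCTURE: "supervise the process, not just the output",
    VerificationFailure.DANGEROUS_TO_RUN: "sandbox or refuse direct execution",
    VerificationFailure.NO_GROUND_TRUTH: "use a different evaluation paradigm",
}

for mode, strategy in oversight_strategy.items():
    print(f"{mode.name}: {strategy}")
```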
Nardo's analysis suggests the AI safety community needs more precise language when discussing scalable oversight. Rather than seeking universal verification strategies, researchers should develop targeted approaches for specific verification failure modes. The framework also highlights that some tasks have no ground truth at all, such as developing population axiologies (theories of how to rank outcomes involving different numbers of people); here verification in the traditional sense doesn't apply, and entirely different evaluation paradigms are required.
- Identifies 9 distinct categories of hard-to-verify tasks, challenging the binary 'easy vs hard' classification
- Covers both practical constraints, such as verification costs ranging from $100 to over $1M, and fundamental limits, such as information that is not physically recoverable
- Highlights dangerous verification scenarios where testing AI outputs could cause catastrophic outcomes
Why It Matters
Provides crucial precision for AI safety research, showing why different verification failures need different oversight strategies.