Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning
A new theory argues that human feedback creates an 'error floor' that scaling alone cannot overcome without external tools.
A paper by Alejandro Rodriguez Dominguez, accepted for IEEE CAI 2026, presents a unified theory explaining why large language models such as GPT-4 and Claude 3.5 exhibit persistent errors despite massive scaling. It argues that human supervision itself creates an information bottleneck that imposes strict lower bounds on model performance. Through six complementary frameworks (operator theory, PAC-Bayes analysis, information theory, causal inference, category theory, and game-theoretic RLHF analysis), the paper shows that annotation noise, subjective preferences, and the limited bandwidth of natural language structurally limit what models can learn from human feedback alone.
The theory formalizes the 'Human-Bounded Intelligence' limit, showing that whenever human supervision is not sufficient for a latent evaluation target, it induces a strictly positive excess-risk floor. Experiments on real preference data and synthetic tasks confirm that human-only supervision consistently hits these predicted floors, while auxiliary channels such as retrieval-augmented generation (RAG), program execution, or tool use can collapse them by restoring information about the true target. This explains why scaling alone cannot eliminate persistent alignment errors, and it provides a mathematical framework for designing hybrid supervision systems that combine human feedback with verifiable external signals.
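The paper's exact theorem statements are not reproduced in this summary, but a standard Fano-type argument sketches why non-sufficiency forces a floor. The notation below is illustrative rather than the paper's own ($T$ for the latent evaluation target, $F$ for the human-feedback signal, $\mathcal{T}$ for a finite target alphabet); the paper's operator-theoretic and PAC-Bayes results presumably give sharper, more general versions.

```latex
% Minimal sketch, assuming a Fano-style floor; not the paper's exact statement.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Let $T$ be the latent evaluation target with finite alphabet $\mathcal{T}$,
and let $F$ be the human-feedback signal. Any predictor $\hat{T} = g(F)$
forms a Markov chain $T \to F \to \hat{T}$, so by the data-processing
inequality $H(T \mid \hat{T}) \ge H(T \mid F)$, and Fano's inequality gives
\[
  \Pr[\hat{T} \ne T] \;\ge\; \frac{H(T \mid F) - 1}{\log_2 |\mathcal{T}|},
\]
a floor that is strictly positive once the feedback channel discards more
than one bit of information about $T$, and that no amount of additional
training on $F$ alone can remove. An auxiliary channel $A$ (retrieval,
program execution) lowers $H(T \mid F, A)$ and hence the floor.
\end{document}
```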
- Human supervision creates an information bottleneck with three components: annotation noise, preference distortion, and semantic compression
- Six mathematical frameworks converge on the same conclusion: non-sufficiency yields strictly positive error floors
- Experiments show auxiliary channels such as retrieval or program execution can collapse error floors by 30-70% (a toy illustration follows this list)
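The following is an illustrative toy, not the paper's experiments: it models 'semantic compression' as quantizing a continuous target into a few human-rating levels. The achievable MSE then floors at the quantization distortion no matter how many labels are collected, while an exact auxiliary channel (standing in for program execution) drives the error toward zero. All functions and parameters here are hypothetical choices for the sketch.

```python
# Toy sketch of an error floor from compressed human feedback vs. an exact
# auxiliary channel. Assumptions: true target sin(x); human labels quantized
# to k levels; estimator is bin-and-average (Bayes-optimal in the limit).
import numpy as np

rng = np.random.default_rng(0)

def true_target(x):
    return np.sin(x)

def human_label(y, k=4):
    # Quantize y in [-1, 1] to k levels: a crude model of limited-bandwidth,
    # semantically compressed human feedback.
    levels = np.linspace(-1, 1, k)
    return levels[np.argmin(np.abs(y[:, None] - levels[None, :]), axis=1)]

def fit_and_eval(labels, x, n_bins=200):
    # Estimate E[label | x] by binning x and averaging, then score against
    # the true target at the bin centers.
    bins = np.linspace(0, 2 * np.pi, n_bins + 1)
    idx = np.clip(np.digitize(x, bins) - 1, 0, n_bins - 1)
    preds = np.array([labels[idx == b].mean() for b in range(n_bins)])
    centers = 0.5 * (bins[:-1] + bins[1:])
    return np.mean((preds - true_target(centers)) ** 2)

for n in [10_000, 100_000, 1_000_000]:
    x = rng.uniform(0, 2 * np.pi, n)
    y = true_target(x)
    mse_human = fit_and_eval(human_label(y), x)  # floors at quantization error
    mse_aux = fit_and_eval(y, x)                 # auxiliary channel: true value
    print(f"n={n:>9,}  human-only MSE={mse_human:.4f}  with-aux MSE={mse_aux:.6f}")
```

Running this, the human-only MSE stays roughly constant as n grows (the floor), while the with-aux MSE keeps shrinking, mirroring the paper's qualitative claim that auxiliary channels collapse floors by restoring information about the true target.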
Why It Matters
The theory explains fundamental limits of RLHF and provides a roadmap for building more reliable AI through hybrid supervision systems.