New paper argues: Align AI to aspirations, not flawed human preferences
Researchers propose a non-negotiable floor of competence, honesty, and lawfulness for AI alignment
In a new position paper published on arXiv (2606.13755), authors Nikita Kazeev and Bui Nhat Huyen Phan challenge the dominant approach to AI alignment: training models to reflect aggregated human preferences. They argue that human values have produced societies that fail as well as societies that thrive, pointing to failed states, extreme inequality, and declining happiness and political polarization in wealthy democracies. The authors grant that the pluralistic-alignment program correctly identifies that there is no single 'humanity,' but contend that taking pluralism as the main directive is dangerous.
Instead, the authors propose a non-negotiable floor of objective alignment goals: competence constrained by factual accuracy, honesty, and lawfulness. Pluralism should exist only at the surface level—in language, conventions, and legitimate value tradeoffs that respect that floor. They offer four constructive commitments and address six objections, including commercial pressure, democratic legitimacy, and concerns that the floor itself is culturally laden. The paper was presented at the Pluralistic Alignment Workshop at ICML 2026.
- Argues against aligning AI to aggregated human preferences, citing real-world societal failures that human values have produced.
- Proposes a non-negotiable floor of competence, factual accuracy, honesty, and lawfulness for AI alignment.
- Allows pluralism only at the surface level (language, conventions, and legitimate value tradeoffs), never for values that violate the floor.
Why It Matters
The paper could reshape how AI safety researchers think about value alignment, shifting the focus from preference aggregation to objective guardrails.