New AI Ethics Study Reveals the 'Alignment Target Problem' in Moral Judgments
A study of 1,002 U.S. adults finds that moral judgments become more rule-based once the humans who programmed an AI come into view.
A new study by Benjamin Minhao Chen and Xinyu Xie, presented at ACM FAccT 2026, tackles a fundamental question in AI alignment: whose moral standards should guide machine behavior? The paper, titled 'The Alignment Target Problem,' challenges the common assumption that the appropriate benchmark for AI is how a human would act in the same situation. In an online experiment, 1,002 U.S. adults evaluated a classic moral dilemma, a runaway mine train, under four conditions: a human repairman, a repair robot, a robot programmed by company engineers, and the engineers who programmed the robot.
The results reveal a nuanced pattern. Moral judgments of the human repairman and of the repair robot acting autonomously did not differ significantly. However, once the robot's actions were framed as the product of human design, participants shifted toward deontological, rule-based reasoning. This effect held both when they evaluated the programmed robot and when they directly evaluated the engineers who programmed it. The researchers conclude that making human agency visible activates heightened moral constraints, leading to more rigid judgments.
These findings give rise to what the authors call the 'alignment target problem': which normative target should guide the development of artificial moral agents? The study suggests that people's evaluations of humans, AI systems, and their designers do not necessarily converge, complicating efforts to create a coherent value alignment framework. For technologists and policymakers, this means that simply modeling AI behavior on human responses may be insufficient—designers must also consider how the public perceives the humans behind the machine.
- Moral judgments did not differ significantly between a human repairman and an autonomous repair robot in a runaway mine train scenario.
- When the robot's actions were attributed to human programming, participants applied stricter deontological (rule-based) reasoning to both the robot and its human designers.
- The 'alignment target problem' challenges the assumption that human behavior is the correct benchmark for AI moral decision-making.
Why It Matters
The study complicates the choice of an alignment target: designers may need to account for how the public judges the humans behind the machine, not only how it judges the machine itself.