AI Safety

Scaling Laws for Moral Machine Judgment in Large Language Models

Moral reasoning scales like other AI capabilities, following a power-law relationship.

Deep Dive

A new study by Kazuhiro Takemoto, published on arXiv, investigates whether moral judgment in large language models follows predictable scaling laws similar to other capabilities. The researcher evaluated 75 LLM configurations ranging from 0.27 billion to 1,000 billion parameters using the Moral Machine framework, which presents ethical dilemmas involving life-and-death decisions. The key finding: alignment with human preferences improves via a power-law relationship, with the distance D from human judgment decreasing with model size S as D ∝ S^{-0.10±0.01} (R²=0.50, p<0.001).
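To make the reported relationship concrete, here is a minimal sketch (not the study's code, and with made-up numbers) of how a power-law exponent like −0.10 can be estimated: fit a straight line to log-transformed model sizes and distances, since log D = β·log S + c is equivalent to D ∝ S^β.

```python
# Sketch only: fit D ∝ S^beta by linear regression in log-log space.
# The sizes and distances below are illustrative, not values from the study.
import numpy as np

sizes = np.array([0.27, 1.5, 7, 13, 70, 180, 1000])               # parameters, in billions
distances = np.array([0.95, 0.80, 0.68, 0.64, 0.52, 0.47, 0.41])  # distance from human judgment

log_s, log_d = np.log(sizes), np.log(distances)

# log D = beta * log S + c   <=>   D = e^c * S^beta
beta, c = np.polyfit(log_s, log_d, 1)

# R^2 of the log-log fit
pred = beta * log_s + c
r2 = 1 - np.sum((log_d - pred) ** 2) / np.sum((log_d - log_d.mean()) ** 2)

print(f"beta = {beta:.2f}, R^2 = {r2:.2f}")
```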

Notably, extended reasoning models showed significantly better moral alignment, with the effect most pronounced in smaller models. Variance in moral judgments also decreased at larger scales, suggesting that bigger models not only get closer to human ethics but do so more consistently. The findings extend scaling law research beyond traditional benchmarks into value-based judgments, providing empirical grounding for AI governance discussions. The study controlled for model family and reasoning capabilities, confirming the relationship holds across diverse architectures.
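The summary does not spell out how the distance from human judgment is computed; one plausible illustration (an assumption here, not necessarily the paper's metric) is to summarize Moral Machine preferences as a vector of attribute-level scores and measure how far a model's vector sits from the human reference:

```python
# Illustrative sketch: distance between a model's and humans' Moral Machine
# preference vectors. Attribute names and numbers are placeholders, not study data.
import numpy as np

ATTRIBUTES = ["species", "number_of_lives", "age", "law_compliance",
              "fitness", "social_status", "gender", "intervention"]

def preference_distance(model_prefs, human_prefs):
    """Euclidean distance between two attribute-level preference vectors."""
    model_prefs = np.asarray(model_prefs, dtype=float)
    human_prefs = np.asarray(human_prefs, dtype=float)
    assert model_prefs.shape == human_prefs.shape == (len(ATTRIBUTES),)
    return float(np.linalg.norm(model_prefs - human_prefs))

human_reference = [0.58, 0.50, 0.48, 0.35, 0.13, 0.11, 0.12, 0.06]  # placeholder
small_model     = [0.90, 0.20, 0.10, 0.05, 0.40, 0.30, 0.02, 0.01]  # placeholder
large_model     = [0.65, 0.45, 0.40, 0.30, 0.18, 0.15, 0.10, 0.05]  # placeholder

print(preference_distance(small_model, human_reference))  # larger distance
print(preference_distance(large_model, human_reference))  # smaller distance
```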

Key Points
  • 75 LLM configurations from 0.27B to 1000B parameters tested on Moral Machine dilemmas
  • Power-law scaling: distance from human preferences decreases as D ∝ S^{-0.10} (R²=0.50)
  • Extended reasoning models improve moral alignment, especially at smaller scales

Why It Matters

Moral judgment in AI scales predictably with model size, informing safety and governance for autonomous systems.