AI Safety

Inputs, outputs, and valued outcomes

A viral framework explains why AI agents and knowledge workers often fail to create real value.

Deep Dive

A post by Kaj Sotala on the LessWrong forum has gone viral for the framework it offers for understanding work in the age of AI. The piece, based on conversations with Jukka Tykkyläinen and Kimmo Nevanlinna, breaks any job down into three components: inputs (the time and resources spent), outputs (the immediate, measurable results), and valued outcomes (the true purpose of the work, or the value it creates). For routine jobs like digging a tunnel, the three are tightly coupled. For knowledge work, however, and by extension for the tasks we assign to AI agents and LLMs, the link between outputs and valued outcomes is far weaker.

Sotala argues that this decoupling invites Goodhart's Law: once a measure (here, outputs) becomes a target, it ceases to be a good measure of true success (valued outcomes). In research, this shows up as rewarding paper count over genuine discovery. For AI, it means optimizing for token generation or task-completion metrics rather than for whether the AI's actions achieve the user's real-world goal. For developers building AI agents, the framework is a prompt to design systems that are evaluated and rewarded on true outcomes, not just intermediate outputs. For professionals using AI tools, it is a warning to focus on the value ultimately created rather than on mere activity or volume.
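
To make the gap between outputs and valued outcomes concrete, here is a minimal Python sketch of how the choice of metric can change which agent looks best. It is not from Sotala's post: the AgentRun record, its field names, the two agents, and every number are hypothetical, chosen only to illustrate the inputs/outputs/valued-outcomes split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRun:
    """One evaluation run of a hypothetical AI agent (illustrative fields only)."""
    name: str
    hours_spent: float   # input: time and resources consumed
    tasks_closed: int    # output: immediate, easy-to-count result
    goals_achieved: int  # valued outcome: user goals actually met


def best_by(runs, metric):
    """Return the name of the run that scores highest on the given metric."""
    return max(runs, key=metric).name


# Made-up numbers: same input, very different output/outcome profiles.
runs = [
    AgentRun("busy_agent", hours_spent=10.0, tasks_closed=40, goals_achieved=5),
    AgentRun("careful_agent", hours_spent=10.0, tasks_closed=12, goals_achieved=11),
]

# Ranking by the output metric and by the valued outcome picks different agents.
by_output = best_by(runs, lambda r: r.tasks_closed)
by_outcome = best_by(runs, lambda r: r.goals_achieved)

print("best by output: ", by_output)
print("best by outcome:", by_outcome)
```

In this toy data, a leaderboard built on tasks_closed would reward the agent that achieves fewer of the user's goals. The practical difficulty, which the sketch assumes away, is that goals_achieved is usually much harder to measure than tasks_closed.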

Key Points
  • The framework defines work through Inputs (time/resources), Outputs (immediate results), and Valued Outcomes (true purpose).
  • In knowledge work and AI, Outputs and Valued Outcomes become weakly linked, creating misaligned incentives.
  • This invites Goodhart's Law: once measurable outputs (like paper count or AI task completion) become the target, they stop being a good measure of real value.

Why It Matters

This framework is essential for designing and evaluating AI agents to ensure they create real-world value, not just activity.