Aligning to Virtues
A radical new proposal argues we've been aligning AI to the wrong targets.
Deep Dive
A new post argues that current AI alignment targets (consequentialist values, deontological rules, or pure obedience) all lead to dangerous power struggles. The author proposes an alternative: aligning AI to common-sense virtues like integrity, kindness, and honor, much as we would want human leaders to behave. This virtue-ethics approach is presented as a more robust and desirable long-term answer to the core safety problem, with more details promised in future posts.
Why It Matters
If the argument holds, it could fundamentally reshape how labs build safe AI, moving beyond paradigms the author regards as flawed.