Sculpted Interaction: a Design-First Approach to AI Alignment
Standard chatbots are a security risk for AI alignment. Here's a design fix.
Magfrump's 'Sculpted Interaction' argues that the biggest vulnerability in AI alignment is not the model itself but the interface through which humans interact with it. The standard chatbot format, derived from the 'assistant' framing in early safety research, was never optimized for human interaction: foundational papers like 'The Assistant' and the GPT-2 release contain zero mentions of UX, HCI, or user studies. The format promotes sycophancy and anthropomorphism, and lets users disengage from critical thinking, directly undermining alignment goals.
Instead of behavioral nudges, Sculpted Interaction proposes lower-level architectural choices that constrain interactions to prioritize purpose matching and user agency. By building structures that keep users focused on their intended goals and require them to verify AI outputs, the approach aims to preserve human values even as AI capabilities grow. The work, completed under Groundless' Autostructures project at AI Safety Camp, calls for engineering attention on interface design as a core alignment strategy, not an afterthought.
- Chatbot 'assistant' framing originated from safety research, not HCI—foundational papers lack any UX or UI references
- Standard interfaces promote sycophancy and anthropomorphism, obscuring capability gaps and undermining human judgment
- Proposed alternative: architectural choices that structurally favor beneficial use over misuse, without relying on behavioral nudges
Why It Matters
Redesigning AI interfaces to preserve human agency may be as critical as model alignment for safe superintelligence.