AI Safety

Announcing ControlConf 2026

Berkeley conference focuses on safeguards against misaligned AI as agents rapidly improve, with talks on monitoring, sabotage, and permissions.

Deep Dive

Organizer Buck has announced ControlConf 2026 on LessWrong: a two-day conference in Berkeley on April 18-19 dedicated to AI control research. The field develops technical safeguards that remain effective even when advanced AI systems actively attempt to circumvent them, addressing risks from misalignment. The announcement notes that since the previous conference in February 2025, AI agents have become 'way better,' pushing control techniques from theoretical research toward becoming 'load-bearing for the safety of real agent deployments.' The event will gather frontier researchers to present on current problems, promising interventions, and critical research directions, reflecting increased urgency as capabilities advance.

Conference talks will tackle concrete, high-stakes questions about securing AI systems. Key discussion topics include the balance between AI capabilities for monitoring versus generating subtle attacks, the viability of external auditors for evaluating corporate security against internal agents, and the reliability of Chain-of-Thought (CoT) monitoring. Other sessions will explore permissions management for agents in high-security contexts, the risk of deploying a 'known schemer,' and the tractability of cross-model monitoring (e.g., 'Claude monitors GPT'). The organizers emphasize that AI control makes risk analysis concrete: it asks what AIs will actually be capable of rather than resting on abstract theoretical arguments. A related one-day workshop on April 17 will focus on AI futurism and threat modeling for catastrophic risks.
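To make the monitoring idea above concrete, here is a minimal sketch of a trusted-monitoring control loop of the kind discussed in AI control research. All names, the threshold, and the scoring stub are hypothetical illustrations, not any conference speaker's actual system: a weaker "trusted" monitor scores each action proposed by a stronger, untrusted agent, and actions scoring above a threshold are blocked and escalated for human audit.

```python
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    suspicion: float  # stand-in for a trusted model's score in [0, 1]

# Assumed cutoff; real deployments would derive this from an audit budget.
AUDIT_THRESHOLD = 0.8

def trusted_monitor(action: Action) -> float:
    # Placeholder: a real system would query a separate, trusted model here
    # (the "Claude monitors GPT" pattern mentioned above).
    return action.suspicion

def control_loop(actions: list[Action]) -> tuple[list[Action], list[Action]]:
    """Partition proposed actions into executed vs. held-for-audit."""
    executed, held_for_audit = [], []
    for action in actions:
        if trusted_monitor(action) >= AUDIT_THRESHOLD:
            held_for_audit.append(action)  # block and escalate to a human
        else:
            executed.append(action)
    return executed, held_for_audit
```

The design choice this sketch highlights is the one the talks debate: the scheme's safety hinges on whether the monitor can recognize subtle attacks that the stronger agent is capable of generating, and on how many flagged actions humans can realistically audit.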

Key Points
  • Conference runs April 18-19 in Berkeley, focusing on AI control safeguards against misaligned systems.
  • Follows February 2025 event, with urgency heightened as AI agents have gotten 'way better' recently.
  • Talks will address concrete threats: AI monitoring vs. attacks, external auditing, permissions management, and sabotage risks.

Why It Matters

As AI agents approach real-world deployment, technical control measures become critical infrastructure for preventing catastrophic failures.