AI Safety

It Is Reasonable To Research How To Use Model Internals In Training

A leading AI safety expert pushes back against a controversial research taboo.

Deep Dive

An AI safety researcher argues that using a model's internals during training, a technique some consider forbidden, is a normal and necessary area of study. They contend it could be crucial for ensuring future AI systems are safe and aligned with human values. Noting that major labs are already researching the approach, the author calls for more work to understand its potential benefits and risks rather than condemning it prematurely.

Why It Matters

The outcome of this debate will shape which foundational tools are available for understanding and controlling powerful future AI systems.