A Research Bet on SAE-like Expert Architectures
A new architecture aims to build interpretability directly into AI models rather than extracting it after the fact.
Researcher Nathan Helm-Burger has placed a significant research bet on a new AI architecture designed to be interpretable from its foundation. His work on the MONET (Mixture of Monosemantic Experts for Transformers) architecture challenges the conventional approach to AI interpretability, which typically applies analysis tools such as Sparse Autoencoders (SAEs) to already-trained, opaque models. MONET instead builds sparsity and modularity directly into the model's structure from the start, using a hierarchical routing mechanism to create pools of tiny experts, each intended to be monosemantic. Early experiments show these expert pools naturally specializing in domains like code, biomedical text, and academic citations without explicit supervision.
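To make the routing idea concrete, here is a minimal PyTorch sketch of a layer that sends each token to a handful of experts drawn from a large pool of tiny MLPs. Everything here is an illustrative assumption: the class name `TinyExpertLayer`, the expert count and sizes, and the flat top-k softmax router are stand-ins, not MONET's actual implementation, whose hierarchical routing is more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyExpertLayer(nn.Module):
    """Pool of many small experts with sparse top-k routing.

    Illustrative sketch only: a flat softmax router over all experts,
    not the hierarchical routing the MONET work describes.
    """
    def __init__(self, d_model: int, n_experts: int = 512,
                 d_expert: int = 16, top_k: int = 4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a tiny two-layer MLP, stored as batched weights.
        self.w_in = nn.Parameter(torch.randn(n_experts, d_model, d_expert) * 0.02)
        self.w_out = nn.Parameter(torch.randn(n_experts, d_expert, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model). Score all experts, keep only the top k.
        gate = F.softmax(self.router(x), dim=-1)        # (batch, n_experts)
        weights, idx = gate.topk(self.top_k, dim=-1)    # (batch, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            w_in = self.w_in[idx[:, k]]                 # (batch, d_model, d_expert)
            w_out = self.w_out[idx[:, k]]               # (batch, d_expert, d_model)
            h = F.relu(torch.einsum('bd,bde->be', x, w_in))
            out += weights[:, k:k + 1] * torch.einsum('be,bed->bd', h, w_out)
        return out

layer = TinyExpertLayer(d_model=128)
tokens = torch.randn(8, 128)
print(layer(tokens).shape)  # torch.Size([8, 128])
```

Note that the flat router above scores every expert for every token, which is exactly what a hierarchical routing scheme is meant to avoid: it lets the expert pool grow far larger without the routing cost growing with it.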
While the vision of 'interpretability by construction' as a native property rather than a reconstruction problem is compelling, the architecture is still experimental. Current prototypes have fewer than 1 billion parameters and were trained for under 24 hours on one or two GPUs. The critical challenges ahead are achieving true feature-level monosemanticity, ensuring that each expert's apparent function is causally faithful, and, above all, scaling the approach to be performance-competitive at 8 billion parameters and beyond. The project represents a fundamental wager: that architectural pressure toward sparsity can produce a model in which the clean, interpretable decomposition sought by SAE researchers is not merely free but is the model's core causal mechanism.
- Architecture-first interpretability: MONET builds sparse, modular expert pools into the model design, unlike post-hoc SAE analysis.
- Early specialization: Prototypes under 1B parameters show unsupervised domain clustering for code and biomedical text.
- Unproven scaling: The critical bet is whether the approach remains efficient and competitive at 8B+ parameters.
Why It Matters
If successful, this could make understanding and auditing advanced AI models fundamentally easier and more reliable.