Sparse Autoencoders for Single-Cell Models
Applying AI safety tools to Geneformer and scGPT reveals hidden cellular knowledge, creating a new testbed for interpretability.
In a viral post, researcher Ihor Kendiukhov challenges the AI biology field, arguing that single-cell foundation models like Geneformer V2-316M and scGPT are being evaluated incorrectly and contain far more knowledge than their surface-level outputs suggest. He posits that treating these models like LLMs, judging them by benchmark performance on tasks such as cell type annotation, is akin to evaluating a scientist with a multiple-choice exam. The real, compressed biological knowledge learned from tens of millions of cells lives in superimposed internal activations and never appears directly in a standard output.
Kendiukhov's solution is to import tools from the AI safety community, specifically sparse autoencoders (SAEs). He trained SAEs on the residual stream activations of every layer in both Geneformer (18 layers) and scGPT (12 layers) to decompose their dense, superimposed information into sparse, interpretable features. This approach, detailed in his paper "The SAE Atlas," maps both what the models know and how they compute. Crucially, biology offers the ground truth that language lacks: decades of molecular data and curated pathway databases. That makes these biological models a superior, real-world testbed for validating mechanistic interpretability methods like causal circuit tracing and feature ablation.
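To make the mechanics concrete, here is a minimal sketch of the kind of sparse autoencoder typically trained on residual-stream activations. It is not the paper's implementation: the hidden width, the 8x dictionary expansion, the L1 coefficient, and the synthetic `activation_batches` stand-in for pre-extracted activations are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: project d_model-dim activations into an overcomplete
    feature space, keep it sparse with ReLU + an L1 penalty, reconstruct."""
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))  # most entries driven to zero
        return self.decoder(feats), feats

# Illustrative dimensions only (not the paper's hyperparameters):
# a 512-dim residual stream with an 8x overcomplete dictionary.
sae = SparseAutoencoder(d_model=512, n_features=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3  # sparsity weight; needs tuning in practice

# Stand-in for residual-stream activations extracted from one layer.
activation_batches = [torch.randn(256, 512) for _ in range(100)]

for acts in activation_batches:
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The reconstruction term forces the dictionary to preserve the layer's information, while the L1 term pushes each input to be explained by only a handful of features, which is what makes the resulting features candidates for individual biological concepts.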
- Biological foundation models like Geneformer and scGPT are underestimated by LLM-style evaluation metrics.
- Sparse autoencoders (SAEs) from AI safety research can decompose their activations into interpretable biological features.
- Biology provides curated pathway databases and perturbation screens, offering ground truth for validating interpretability methods (a minimal validation sketch follows this list).
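One way to cash out that ground-truth claim is a gene-set enrichment check on individual SAE features. The sketch below is an illustration, not the paper's method: `gene_scores` is a hypothetical per-gene activation score for one feature (for example, mean feature activation over each gene's token positions), and `pathway_genes` would be drawn from a curated database such as Reactome or GO.

```python
import numpy as np
from scipy.stats import hypergeom

def pathway_enrichment(gene_scores, gene_names, pathway_genes, top_k=50):
    """Hypergeometric test: are a feature's top-scoring genes enriched
    for a curated pathway gene set?

    gene_scores  : 1-D array, one activation score per gene (assumed input)
    gene_names   : list of gene symbols aligned with gene_scores
    pathway_genes: set of gene symbols from a pathway database
    """
    # Genes that most strongly activate this feature.
    top_idx = np.argsort(-np.asarray(gene_scores))[:top_k]
    top_genes = {gene_names[i] for i in top_idx}

    population = len(gene_names)                      # all genes in vocabulary
    successes = len(pathway_genes & set(gene_names))  # pathway genes present
    overlap = len(top_genes & pathway_genes)

    # P(overlap >= observed) when drawing top_k genes at random.
    p_value = hypergeom.sf(overlap - 1, population, successes, top_k)
    return overlap, p_value

# Toy usage with made-up names; a real run would use the model's gene
# vocabulary and a real curated gene set.
genes = [f"GENE{i}" for i in range(2000)]
scores = np.random.rand(2000)
pathway = set(genes[:40])
print(pathway_enrichment(scores, genes, pathway, top_k=50))
```

A feature whose top genes overlap a pathway far more than chance predicts is a concrete, checkable claim of the kind language-model interpretability rarely gets.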
Why It Matters
Unlocks hidden knowledge in AI biology models and provides a validated testbed for improving AI interpretability and safety.