Trajectory probe beats MSP by 21 AURC for LLM uncertainty
A new method reads layer-wise geometric features to expose hidden miscalibration in LLMs.
Deep Dive
A sparse linear probe over 11 scale-invariant geometric features, extracted from the per-layer MLP updates of a language model, outperforms maximum softmax probability (MSP) under selective abstention by up to 21 points of AURC (area under the risk-coverage curve). Because each feature has a closed-form geometric meaning, the method reveals where and how errors accumulate across depth, for example in layers that commit prematurely or reverse earlier evidence.
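To make the comparison metric concrete, here is a minimal sketch of how MSP and AURC are commonly computed. The function names and toy data are assumptions made for illustration and are not taken from the method; the source does not say how its "points" are scaled, so the value here is left as a raw fraction.

```python
import math

def msp(logits):
    """Maximum softmax probability for one vector of logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)

def aurc(confidences, correct):
    """Area under the risk-coverage curve (lower is better).

    Examples are sorted from most to least confident. At each coverage
    level k/n the selective risk is the error rate among the k most
    confident examples; AURC is the mean of those risks.
    """
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    errors = 0
    total_risk = 0.0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        total_risk += errors / k
    return total_risk / len(order)

# Toy data, invented for illustration: logits and whether each answer was right.
logits = [[4.0, 0.0, 0.0], [1.0, 0.9, 0.8], [3.0, 0.0, 0.0], [0.5, 0.4, 0.0]]
correct = [True, False, False, True]

conf = [msp(row) for row in logits]
score = aurc(conf, correct)
print(round(score, 4))
```

A confidence signal that ranks wrong answers below right ones drives AURC toward its minimum, which is why a probe can beat MSP without changing the model's answers at all.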
Key Points
- Extracts 11 scale-invariant geometric features from per-layer MLP update trajectories.
- Sparse linear probe outperforms MSP by up to 21 AURC points for selective abstention.
- Interpretable coefficients reveal where errors arise across layers (e.g., premature commitment, contradictions).
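The sketch below illustrates what "scale-invariant geometric features" of an update trajectory and a linear probe over them could look like. The two features shown (sharpest reversal and straightness) are hypothetical stand-ins chosen for illustration, not the 11 features the method uses, and the probe weights are placeholders rather than fitted values.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    # Assumes both vectors are nonzero.
    return dot(u, v) / (norm(u) * norm(v))

def trajectory_features(updates):
    """Two illustrative scale-invariant features of per-layer update vectors."""
    # Sharpest reversal: the lowest cosine between consecutive updates.
    # A negative value means some layer pushes against the previous one.
    turns = [cosine(a, b) for a, b in zip(updates, updates[1:])]
    sharpest_reversal = min(turns)
    # Straightness: net displacement divided by total path length (0 to 1).
    net = [sum(col) for col in zip(*updates)]
    straightness = norm(net) / sum(norm(u) for u in updates)
    return [sharpest_reversal, straightness]

def probe_score(features, weights, bias):
    """Linear probe with a logistic output; a zero weight ignores a feature."""
    z = bias + dot(features, weights)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 2-D updates from three layers; the last one partly undoes the first.
updates = [[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
feats = trajectory_features(updates)

# Rescaling every update by the same factor leaves the features unchanged.
scaled = trajectory_features([[10.0 * x for x in u] for u in updates])

# Placeholder weights: only the first feature is active (a sparse probe).
p_error = probe_score(feats, [-2.0, 0.0], 0.0)
```

Because each feature is a ratio of lengths or an angle, a nonzero probe weight can be read directly as "more reversal" or "less direct path" predicting an error, which is the kind of interpretability the bullet describes.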
Why It Matters
Better-calibrated LLM uncertainty could reduce false confidence in critical applications such as healthcare or finance.