Research & Papers

Trajectory probe beats MSP by up to 21 AURC points for LLM uncertainty

A new method reads layer-wise geometric features to expose hidden miscalibration in LLMs.

Deep Dive

A sparse linear probe over 11 scale-invariant geometric features, extracted from per-layer MLP updates in language models, outperforms maximum softmax probability (MSP) under selective abstention by up to 21 points of area under the risk-coverage curve (AURC). Because each feature has a closed-form geometric meaning, the method reveals where and how errors accumulate across depth, for example in layers that commit prematurely or reverse earlier evidence.
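
To make "scale-invariant geometric features" concrete, here is a minimal sketch of what such features could look like. The summary does not list the paper's 11 features, so the four quantities below (and the names `trajectory_features`, `straightness`, `depth_centroid`) are hypothetical illustrations rather than the authors' definitions. Each one is a ratio or cosine, so multiplying every update by the same constant leaves it unchanged.

```python
import numpy as np

def trajectory_features(updates: np.ndarray, eps: float = 1e-12) -> dict:
    """Illustrative (hypothetical) scale-invariant features of one token's trajectory.

    updates: array of shape (num_layers, hidden_dim), one MLP update per layer.
    """
    norms = np.linalg.norm(updates, axis=1)      # size of each layer's update
    total = norms.sum() + eps                    # total path length across depth
    unit = updates / (norms[:, None] + eps)      # unit-length update directions

    # Cosine between consecutive updates: negative values mean a layer
    # pushes against the direction chosen by the layer before it.
    consecutive_cos = np.sum(unit[:-1] * unit[1:], axis=1)

    # Net displacement divided by path length: close to 1 for a straight path.
    straightness = np.linalg.norm(updates.sum(axis=0)) / total

    # Norm-weighted mean depth in [0, 1]: low values mean most of the
    # movement happens in early layers (a possible sign of early commitment).
    depth = np.linspace(0.0, 1.0, len(norms))
    depth_centroid = np.sum(depth * norms) / total

    return {
        "mean_consecutive_cosine": float(consecutive_cos.mean()),
        "min_consecutive_cosine": float(consecutive_cos.min()),
        "straightness": float(straightness),
        "depth_centroid": float(depth_centroid),
    }

# Usage on random stand-in data (real inputs would come from model hooks).
rng = np.random.default_rng(0)
example_updates = rng.normal(size=(6, 16))
print(trajectory_features(example_updates))
```

Because every feature is dimensionless, values like these can be compared across tokens and layers with very different activation magnitudes, which is one plausible reason to prefer scale-invariant quantities for a linear probe.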

Key Points
  • Extracts 11 scale-invariant geometric features from per-layer MLP update trajectories.
  • Sparse linear probe outperforms MSP by up to 21 AURC points for selective abstention.
  • Interpretable coefficients reveal where errors arise across layers (e.g., premature commitment, reversal of earlier evidence).
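
The points above can be sketched end to end on toy data. The summary does not describe how the probe is trained, so this sketch assumes an L1-penalized logistic regression fit by proximal gradient descent, which is one common way to obtain a sparse linear probe. The function names `fit_sparse_probe` and `aurc`, the hyperparameters, and the synthetic data are all hypothetical. AURC is computed here as the mean error rate over all coverage levels when answers are kept in order of decreasing confidence, so lower is better.

```python
import numpy as np

def fit_sparse_probe(X, y, l1=0.05, lr=0.1, steps=2000):
    """L1-penalized logistic regression via proximal gradient descent (assumed setup).

    X: (n, d) feature matrix; y: (n,) with 1.0 where the model's answer was correct.
    Returns (weights, bias). The L1 penalty drives uninformative weights to zero.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))                  # predicted P(correct)
        w = w - lr * (X.T @ (p - y) / n)                        # gradient step on log loss
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)   # soft-threshold (L1 prox)
        b = b - lr * float(np.mean(p - y))                      # bias is not penalized
    return w, b

def aurc(confidence, correct):
    """Area under the risk-coverage curve: mean selective error rate over coverages."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(-confidence, kind="stable")              # most confident first
    errors = (~correct[order]).astype(float)
    # Risk at coverage k/n is the error rate among the k most confident answers.
    risks = np.cumsum(errors) / np.arange(1, len(errors) + 1)
    return float(risks.mean())

# Toy data: only feature 0 actually predicts correctness.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (2.0 * X[:, 0] + rng.normal(size=2000) > 0).astype(float)

w, b = fit_sparse_probe(X, y)
probe_confidence = X @ w + b
random_confidence = rng.random(2000)

print("probe weights:", np.round(w, 3))
print("AURC, probe confidence: ", aurc(probe_confidence, y))
print("AURC, random confidence:", aurc(random_confidence, y))
```

In a real comparison, `random_confidence` would be replaced by each answer's maximum softmax probability, and the columns of `X` by the geometric features; the nonzero entries of `w` are then what makes the probe readable feature by feature.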

Why It Matters

Better-calibrated LLM uncertainty estimates could reduce false confidence in high-stakes applications such as healthcare or finance.