Research & Papers

The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability

New metric predicts model steerability with Spearman ρ up to 0.97 and registers nearly 2x more geometric change than CKA during alignment.

Deep Dive

A new paper by Prashant C. Raju, 'The Geometric Canary,' introduces a unified geometric framework for two LLM deployment challenges: predicting steerability and detecting internal drift. Its core concept, geometric stability, measures how consistent the pairwise distance structure of a model's representations is. Supervised variants, which are aligned with a specific task, predict linear steerability with strong rank correlation (Spearman ρ between 0.89 and 0.97) across 35 to 69 embedding models and three NLP tasks. They also capture variance that class separability does not explain (partial ρ = 0.62 to 0.76).
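To make "consistency of pairwise distance structures" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's estimator: this summary does not say how the paper perturbs or resamples representations, so the sketch simply compares the pairwise distances of a set of embeddings against those of a noise-perturbed copy and reports their Spearman rank correlation. The function name `geometric_stability` and the parameters `noise_scale` and `seed` are hypothetical.

```python
import math
import random
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of rows, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def geometric_stability(embeddings, noise_scale=0.05, seed=0):
    """Hypothetical stability score (assumption, not the paper's definition):
    rank agreement between the pairwise distances of the embeddings and
    those of a copy with Gaussian noise added to every coordinate."""
    rng = random.Random(seed)
    perturbed = [[v + rng.gauss(0.0, noise_scale) for v in row]
                 for row in embeddings]
    return spearman(pairwise_distances(embeddings),
                    pairwise_distances(perturbed))
```

A score near 1 means the ordering of pairwise distances survives the perturbation. The same `spearman` helper is the statistic behind the reported ρ values, where it would be applied to per-model stability scores and per-model steerability scores instead.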

Crucially, the research reveals a dissociation: unsupervised stability fails at steering prediction (ρ ≈ 0.10) but excels at drift detection. During post-training alignment it registers nearly 2x more geometric change than CKA, with up to a 5.23x improvement in Llama models. It also gives earlier warnings in 73% of models and has a 6x lower false alarm rate than Procrustes analysis. Together, the two metrics form complementary diagnostics: supervised stability for pre-deployment controllability assessment, and unsupervised stability for post-deployment monitoring.
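For context on the CKA baseline, the sketch below computes linear CKA between two representations of the same inputs and turns it into a change score. This is the standard linear CKA formula; whether the paper uses the linear or a kernel variant is not stated in this summary, so treat that choice as an assumption. The helper name `cka_drift` is hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def cross_product(a, b):
    """Return a^T b for two matrices with the same number of rows."""
    return [[sum(x * y for x, y in zip(col_a, col_b)) for col_b in zip(*b)]
            for col_a in zip(*a)]


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def linear_cka(x, y):
    """Linear CKA between two representations of the same n inputs.

    x and y are lists of n rows; their feature dimensions may differ.
    """
    xc, yc = center_columns(x), center_columns(y)
    numerator = frobenius_norm(cross_product(yc, xc)) ** 2
    denominator = (frobenius_norm(cross_product(xc, xc))
                   * frobenius_norm(cross_product(yc, yc)))
    return numerator / denominator


def cka_drift(before, after):
    """Hypothetical change score: 0 when CKA sees the two as identical."""
    return 1.0 - linear_cka(before, after)
```

Linear CKA is invariant to rotations and uniform rescaling of the representation, so `cka_drift` stays at zero for those changes and grows only when the similarity structure itself changes. A drift monitor would compute such a score between a reference checkpoint and each later checkpoint on a fixed set of probe inputs.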

Key Points
  • Supervised geometric stability predicts linear steerability with Spearman ρ = 0.89-0.97 across 35-69 embedding models
  • Unsupervised stability registers nearly 2x more geometric change than CKA during post-training alignment, with up to a 5.23x improvement in Llama models
  • Provides earlier drift warnings in 73% of models with 6x lower false alarm rate than Procrustes

Why It Matters

Supports more reliable LLM deployment: supervised stability serves as a pre-deployment controllability check, and unsupervised stability serves as a post-deployment drift monitor.