Research & Papers

Argonne Researchers Introduce Probabilistic Attribution for LLMs

Bayes rule inversion tracks which tokens drive LLM outputs.

Deep Dive

A probabilistic token attribution method for LLMs uses Bayes rule to invert next-token log-probabilities, yielding an attribution score based on the ratio of conditional probabilities with and without a token. Evaluated across 8 models and 7 prompts, the approach improves interpretability by identifying token sensitivity, anomalies, and model stability issues, guiding users toward uncertain or unstable generation parts.

Key Points
  • Uses Bayes rule to compute token attribution scores from next-token log-probabilities.
  • Evaluated across 8 LLMs and 7 prompts to assess sensitivity, stability, and convergence.
  • Provides a model-agnostic way to identify unstable or anomalous token sequences in generations.

Why It Matters

Helps developers and users pinpoint unreliable tokens in LLM outputs, improving interpretability and trust.