Robust Explanations for User Trust in Enterprise NLP Systems
A new evaluation framework shows decoder LLMs produce far more robust explanations than encoder models under real-world input noise.
A team of researchers has published a new study titled 'Robust Explanations for User Trust in Enterprise NLP Systems' on arXiv. The work addresses a critical gap in enterprise AI: how to validate the robustness of explanations from black-box models, such as those accessed via API, before deployment. The researchers propose a unified evaluation framework that measures explanation stability with a 'top-token flip rate' metric: how often the key words the model highlights to justify its decision change when the input text is perturbed with realistic noise such as word swaps, deletions, or back-translation.
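As a rough illustration of the flip-rate idea, here is a minimal Python sketch that compares a model's top-k attributed tokens before and after a simple perturbation. The `explain` callback, the single-word-deletion perturbation, and the defaults (k=5, 10 trials) are illustrative assumptions, not the paper's exact protocol.

```python
import random

def top_k_tokens(tokens, scores, k=5):
    """Return the set of k tokens with the highest attribution scores."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)
    return {tok for tok, _ in ranked[:k]}

def delete_one_word(tokens, rng=random):
    """Illustrative perturbation: drop one random token (assumes non-empty input)."""
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]

def top_token_flip_rate(tokens, explain, k=5, trials=10):
    """Average fraction of original top-k tokens that drop out of the top-k
    after perturbation. `explain` is a hypothetical callback returning one
    attribution score per token (e.g. from an API-accessed black-box model)."""
    orig_top = top_k_tokens(tokens, explain(tokens), k)
    rates = []
    for _ in range(trials):
        perturbed = delete_one_word(tokens)
        pert_top = top_k_tokens(perturbed, explain(perturbed), k)
        rates.append(len(orig_top - pert_top) / len(orig_top))
    return sum(rates) / len(rates)
```

A flip rate of 0 means the highlighted tokens are identical under noise; values near 1 mean the explanation changes almost entirely.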
Using this protocol, the team ran a large-scale systematic comparison across three benchmark datasets and six models, spanning encoder families (BERT, RoBERTa) and decoder LLMs (Qwen 7B/14B, Llama 8B/70B), for a total of 64,800 test cases. The key finding is that modern decoder-based LLMs produce substantially more stable explanations than traditional encoder classifiers, with flip rates 73% lower on average. Explanation robustness also improved with model scale, showing a 44% gain from 7B to 70B parameters.
Finally, the study ties these robustness gains to inference cost, yielding a tangible cost-robustness tradeoff curve that enterprise teams can use to select the right model and explanation method for compliance-sensitive applications, such as finance or healthcare, where user trust and auditability are paramount.
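To make the tradeoff concrete, the sketch below picks the cheapest model whose explanations meet a stability threshold. All model names, costs, and flip rates here are made-up placeholders for illustration, not the paper's measurements.

```python
# Hypothetical per-model numbers for illustration only; real values would come
# from running the flip-rate protocol and from actual inference pricing.
models = {
    "bert-base": {"cost_per_1k_tokens": 0.01, "flip_rate": 0.40},
    "qwen-7b":   {"cost_per_1k_tokens": 0.05, "flip_rate": 0.18},
    "llama-70b": {"cost_per_1k_tokens": 0.60, "flip_rate": 0.10},
}

def cheapest_within_threshold(models, max_flip_rate):
    """Lowest-cost model whose explanation flip rate stays under the threshold."""
    eligible = {name: m for name, m in models.items()
                if m["flip_rate"] <= max_flip_rate}
    if not eligible:
        return None  # no model is robust enough; relax the threshold or pay more
    return min(eligible, key=lambda name: eligible[name]["cost_per_1k_tokens"])

print(cheapest_within_threshold(models, max_flip_rate=0.20))  # -> qwen-7b
```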
- Decoder LLMs (like Llama 70B) showed 73% lower top-token flip rates than encoder models (BERT) under text perturbations.
- Explanation robustness improves with model scale, with a 44% gain when moving from 7B to 70B parameters.
- The research provides a practical cost-robustness tradeoff curve to guide model selection for high-stakes enterprise deployments.
Why It Matters
Enables enterprises to deploy more trustworthy, explainable AI in regulated industries by quantifying how well each model's explanations hold up under realistic input noise.