Indic-TunedLens: Interpreting Multilingual Models in Indian Languages
A new framework decodes model representations in low-resource Indian languages, outperforming the standard Logit Lens on MMLU.
Researchers Mihir Panchal, Deeksha Varshney, Mamta, and Asif Ekbal built Indic-TunedLens, a novel interpretability framework for multilingual LLMs. It uses learned affine transformations to align hidden states for specific target languages, enabling more faithful decoding of model representations. Evaluated on 10 Indian languages using the MMLU benchmark, it significantly outperforms standard methods like the Logit Lens, especially for morphologically rich, low-resource languages.
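The core idea of a tuned-lens-style probe can be sketched in a few lines: instead of decoding a layer's raw hidden state directly through the unembedding matrix (as the Logit Lens does), a learned affine map first translates that hidden state toward the final layer's representation space. The sketch below is illustrative only; the dimensions, matrices, and function names are assumptions, not the paper's implementation, and in practice the affine map (A, b) would be trained per layer (and, in Indic-TunedLens, per target language) on a frozen base model.

```python
import numpy as np

# Toy dimensions for illustration -- not taken from the paper.
d_model, vocab_size = 8, 5
rng = np.random.default_rng(0)

# Frozen unembedding matrix W_U of the base model.
W_U = rng.standard_normal((d_model, vocab_size))

# Hypothetical learned affine translator (A, b) for one layer:
# initialized near identity, then trained to map the layer's
# hidden states into the final layer's representation space.
A = np.eye(d_model) + 0.01 * rng.standard_normal((d_model, d_model))
b = np.zeros(d_model)

def logit_lens(h):
    # Baseline: decode the raw hidden state directly.
    return h @ W_U

def tuned_lens(h, A, b):
    # Tuned-lens-style probe: apply the affine map before unembedding.
    return (A @ h + b) @ W_U

h = rng.standard_normal(d_model)   # a hidden state at some layer
print(logit_lens(h).shape, tuned_lens(h, A, b).shape)
```

Both lenses produce a logit vector over the vocabulary; the difference is that the tuned variant's affine correction can compensate for layer- and language-specific drift in the hidden states, which is the gap the paper targets for morphologically rich Indian languages.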
Why It Matters
Provides crucial transparency for deploying AI in linguistically diverse regions, moving beyond English-centric interpretability tools.