New AI method uses LLMs to narrow uncertainty from missing data by 75-83%
Researchers' framework uses 'weak shadow variables' from pretrained models to tighten statistical bounds on biased feedback.
Researchers Hongyu Chen, David Simchi-Levi, and Ruoxuan Xiong developed a partial identification framework that uses predictions from pretrained models (like LLMs) as 'weak shadow variables' to address missing-not-at-random (MNAR) data, where whether a value is missing depends on the value itself. Rather than producing a single point estimate, the method bounds the quantity of interest by solving linear programs in which the model outputs enter as constraints. In experiments, it narrowed identification intervals by 75-83% while maintaining valid statistical coverage, offering a robust alternative to traditional, assumption-heavy methods for analyzing biased user feedback.
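To make the idea of bounding with linear programs concrete, here is a minimal toy sketch. It is not the authors' formulation. It assumes a binary outcome Y, a response indicator R, and a binary model prediction Z that is available for every user. The function name `mean_bounds`, the tolerance `eps`, and all input numbers are hypothetical. Reading "weak" as "the shadow-variable condition P(Z | Y, R=0) = P(Z | Y, R=1) holds only up to `eps`" is an assumption made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def mean_bounds(p_obs, m_obs, a, pz1_miss, eps=None):
    """Toy bounds on E[Y] for binary Y under nonresponse (illustrative only).

    p_obs    : P(R=1), share of users who responded
    m_obs    : P(Y=1 | R=1), mean among respondents
    a        : (a0, a1) with a_y = P(Z=1 | Y=y, R=1), estimated from respondents
    pz1_miss : P(Z=1 | R=0), model prediction rate among nonrespondents
    eps      : assumed tolerance on the shadow-variable condition;
               None drops the condition entirely (worst-case bounds)
    """
    # Unknowns: x = [x00, x01, x10, x11], where x_yz = P(Y=y, Z=z | R=0).
    c = np.array([0.0, 0.0, 1.0, 1.0])            # objective: P(Y=1 | R=0)
    A_eq = [[1, 1, 1, 1],                          # probabilities sum to 1
            [0, 1, 0, 1]]                          # match observed P(Z=1 | R=0)
    b_eq = [1.0, pz1_miss]

    A_ub, b_ub = None, None
    if eps is not None:
        A_ub, b_ub = [], []
        for y in (0, 1):
            hi = min(a[y] + eps, 1.0)
            lo = max(a[y] - eps, 0.0)
            upper = [0.0] * 4
            lower = [0.0] * 4
            # x_y1 <= hi * (x_y0 + x_y1)
            upper[2 * y], upper[2 * y + 1] = -hi, 1.0 - hi
            # lo * (x_y0 + x_y1) <= x_y1
            lower[2 * y], lower[2 * y + 1] = lo, lo - 1.0
            A_ub += [upper, lower]
            b_ub += [0.0, 0.0]

    kw = dict(A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    res_min = linprog(c, **kw)                     # smallest feasible P(Y=1 | R=0)
    res_max = linprog(-c, **kw)                    # largest feasible P(Y=1 | R=0)
    if not (res_min.success and res_max.success):
        raise ValueError("constraints are infeasible for these inputs")

    q_lo, q_hi = res_min.fun, -res_max.fun
    base = p_obs * m_obs                           # contribution of respondents
    return base + (1 - p_obs) * q_lo, base + (1 - p_obs) * q_hi

# Made-up inputs, for illustration only.
no_shadow = mean_bounds(0.6, 0.7, (0.2, 0.8), 0.5)
with_shadow = mean_bounds(0.6, 0.7, (0.2, 0.8), 0.5, eps=0.1)
```

With these made-up inputs, the worst-case interval is roughly [0.42, 0.82], and adding the relaxed constraint narrows it to roughly [0.55, 0.69]. Setting `eps=0` collapses the interval to a single point, which shows the trade-off: a stronger assumption gives a tighter answer, while a weaker one keeps the bounds valid under more scenarios.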
Why It Matters
The framework gives platforms and researchers a more reliable way to analyze inherently biased user feedback, such as reviews or surveys, using AI models they already have.