Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models
A new framework uses predictions from pretrained models as 'weak shadow variables' to tighten statistical bounds when user feedback is systematically missing.
Researchers Hongyu Chen, David Simchi-Levi, and Ruoxuan Xiong developed a partial identification framework that uses predictions from pretrained models (such as large language models) as 'weak shadow variables' for data that are missing not at random (MNAR), where whether a response is observed depends on its unobserved value. Their method formulates identification as linear programs, with the model outputs entering as constraints that narrow the range of values consistent with the observed data. In experiments, it shrank identification intervals by 75-83% while maintaining valid statistical coverage, offering a robust alternative to traditional methods that rely on strong, hard-to-verify assumptions to analyze biased user feedback.
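To make the linear-programming idea concrete, here is a toy sketch under assumptions that are ours, not the authors': a binary outcome Y (say, a satisfied user), a response indicator R, and a binary proxy Z (say, a model's predicted satisfaction) observed for every user. A classical shadow variable satisfies Z independent of R given Y; as a hypothetical "weak" version, the sketch only requires P(Z=1 | Y=y, R=0) to lie within a tolerance delta of P(Z=1 | Y=y, R=1). The unknowns are t_z = P(Y=1, Z=z, R=0), and the bounds on P(Y=1) are the minimum and maximum of a linear objective over the resulting constraints. Because this toy LP has only two variables, it is solved by vertex enumeration; a realistic problem would use a general LP solver such as `scipy.optimize.linprog`. The helper names and the toy numbers are illustrative.

```python
from itertools import combinations


def lp_bounds_2d(constraints, objective, tol=1e-9):
    """Return (min, max) of objective . t over the polygon {t : a . t <= c}.

    The feasible set is assumed bounded, so both optima sit at vertices. Every
    pair of constraint boundaries is intersected, infeasible points are dropped,
    and the objective is evaluated at the rest. Returns None if none is feasible.
    """
    values = []
    for (a1, c1), (a2, c2) in combinations(constraints, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-15:
            continue  # parallel boundaries have no unique intersection
        # Cramer's rule for a1 . t = c1 and a2 . t = c2
        x = (c1 * a2[1] - a1[1] * c2) / det
        y = (a1[0] * c2 - c1 * a2[0]) / det
        if all(a[0] * x + a[1] * y <= c + tol for a, c in constraints):
            values.append(objective[0] * x + objective[1] * y)
    return (min(values), max(values)) if values else None


def ratio_constraints(p, delta):
    """Encode p - delta <= v1 / (v0 + v1) <= p + delta as rows (a, 0) meaning a . v <= 0."""
    lo, hi = p - delta, p + delta
    return [((lo, lo - 1.0), 0.0),   # v1 >= lo * (v0 + v1)
            ((-hi, 1.0 - hi), 0.0)]  # v1 <= hi * (v0 + v1)


def shadow_bounds(p_y1_r1, m, p1, p0, delta):
    """Hypothetical helper: bounds on P(Y=1) with a binary proxy Z seen for everyone.

    p_y1_r1 : P(Y=1, R=1), observed among respondents
    m       : (P(Z=0, R=0), P(Z=1, R=0)), proxy distribution among nonrespondents
    p1, p0  : P(Z=1 | Y=1, R=1) and P(Z=1 | Y=0, R=1)
    delta   : assumed max gap between P(Z=1 | Y=y, R=0) and P(Z=1 | Y=y, R=1)
    Unknowns t = (t0, t1) with t_z = P(Y=1, Z=z, R=0).
    """
    m0, m1 = m
    cons = [((-1.0, 0.0), 0.0), ((0.0, -1.0), 0.0),  # t_z >= 0
            ((1.0, 0.0), m0), ((0.0, 1.0), m1)]      # t_z <= P(Z=z, R=0)
    cons += ratio_constraints(p1, delta)             # Y=1 nonrespondents: v = t
    for (a0, a1), _ in ratio_constraints(p0, delta):  # Y=0 nonrespondents: v = m - t
        # a . (m - t) <= 0  is equivalent to  -a . t <= -a . m
        cons.append(((-a0, -a1), -(a0 * m0 + a1 * m1)))
    res = lp_bounds_2d(cons, (1.0, 1.0))  # objective: t0 + t1 = P(Y=1, R=0)
    return None if res is None else (p_y1_r1 + res[0], p_y1_r1 + res[1])


# Made-up population: P(Y=1)=0.6, P(Z=1|Y=1)=0.8, P(Z=1|Y=0)=0.3,
# P(R=1|Y=1)=0.7, P(R=1|Y=0)=0.4, with Z independent of R given Y.
obs = dict(p_y1_r1=0.42, m=(0.204, 0.216), p1=0.8, p0=0.3)
for delta in (1.0, 0.1, 0.0):
    print(delta, shadow_bounds(delta=delta, **obs))
```

On these made-up numbers, delta = 1.0 leaves the proxy uninformative and returns roughly (0.42, 0.84), the no-assumption bounds; delta = 0.1 narrows them to roughly (0.516, 0.684); and delta = 0.0, the exact shadow-variable case, collapses both ends to about 0.6, the true value. The sketch shows how a more trustworthy proxy (smaller delta) tightens the interval; it does not reproduce the paper's setup or its reported 75-83% reductions.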
Why It Matters
Provides a more reliable way for platforms and researchers to analyze inherently biased user feedback, like reviews or surveys, using existing AI models.