Distill and Align Decomposition for Enhanced Claim Verification
An 8B parameter model achieves 71.75% macro-F1 by jointly optimizing decomposition and verification.
A research team from institutions including Carnegie Mellon University and JPMorgan Chase has published a new AI framework that significantly improves automated fact-checking. Their paper, "Distill and Align Decomposition for Enhanced Claim Verification," addresses a core challenge: existing methods for verifying complex claims often decompose them poorly, leading to inaccurate verification. The team's solution uses reinforcement learning (RL) with Group Relative Policy Optimization (GRPO) to train a language model not only to break a claim into better subclaims but also to ensure those subclaims are optimally structured for a downstream verification model. This joint optimization of decomposition quality and "verifier alignment" is the key innovation.
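The mechanics of GRPO can be illustrated with a minimal sketch. This is not the authors' implementation; it only shows the core idea GRPO is known for: sampling a group of candidate outputs (here, decompositions of one claim), scoring each with a reward, and standardizing each reward against the group's mean and standard deviation so that no separate learned critic is needed.

```python
# Illustrative sketch of GRPO's group-relative advantage computation.
# For one claim, the policy samples a group of candidate decompositions;
# each candidate's advantage is its reward standardized within the group.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Standardize each reward against its sampling group (GRPO-style)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        return [0.0 for _ in rewards]  # identical rewards carry no signal
    return [(r - mu) / sigma for r in rewards]

# Example: reward-model scores for four sampled decompositions of one claim
advs = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Candidates scored above the group mean receive positive advantages (their token probabilities are pushed up), those below receive negative ones, and the advantages of a group always sum to zero.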
The technical approach integrates three components: structured sequential reasoning, supervised fine-tuning on high-quality teacher-distilled examples, and a multi-objective reward function. The reward balances format compliance, how well the decomposition aligns with the verifier's needs, and the intrinsic quality of the subclaims. The result is a highly efficient 8-billion-parameter model that achieved a macro-F1 score of 71.75% in evaluations, outperforming standard prompt-based methods by 1.99 to 6.24 percentage points and existing RL techniques by 5.84 points. Human evaluation confirmed that the subclaims were of high quality. This work demonstrates a path for smaller, cheaper models to match or exceed the fact-checking performance of much larger models, making robust verification more accessible and scalable for real-world applications such as content moderation and news analysis.
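The multi-objective reward described above can be sketched as a weighted blend of the three signals. The component names and weights below are illustrative assumptions for exposition, not values from the paper.

```python
# Hedged sketch of a multi-objective reward combining the three signals
# described in the text: format compliance, verifier alignment, and
# intrinsic subclaim quality. Weights are illustrative, not the paper's.
def combined_reward(format_ok, alignment_score, quality_score,
                    w_format=0.2, w_align=0.4, w_quality=0.4):
    """Blend three reward components into one scalar for RL training.

    format_ok:       1.0 if the output matches the required structure, else 0.0
    alignment_score: how well the subclaims suit the downstream verifier, in [0, 1]
    quality_score:   intrinsic decomposition quality (e.g. coverage, atomicity), in [0, 1]
    """
    return (w_format * format_ok
            + w_align * alignment_score
            + w_quality * quality_score)

# A well-formatted, verifier-friendly, high-quality decomposition scores high
r = combined_reward(1.0, 0.9, 0.8)
```

A scalar blend like this lets a single RL objective trade off structural correctness against the two quality terms, which is the balance the framework's reward is described as striking.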
- Uses Group Relative Policy Optimization (GRPO) to jointly train decomposition and verification alignment in a single RL framework.
- Achieves a 71.75% macro-F1 score, beating prompt-based methods by up to 6.24 percentage points and prior RL methods by 5.84 points.
- Enables a relatively small 8B-parameter model to reach state-of-the-art verification performance, improving efficiency.
Why It Matters
Makes high-accuracy, automated fact-checking more efficient and scalable for platforms combating misinformation.