[D] ICML 2026: Policy A vs Policy B impact on scores discussion
Anecdotal evidence suggests papers reviewed without AI assistance received harsher scores than those with LLM help.
A viral discussion from ICML 2026 has surfaced a potential flaw in the conference's experimental dual-track peer review system. An anonymous researcher reports that papers reviewed under 'Policy A', where human reviewers were prohibited from using LLMs such as GPT-4 or Claude, appear to have received harsher scores on average than those under 'Policy B', which allowed limited AI assistance. The observation rests on a small sample: the researcher's own review batch plus anecdotes from Reddit, X, and conversations with professors and area chairs. It therefore points to a possible systematic scoring disparity rather than proving one. The researcher speculates that AI assistance under Policy B may produce reviews with a more lenient tone, broader background coverage, cleaner prose, and a greater tendency to give authors the benefit of the doubt.
To gather broader evidence, the researcher has launched an anonymous Google Forms poll asking the community to share each paper's policy, score vector, and impressions of review harshness. The researcher acknowledges that the data will be noisy and self-selected; the aim is only a rough community snapshot of whether score distributions differ by policy. The core concern is fairness: researchers who meticulously conducted reviews under the stricter policy may be penalized if their own papers were judged more harshly by unaided reviewers. The incident highlights the growing, unstandardized role of AI in academic peer review and raises critical questions about how to ensure equitable evaluation as these tools become ubiquitous.
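Once poll responses arrive, the distributional comparison the researcher describes could be run with a standard nonparametric test. The sketch below is purely illustrative: the score lists, variable names, and the choice of a Mann-Whitney U test are assumptions for demonstration, not part of the actual poll or any stated analysis plan.

```python
# Hypothetical sketch: comparing self-reported score distributions between
# the two review policies. All numbers below are made-up placeholders, not
# real poll data, and the column/variable names are assumptions.
from scipy.stats import mannwhitneyu

# Illustrative self-reported average scores per paper.
policy_a_scores = [3.0, 4.5, 2.5, 3.5, 4.0, 3.0, 2.0]  # no-LLM reviews
policy_b_scores = [4.0, 5.0, 3.5, 4.5, 4.0, 5.5, 3.0]  # LLM-assisted reviews

# Mann-Whitney U makes no normality assumption, which suits a small,
# noisy, self-selected sample like this poll.
stat, p_value = mannwhitneyu(
    policy_a_scores, policy_b_scores, alternative="two-sided"
)
print(f"U = {stat:.1f}, p = {p_value:.3f}")

# Caveat in code as in prose: because respondents self-select, even a
# small p-value would indicate association, not a causal policy effect.
```

A nonparametric test is a reasonable default here because a handful of self-reported scores gives no grounds for assuming normality; any observed gap would still need the full conference data to rule out selection effects.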
- ICML 2026 tested two LLM review policies: a strict 'no-LLM' Policy A and a permissive 'LLM-assisted' Policy B.
- Anecdotal reports from a researcher suggest Policy A papers received harsher average scores, potentially because human-only reviews tend to be more critical and less generous than polished, AI-assisted ones.
- An informal poll has been launched to collect community data on scores and review styles to investigate a potential systematic bias.
Why It Matters
This debate exposes critical fairness and standardization issues as AI tools become embedded in high-stakes academic peer review processes.