Decoupling Scores and Text: The Politeness Principle in Peer Review
Analysis of over 30,000 ICLR submissions reveals that polite feedback masks true rejection signals, confusing authors.
A new study by researcher Yingxuan Wen, titled 'Decoupling Scores and Text: The Politeness Principle in Peer Review,' provides a data-driven look at a common frustration in academia: review text that gives authors little indication of a paper's eventual fate. Using a dataset of over 30,000 submissions to the ICLR conference from 2021 to 2025, the research directly compared the predictive power of numerical review scores with that of the review text itself. The results were stark: models using only scores achieved 91% accuracy in predicting paper acceptance, while models using the review text lagged at 81% accuracy, even when enhanced by large language models.
This 10-percentage-point performance gap is explained by what the author terms the 'Politeness Principle.' Analysis of review sentiment revealed that even for papers that were ultimately rejected, reviewers used more positive sentiment words than negative ones. This polite language masks the true rejection signal, making it difficult for authors to accurately gauge their paper's fate from the text alone. The study also found that in the 9% of cases where score-based models failed, the score distributions showed high kurtosis and negative skewness: most scores cluster together while one sits far below the rest. This pattern indicates that a single decisive low score can sink a paper even when the average is borderline.
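To make the skewness and kurtosis point concrete, here is a minimal sketch that compares two sets of review scores with the same average. The scores are invented toy values, not data from the study, and the function names are hypothetical. The sketch uses population moments, which may differ from whatever estimator the author used.

```python
from statistics import mean


def central_moment(xs, k):
    """k-th population central moment of a list of numbers."""
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Population skewness: negative when a long tail points toward low values."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Population excess kurtosis: higher when outliers dominate the spread."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


# Hypothetical score sets (assumption: five reviewers, arbitrary scale).
one_low_outlier = [6, 6, 6, 6, 1]   # four mild accepts, one decisive reject
evenly_split = [6, 6, 5, 4, 4]      # ordinary disagreement around the mean

for name, scores in [("one_low_outlier", one_low_outlier),
                     ("evenly_split", evenly_split)]:
    print(name, mean(scores),
          round(skewness(scores), 2),
          round(excess_kurtosis(scores), 2))
```

Both sets average 5, so the mean alone cannot tell them apart. The set with one low outlier has clearly negative skewness and higher kurtosis than the evenly split set, which is the shape the study associates with cases that score-based models get wrong.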
The findings highlight a critical disconnect in the peer review process. While authors naturally scrutinize every word of feedback for clues, the textual component is a less reliable indicator of outcome than the cold, hard numbers. This research quantifies a long-suspected issue, suggesting that the community's norms of constructive criticism may inadvertently create confusion and false hope for submitters.
- Score-based models predicted ICLR paper acceptance with 91% accuracy, outperforming text-based models by 10 percentage points.
- The 'Politeness Principle' explains the gap: reviews for rejected papers still contain more positive than negative sentiment words.
- A single low score can drive rejection: in the cases score-based models got wrong, score distributions showed high kurtosis and negative skewness.
Why It Matters
For researchers, the study quantifies why polite feedback can be misleading and underscores that review scores, not review text, are the most reliable signal of acceptance.