PromptNCE lets LLMs estimate word relationships, reaching Spearman correlations up to 0.82
No training needed — just a clever prompt with an 'OTHER' category.
Estimating pointwise mutual information (PMI), a measure of how strongly two words or concepts are associated, usually requires training a dedicated neural network per task, which is a barrier in low-data scenarios. In a new arXiv paper, Stanford researchers Juliette Woodrow and Chris Piech propose PromptNCE, which instead asks a large language model to estimate PMI zero-shot, using only carefully crafted prompts and the model’s own output probabilities. The key innovation is an explicit "OTHER" option in the contrastive candidate set, which in theory ensures that the model estimates the true conditional probability P(y|x) rather than just a relative ranking. On three public datasets with human-derived ground-truth PMI, PromptNCE achieved Spearman correlations of up to 0.82, outperforming all other zero-shot prompting methods.
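To make the role of the "OTHER" option concrete, here is a minimal sketch of the idea, not the authors' implementation. It assumes an LLM has already produced a log-score for each candidate plus "OTHER" (the numbers below are made up for illustration), and that the marginal P(y) is estimated separately (the value 0.01 is a placeholder). The helper names `conditional_probs` and `pmi` are hypothetical.

```python
import math

def conditional_probs(log_scores):
    """Normalize candidate log-scores into probabilities (a softmax)."""
    m = max(log_scores.values())  # subtract the max for numerical stability
    exp = {k: math.exp(v - m) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

def pmi(p_y_given_x, p_y):
    """PMI(x, y) = log P(y|x) - log P(y)."""
    return math.log(p_y_given_x) - math.log(p_y)

# Made-up log-scores for the context x = "doctor" (illustration only).
log_scores = {"nurse": -1.0, "banana": -6.0, "OTHER": -0.5}

# With OTHER: probability mass can go to everything that is not listed.
probs = conditional_probs(log_scores)

# Without OTHER: the listed candidates are forced to share all the mass,
# so the result is only a relative ranking among them.
closed = conditional_probs(
    {k: v for k, v in log_scores.items() if k != "OTHER"}
)

print("P(nurse | doctor) with OTHER:   ", round(probs["nurse"], 3))
print("P(nurse | doctor) without OTHER:", round(closed["nurse"], 3))

# Assumed marginal P(nurse); in practice this must be estimated separately.
print("PMI estimate:", round(pmi(probs["nurse"], 0.01), 3))
```

In this toy setup, dropping "OTHER" inflates the probability assigned to "nurse", because the two listed candidates must absorb all of the probability mass. That inflated value would also inflate the PMI estimate.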
Beyond the benchmarks, the team demonstrates a practical use case in computer science education: scoring student knowledge summaries without labeled training data. This suggests PromptNCE could be applied wherever semantic relatedness must be measured but large labeled corpora are unavailable. The method works with any off-the-shelf LLM, so it can be deployed immediately. Although it is not as accurate as fully trained critics (the dedicated, task-specific networks), PromptNCE’s zero-shot capability and theoretical grounding make it a useful tool for researchers and engineers working in data-sparse domains.
- PromptNCE adds an explicit 'OTHER' category to contrastive prompts, enabling true conditional probability estimation instead of just ranking.
- Achieves Spearman correlation up to 0.82 with human-derived PMI on three benchmarks, best among zero-shot methods.
- Demonstrated in CS education to score student summaries without any training data — a low-resource win.
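The Spearman figure above measures how well the ranking of estimated PMI values agrees with the ranking of human-derived values. As a rough sketch of that evaluation step, the code below computes Spearman's rho in plain Python. The two score lists are invented toy values, not data from the paper, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))

# Toy values for illustration only.
estimated_pmi = [2.1, 0.3, 1.5, -0.4]
human_pmi = [1.8, 0.9, 0.2, -1.0]
rho = spearman(estimated_pmi, human_pmi)
print("Spearman rho:", round(rho, 3))
```

Because Spearman's rho depends only on ranks, an estimator can score well even if its PMI values are on a different scale from the human-derived ones, as long as it orders the pairs similarly.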
Why It Matters
Zero-shot PMI estimation unlocks semantic analysis in low-data domains like education, healthcare, and niche NLP tasks.