Research & Papers

PromptNCE lets LLMs estimate word relationships, reaching Spearman correlations up to 0.82

No training needed — just a clever prompt with an 'OTHER' category.

Deep Dive

Estimating pointwise mutual information (PMI), a measure of how strongly two words or concepts are associated, usually requires training a dedicated neural network per task, which is a barrier in low-data scenarios. In a new arXiv paper, Stanford researchers Juliette Woodrow and Chris Piech propose PromptNCE, which instead asks large language models to estimate PMI zero-shot, using only carefully crafted prompts and the model’s own probabilities. The key innovation is adding an explicit "OTHER" option to the contrastive candidate set, which in theory ensures the model estimates the true conditional probability P(y|x) rather than just a relative ranking of the listed candidates. On three public datasets with human-derived ground-truth PMI, PromptNCE achieved Spearman correlations up to 0.82, outperforming the other zero-shot prompting methods tested.
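To make the role of the "OTHER" option concrete, here is a minimal Python sketch. It is not the authors' implementation: the scores are made-up numbers, the function names are hypothetical, and estimating the marginal P(y) from a second, context-free prompt is an assumption made here for illustration. It relies only on the standard definition PMI(x, y) = log P(y|x) - log P(y).

```python
import math

def normalise(logprobs):
    """Turn raw log-scores for a candidate set into probabilities that sum to 1."""
    peak = max(logprobs.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - peak) for k, v in logprobs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def pmi(cond_logprobs, marginal_logprobs, target):
    """PMI(x, y) = log P(y|x) - log P(y).

    Both inputs are hypothetical LLM log-scores over the same candidate set.
    Requiring an OTHER entry is this sketch's way of keeping probability
    mass for answers that are not listed among the candidates.
    """
    for scores in (cond_logprobs, marginal_logprobs):
        if "OTHER" not in scores:
            raise ValueError("candidate set must include an OTHER option")
    p_cond = normalise(cond_logprobs)[target]
    p_marg = normalise(marginal_logprobs)[target]
    return math.log(p_cond) - math.log(p_marg)

# Made-up scores for the context word x = "hospital" (illustration only).
with_context = {"doctor": -0.5, "banana": -4.0, "OTHER": -1.2}
# Made-up scores from a prompt with no context word (assumed marginal).
no_context = {"doctor": -2.0, "banana": -2.2, "OTHER": -0.3}

pmi_doctor = pmi(with_context, no_context, "doctor")
pmi_banana = pmi(with_context, no_context, "banana")

# Without OTHER, all probability mass is forced onto the listed candidates.
without_other = {k: v for k, v in with_context.items() if k != "OTHER"}
p_with_other = normalise(with_context)["doctor"]
p_without_other = normalise(without_other)["doctor"]
```

With these illustrative scores, the related pair gets a positive PMI and the unrelated pair a negative one. Dropping "OTHER" inflates the probability of the top candidate, because mass that belongs to unlisted answers is redistributed over the listed ones. That is the sense in which a closed candidate set yields only a relative ranking.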

Beyond the benchmark, the team demonstrates a practical use case in computer science education: scoring student knowledge summaries without needing labeled training data. This opens the door to applying PromptNCE in any setting where you need to measure semantic relatedness but lack large labeled corpora. The method works with off-the-shelf LLMs and needs no task-specific training, so it can be deployed without a training pipeline. While not as accurate as fully trained critic networks, PromptNCE’s zero-shot capability and theoretical grounding make it a useful tool for researchers and engineers working in data-sparse domains.

Key Points
  • PromptNCE adds an explicit 'OTHER' category to contrastive prompts, enabling true conditional probability estimation instead of just ranking.
  • Achieves Spearman correlation up to 0.82 with human-derived PMI on three benchmarks, best among zero-shot methods.
  • Demonstrated in CS education to score student summaries without any training data — a low-resource win.

Why It Matters

Zero-shot PMI estimation unlocks semantic analysis in low-data domains like education, healthcare, and niche NLP tasks.