InteractBind benchmark reveals AI models fail at binding site localization
100k protein-ligand pairs tested — models guess binding but can't find where.
A team led by Zhaohan Meng at the University of Glasgow has released InteractBind, a large-scale dataset of approximately 100,000 protein-ligand pairs, together with a fine-grained benchmark. The core task goes beyond traditional binary binding prediction or affinity regression. It tests whether models can localize the exact binding sites and identify the specific non-covalent interactions, such as hydrogen bonds, hydrophobic contacts, and van der Waals forces, that drive molecular recognition. The dataset includes protein-residue and ligand-atom interaction maps across six interaction types. It also provides splits based on binding affinity and protein similarity, designed to test generalization under realistic conditions.
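The protein-similarity split mentioned above can be illustrated with a small sketch. The idea is to group similar proteins into clusters and place whole clusters on one side of the train/test boundary. This way, a model cannot score well on test proteins simply by recognizing close homologs it saw during training. This is a minimal sketch under stated assumptions, not InteractBind's actual procedure:

- `difflib`'s match ratio stands in for alignment-based sequence identity. Real pipelines typically use dedicated sequence-clustering tools.
- The greedy clustering compares each protein only to a cluster's first member.
- The function names, the 0.5 threshold, and the toy sequences are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for sequence identity: difflib's match ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_sequences(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Greedy clustering (hypothetical): each protein joins the first cluster whose
    representative (first member) is at least `threshold` similar, else starts a new one."""
    clusters: list[list[str]] = []
    for pid, seq in seqs.items():
        for cluster in clusters:
            if similarity(seqs[cluster[0]], seq) >= threshold:
                cluster.append(pid)
                break
        else:
            clusters.append([pid])
    return clusters


def similarity_split(seqs: dict[str, str], threshold: float = 0.5,
                     test_fraction: float = 0.2) -> tuple[list[str], list[str]]:
    """Assign whole clusters to one side, so proteins clustered together never end up
    on opposite sides of the split. Smallest clusters fill the test set first."""
    clusters = cluster_sequences(seqs, threshold)
    target = test_fraction * len(seqs)
    train: list[str] = []
    test: list[str] = []
    for cluster in sorted(clusters, key=len):
        if len(test) < target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


# Toy, made-up sequences: P2 differs from P1 by a single substitution.
proteins = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQ",
    "P2": "MKTAYIAKQRQISFVKSHFSRE",
    "P3": "GGSSGGSSGGSSGGSSGGSSGG",
    "P4": "WWHHWWHHWWHHWWHHWWHHWW",
}
train, test = similarity_split(proteins)
print(train, test)  # ['P4', 'P1', 'P2'] ['P3']
```

Because P1 and P2 fall into the same cluster, they always land on the same side of the split. A random pair-level split would offer no such guarantee.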
Evaluation of eight existing sequence-based and interaction-aware models revealed a stark gap. The models performed well at predicting whether a protein and ligand bind (binary prediction) but failed to accurately localize the binding sites. Performance also varied substantially across interaction types, suggesting that current architectures learn statistical shortcuts rather than physically meaningful representations.

The paper is under review for the NeurIPS 2026 Conference Track on Evaluations and Datasets. InteractBind proposes a new evaluation paradigm that pushes the community toward more interpretable, physically grounded models. This is a crucial step for real-world drug discovery, where understanding the 'why' matters as much as the 'if'.
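The gap between binary prediction and localization is easier to see with a toy scoring function. The sketch below treats the two as separate questions:

- Is the predicted binding probability on the right side of a threshold?
- For each interaction type, how many of the k residues the model ranks highest are true contact residues, where k is the number of true contacts of that type?

The metric, the 0.5 threshold, the interaction-type keys, and the scores are all illustrative assumptions. The benchmark's own metrics may differ.

```python
def top_k_precision(scores: list[float], labels: list[int], k: int) -> float:
    """Fraction of the k highest-scoring residues that are true contact residues."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


def evaluate_pair(bind_prob: float, is_binder: bool,
                  residue_scores: dict[str, list[float]],
                  residue_labels: dict[str, list[int]],
                  threshold: float = 0.5) -> tuple[bool, dict[str, float]]:
    """Score one protein-ligand pair on binary prediction and, separately, on
    per-interaction-type localization (top-k precision, k = number of true contacts)."""
    binary_correct = (bind_prob >= threshold) == is_binder
    localization: dict[str, float] = {}
    for itype, labels in residue_labels.items():
        k = sum(labels)
        if k == 0:
            continue  # no ground-truth residues of this type; skip to avoid dividing by zero
        localization[itype] = top_k_precision(residue_scores[itype], labels, k)
    return binary_correct, localization


# Hypothetical pair with 8 residues: the model confidently predicts binding,
# but its hydrogen-bond scores point at the wrong residues.
labels = {
    "hydrogen_bond": [0, 1, 1, 0, 0, 0, 0, 0],
    "hydrophobic":   [0, 0, 0, 0, 0, 1, 1, 0],
}
scores = {
    "hydrogen_bond": [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.9, 0.8],
    "hydrophobic":   [0.0, 0.1, 0.1, 0.2, 0.3, 0.9, 0.7, 0.4],
}
binary_ok, per_type = evaluate_pair(0.92, True, scores, labels)
print(binary_ok, per_type)  # True {'hydrogen_bond': 0.0, 'hydrophobic': 1.0}
```

A benchmark that reported only `binary_ok` would count this pair as a success. The per-type localization scores expose that the hydrogen-bond predictions are entirely off target.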
- Dataset includes ~100,000 protein-ligand pairs with detailed residue-atom interaction maps covering 6 non-covalent interaction types.
- All 8 tested models showed poor binding-site localization despite strong binary binding prediction accuracy.
- Paper submitted to NeurIPS 2026, Track on Evaluations and Datasets; available on arXiv:2605.24045.
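A residue-atom interaction map of the kind described above can be thought of as a sparse set of contacts, each tagged with an interaction type. Residue-level (binding-site) and atom-level labels then follow by aggregation. The class below is a hypothetical sketch of that idea, not InteractBind's file format. The article names only three of the six interaction types, so the type strings here are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InteractionMap:
    """Hypothetical container for one protein-ligand pair's contacts."""
    n_residues: int
    n_atoms: int
    # Sparse contact list: (protein residue index, ligand atom index, interaction type)
    contacts: set[tuple[int, int, str]] = field(default_factory=set)

    def dense(self, itype: str) -> list[list[int]]:
        """Residue-by-atom 0/1 matrix for a single interaction type."""
        matrix = [[0] * self.n_atoms for _ in range(self.n_residues)]
        for res, atom, t in self.contacts:
            if t == itype:
                matrix[res][atom] = 1
        return matrix

    def residue_labels(self, itype: Optional[str] = None) -> list[int]:
        """1 for each residue with at least one contact (of `itype`, or of any type if None)."""
        labels = [0] * self.n_residues
        for res, _, t in self.contacts:
            if itype is None or t == itype:
                labels[res] = 1
        return labels

    def atom_labels(self, itype: Optional[str] = None) -> list[int]:
        """1 for each ligand atom with at least one contact (of `itype`, or of any type if None)."""
        labels = [0] * self.n_atoms
        for _, atom, t in self.contacts:
            if itype is None or t == itype:
                labels[atom] = 1
        return labels


# Toy pair: 5 residues, 3 ligand atoms, made-up contacts.
pair = InteractionMap(
    n_residues=5,
    n_atoms=3,
    contacts={(1, 0, "hydrogen_bond"), (3, 1, "hydrophobic"), (3, 2, "hydrophobic")},
)
print(pair.residue_labels())               # [0, 1, 0, 1, 0]
print(pair.residue_labels("hydrophobic"))  # [0, 0, 0, 1, 0]
print(pair.atom_labels("hydrophobic"))     # [0, 1, 1]
```

Collapsing with `itype=None` gives an overall binding-site mask. Passing a specific type gives the per-type labels that a localization metric would be scored against.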
Why It Matters
Drug discovery demands models that understand binding mechanisms, not just correlations. InteractBind exposes a critical blind spot.