New AI benchmark exposes up to 8x error surge in drug molecule property prediction
State-of-the-art models' errors jump 5.9x on average under extreme out-of-distribution tests
A new study by researchers from multiple institutions (Zhuohao Lin, Kun Li, Jiameng Chen, et al.) tackles a critical bottleneck in AI-driven drug discovery: molecular property prediction under extreme out-of-distribution (OOD) scenarios. The team finds that current scaffold-splitting protocols fail to prevent microscopic semantic overlap between training and test molecules, which lets models exploit shortcut learning instead of genuinely generalizing. To fix this, they introduce SCOPE-BENCH, a benchmark built on cluster-level partitioning in an explicit physicochemical descriptor space. On SCOPE-BENCH, the prediction errors of state-of-the-art 3D molecular models surge by up to 8.0x (5.9x on average), exposing how poorly they generalize to truly novel molecules.
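To make the contrast with scaffold splitting concrete, here is a minimal sketch of a cluster-level split. It assumes each molecule has already been reduced to a vector of physicochemical descriptors and uses plain k-means as the clustering step; SCOPE-BENCH's actual descriptor set, clustering algorithm, and cluster counts are not given here, so `standardize`, `kmeans`, `cluster_split`, `error_surge`, and the toy descriptor values are all hypothetical. The essential property is that whole clusters, not individual molecules, go to the test set. The "surge" factor is assumed to be the ratio of a model's error on the harder split to its error on a reference split.

```python
import math
import random


def standardize(rows):
    """Z-score each descriptor column so large-valued descriptors don't dominate distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def sqdist(p, q):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means with farthest-first initialisation; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Seed the next centroid at the point farthest from all existing centroids.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members; leave it in place if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_split(descriptors, k, n_test_clusters, seed=0):
    """Hold out whole descriptor-space clusters so no test molecule shares a cluster with training."""
    labels = kmeans(standardize(descriptors), k, seed=seed)
    rng = random.Random(seed)
    test_clusters = set(rng.sample(sorted(set(labels)), n_test_clusters))
    train = [i for i, lab in enumerate(labels) if lab not in test_clusters]
    test = [i for i, lab in enumerate(labels) if lab in test_clusters]
    return train, test


def error_surge(ood_mae, reference_mae):
    """How many times larger the error is on the harder split than on the reference split."""
    return ood_mae / reference_mae


# Hypothetical descriptor rows: [molecular weight, logP, TPSA] (illustrative values only).
mols = [[180, 1.2, 63], [190, 1.0, 60], [185, 1.1, 65],
        [450, 4.5, 20], [460, 4.8, 18], [455, 4.6, 22]]
train_idx, test_idx = cluster_split(mols, k=2, n_test_clusters=1)
print("train:", train_idx, "test:", test_idx)
print("surge:", round(error_surge(ood_mae=1.18, reference_mae=0.20), 1))
```

Standardizing before clustering matters here: in raw units, molecular weight (hundreds) would swamp logP (single digits) in every distance computation.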
The paper also presents POMA (Policy Optimization for Multi-Source Adaptation), a framework that treats knowledge transfer as a retrieve-compose-adapt pipeline. POMA first identifies labeled source scaffolds that are structurally close to the unlabeled target (proxy targets), then uses reinforcement learning to select the optimal source subset from an exponentially large candidate pool, and finally performs dual-scale domain adaptation at both the macroscopic topological and the microscopic pharmacophore scale. Across diverse backbone architectures, POMA reduces mean absolute error (MAE) by up to 11.2%, with an average relative improvement of 6.2%. The code is publicly available, giving the drug discovery community a more rigorous benchmark and a promising method for robust molecular property prediction.
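The retrieve and compose steps can be illustrated with a minimal sketch. It assumes scaffolds are already embedded as vectors, uses cosine similarity for retrieval, and swaps in plain REINFORCE with an independent Bernoulli policy (one inclusion probability per source) as the reinforcement-learning selector. POMA's actual policy, reward, and embeddings are not specified here, so every function name, the toy reward, and the data below are hypothetical, and the dual-scale adaptation step is omitted. The Bernoulli factorization is one way to keep the exponential pool tractable: n sources give 2^n possible subsets, but the policy needs only n parameters.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(source_embs, target_emb, top_k):
    """Retrieve: indices of the top_k labeled sources most similar to the unlabeled target."""
    order = sorted(range(len(source_embs)),
                   key=lambda i: cosine(source_embs[i], target_emb), reverse=True)
    return order[:top_k]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def reinforce_select(n_sources, reward_fn, steps=500, lr=0.5, seed=0):
    """Compose: learn an inclusion probability per source with REINFORCE."""
    rng = random.Random(seed)
    logits = [0.0] * n_sources
    baseline = 0.0
    for _ in range(steps):
        probs = [sigmoid(z) for z in logits]
        mask = [1 if rng.random() < p else 0 for p in probs]  # sample one source subset
        reward = reward_fn(mask)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline lowers variance
        # Gradient of log Bernoulli(m; sigmoid(z)) with respect to z is (m - sigmoid(z)).
        logits = [z + lr * advantage * (m - p) for z, m, p in zip(logits, mask, probs)]
    return [sigmoid(z) for z in logits]


# Toy usage: embeddings and per-source "usefulness" scores are made up.
sources = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
target = [1.0, 0.0]
candidates = retrieve(sources, target, top_k=2)

usefulness = [1.0, 1.0, -1.0, -1.0]  # hypothetical: two helpful and two harmful sources


def toy_reward(mask):
    # Stand-in for, e.g., negative validation error on proxy targets after training on the subset.
    return sum(u * m for u, m in zip(usefulness, mask))


probs = reinforce_select(len(usefulness), toy_reward)
print("retrieved:", candidates, "inclusion probs:", [round(p, 2) for p in probs])
```

In a real pipeline each reward evaluation would mean training or fine-tuning on the sampled subset, which is far costlier than this additive toy reward; that cost is the reason to learn a policy rather than enumerate subsets.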
- SCOPE-BENCH uses cluster-level partitioning in physicochemical descriptor space to block shortcut learning; on it, SOTA model errors surge 5.9x on average (up to 8.0x).
- POMA implements a retrieve-compose-adapt pipeline: proxy-target retrieval, RL-based selection of the optimal source subset, and dual-scale (topological + pharmacophore) domain adaptation.
- POMA reduces mean absolute error by up to 11.2% (avg 6.2%) across multiple backbone architectures, with code open-sourced.
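The two headline percentages come from simple arithmetic on per-backbone errors. This sketch assumes "reduction" means relative MAE reduction versus the same backbone without POMA, that "up to" is the maximum over backbones, and that "average relative improvement" is the unweighted mean of per-backbone ratios; the paper may aggregate differently, and the backbone names and MAE values are invented for illustration.

```python
def relative_reduction(baseline_mae, adapted_mae):
    """Fraction by which adaptation lowers MAE relative to the unadapted backbone."""
    return (baseline_mae - adapted_mae) / baseline_mae


# Hypothetical (baseline MAE, adapted MAE) per backbone; NOT numbers from the paper.
results = {
    "backbone_a": (0.50, 0.45),
    "backbone_b": (0.80, 0.76),
    "backbone_c": (1.00, 0.97),
}
reductions = {name: relative_reduction(b, a) for name, (b, a) in results.items()}
best = max(reductions.values())                       # the "up to" figure
average = sum(reductions.values()) / len(reductions)  # unweighted mean of per-backbone ratios
print({name: f"{r:.1%}" for name, r in reductions.items()})
print(f"up to {best:.1%}, average {average:.1%}")
```

Averaging per-backbone ratios weights each architecture equally, whereas comparing summed MAEs would let backbones with larger absolute errors dominate the result.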
Why It Matters
A tougher benchmark and a new adaptation method could make AI more reliable at predicting the properties of truly novel drug molecules.