Research & Papers

Researcher's 270K-parameter ML model predicts melting points with R² 0.64

A tiny 1.3MB deep learning model rivals a 1.23GB random forest on chemical predictions

Deep Dive

A researcher on Reddit has developed a compact deep learning model for Quantitative Structure-Property Relationship (QSPR) analysis, specifically for predicting the melting points of chemical compounds in the Jean-Claude Bradley Open Melting Point Dataset. Using 26 engineered topological indices as features, they first trained a random forest model that achieved a test R² of 0.66, respectable for the domain, but the model file weighed 1.23GB, making deployment impractical.

To address the size issue, the researcher spent two weeks building a custom neural network in PyTorch with 270,000 learnable parameters, resulting in a model file of just 1.3–1.4MB. Despite being roughly 900 times smaller, the deep learning model achieved an R² of 0.6399, only slightly below the random forest's 0.66, along with a mean absolute error (MAE) of 41.25 Kelvin and a mean absolute percentage error (MAPE) of 11.69%. The trade-off between accuracy and footprint is now central to their decision: should they publish these results, or continue optimizing? Community input will likely weigh the value of lightweight, deployable models in cheminformatics against the opportunity for further accuracy gains.
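
For readers less familiar with the three reported metrics, the sketch below shows how R², MAE, and MAPE are conventionally computed. It is a minimal pure-Python illustration, not the researcher's evaluation code, and the melting points in `y_true` and `y_pred` are made-up values chosen only to exercise the functions.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the same unit as the targets (Kelvin here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute error relative to the true value, expressed in percent."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up melting points in Kelvin (illustrative only, not from the dataset).
y_true = [300.0, 350.0, 400.0, 450.0]
y_pred = [320.0, 340.0, 380.0, 470.0]

r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)

print(f"R2 = {r2:.4f}, MAE = {mae:.2f} K, MAPE = {mape:.2f}%")
```

Note that MAPE depends on the temperature scale: the same predictions expressed in Celsius would give a different percentage, because the denominators change while the absolute errors do not. R² and MAE are unaffected by that shift.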

Key Points
  • Deep learning model has only 270,000 parameters (1.3–1.4MB on disk) vs the random forest's 1.23GB
  • Achieves R² of 0.6399 and MAE of 41.25 K on melting point prediction
  • Researcher asks whether to publish or continue improving accuracy
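
The size figures in the first key point can be sanity-checked with simple arithmetic. The sketch below counts the parameters of a fully connected network and estimates the storage for its float32 weights. The layer widths in `HYPOTHETICAL_LAYERS` are an assumption chosen to land near 270,000 parameters with 26 inputs and one output; the post does not describe the actual architecture.

```python
def count_mlp_parameters(layer_sizes):
    """Count weights and biases of a fully connected network with the given layer widths."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus bias vector for each layer
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical architecture: 26 input features, three hidden layers, 1 output.
HYPOTHETICAL_LAYERS = [26, 512, 448, 64, 1]
BYTES_PER_FLOAT32 = 4

n_params = count_mlp_parameters(HYPOTHETICAL_LAYERS)
weights_mb = n_params * BYTES_PER_FLOAT32 / 1e6  # raw weight storage, 1 MB = 10**6 bytes

# File sizes as reported in the article, taking 1 GB = 1000 MB.
reported_nn_mb = 1.3
reported_rf_mb = 1230.0
ratio = reported_rf_mb / reported_nn_mb

print(f"{n_params} parameters, about {weights_mb:.2f} MB of float32 weights")
print(f"Random forest file is about {ratio:.0f}x larger than the neural network file")
```

Under these assumptions the raw weights come to a little over 1 MB, the same order of magnitude as the reported 1.3–1.4MB file; what accounts for the remainder is not stated in the post.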

Why It Matters

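The result suggests that lightweight, deployable ML models for chemical property prediction are feasible at a small cost in accuracy: R² falls from 0.66 to 0.64 while the model file shrinks from 1.23GB to about 1.3MB.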