Researcher's 270K-parameter ML model predicts melting points with R² 0.64
A tiny 1.3MB deep learning model rivals a 1.23GB random forest on chemical predictions
A researcher on Reddit has developed a compact deep learning model for Quantitative Structure-Property Relationship (QSPR) analysis, specifically for predicting the melting points of chemical compounds in the Jean-Claude Bradley Open Melting Point Dataset. Using 26 engineered topological indices as features, they first trained a random forest model that achieved a test R² of 0.66, respectable for the domain, but the model file weighed 1.23GB, making deployment impractical.
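For readers unfamiliar with topological indices, here is a minimal sketch of one classic example, the Wiener index: the sum of shortest-path bond counts over all pairs of heavy atoms. The post does not say which 26 indices were used, so the choice of the Wiener index, the `wiener_index` helper, and the two toy molecules are illustrative assumptions only.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all unordered atom pairs.

    `adjacency` maps each atom index to the list of atoms it is bonded to
    (hydrogen-suppressed molecular graph). Illustrative helper, not the
    researcher's actual feature code.
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives bond-count distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
    # Every pair was counted once from each end.
    return total // 2


# Carbon skeletons of two C4H10 isomers (hypothetical inputs for illustration).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}      # chain C0-C1-C2-C3
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}     # central C0 with three neighbors

print(wiener_index(n_butane), wiener_index(isobutane))
```

The two isomers have the same formula but different index values, which is the kind of structural signal a QSPR model can learn from.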
To address the size issue, the researcher spent two weeks building a custom neural network in PyTorch with 270,000 learnable parameters, resulting in a model file of just 1.3–1.4MB, roughly 900 times smaller than the random forest. Despite the drastic reduction, the deep learning model achieved an R² of 0.6399 (about 0.02 below the random forest), a mean absolute error (MAE) of 41.25 kelvin, and a mean absolute percentage error (MAPE) of 11.69%. The trade-off between accuracy and footprint is now central to their decision: should they publish these results or keep optimizing? Community feedback will likely weigh the value of lightweight, deployable models in cheminformatics against the opportunity for further accuracy gains.
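The relationship between parameter count and file size can be checked with simple arithmetic. The sketch below assumes a plain fully connected network stored as 32-bit floats. The post does not describe the architecture, so the layer widths are hypothetical and were chosen only because they land near 270,000 parameters.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases for a fully connected network with these layer widths."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical architecture: 26 input features -> three hidden layers -> 1 output.
# These widths are an assumption for illustration, not the researcher's design.
layers = [26, 256, 512, 256, 1]

params = mlp_param_count(layers)
size_mb = params * 4 / 1e6  # assumes 4 bytes per parameter (float32)

print(f"{params:,} parameters, about {size_mb:.2f} MB of raw weights")
```

Under these assumptions the raw weights come to a little over 1MB, which is in the same range as the reported 1.3–1.4MB. The remainder could plausibly be serialization overhead or additional stored state, though the post does not say.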
- Deep learning model has only 270,000 parameters (about 1.3MB on disk) vs. the random forest's 1.23GB file
- Achieves R² of 0.6399 and MAE of 41.25 K on melting point prediction
- Researcher asks whether to publish or continue improving accuracy
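For reference, the three reported metrics can be computed as in the sketch below. The `regression_metrics` helper and the four melting points are made up for illustration and are not taken from the dataset or the researcher's code.

```python
def regression_metrics(y_true, y_pred):
    """Return R², MAE, and MAPE (in percent) for paired true/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # R² compares squared error against a baseline that always predicts the mean.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mape = 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    return {"r2": 1 - ss_res / ss_tot, "mae": mae, "mape": mape}


# Made-up melting points in kelvin, purely for illustration.
y_true = [300.0, 350.0, 400.0, 450.0]
y_pred = [320.0, 340.0, 410.0, 420.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MAPE divides by the true value, so it depends on the temperature scale. In kelvin, melting points sit far from zero and the percentage is stable. The same errors expressed in Celsius would inflate sharply for compounds that melt near 0 °C.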
Why It Matters
A roughly 1.3MB model that gives up only about 0.02 in reported R² (0.64 vs. 0.66) relative to a 1.23GB random forest makes chemical property prediction far more practical to deploy.