Multilevel Determinants of Overweight and Obesity Among U.S. Children Aged 10-17: Comparative Evaluation of Statistical and Machine Learning Approaches Using the 2021 National Survey of Children's Health
Analysis of 18,792 children shows logistic regression performs comparably to XGBoost and TabNet for obesity prediction.
A study by Joyanta Jyoti Mondal compares statistical and machine learning approaches for predicting overweight and obesity in childhood, using data on 18,792 U.S. children aged 10-17 from the 2021 National Survey of Children's Health. The research tested seven models: traditional logistic regression; the tree-based ensemble methods random forest, gradient boosting, XGBoost, and LightGBM; and two deep learning approaches, a multilayer perceptron (MLP) and TabNet. Predictors spanned behavioral, household, and neighborhood domains, including diet, physical activity, sleep patterns, parental stress, socioeconomic conditions, and adverse childhood experiences.
Discrimination, measured by AUC, ranged from 0.66 to 0.79 across models, with logistic regression, gradient boosting, and MLP showing the most stable balance of discrimination and calibration. Increased model complexity yielded only modest improvements in recall and F1 scores, and no single model was uniformly superior across all metrics. The research also identified performance disparities across race and poverty subgroups that persisted regardless of algorithm choice. The findings suggest that, for population-level obesity prediction, public health efforts should focus on improving data quality and equity-focused surveillance rather than pursuing greater algorithmic complexity.
- Study analyzed 18,792 children using 7 models including logistic regression, XGBoost, and TabNet
- AUC ranged from 0.66 to 0.79 across models, with limited gains from more complex algorithms
- Performance disparities across race and poverty groups persisted regardless of model choice
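To make the metrics above concrete, the following is a minimal sketch of how discrimination (AUC) and calibration (here approximated by the Brier score) can be reported separately for each subgroup. It is not the study's code and does not use NSCH data: the data are synthetic, the group labels "A" and "B" are placeholders, and the helper names (`auc`, `brier`, `metrics_by_group`) are hypothetical. The Brier score is one common calibration-related summary; the study may have used a different calibration measure.

```python
import random

def auc(y_true, y_score):
    """Rank-based AUC: chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Mean squared error between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def metrics_by_group(y_true, y_prob, groups):
    """Compute n, AUC, and Brier score separately within each subgroup."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        yt = [y_true[i] for i in idx]
        yp = [y_prob[i] for i in idx]
        out[g] = {"n": len(idx), "auc": auc(yt, yp), "brier": brier(yt, yp)}
    return out

# Synthetic illustration (assumption): predictions are deliberately noisier for
# group "B", mimicking a subgroup where any model has less informative inputs.
rng = random.Random(0)
y_true, y_prob, groups = [], [], []
for _ in range(400):
    g = rng.choice(["A", "B"])
    y = 1 if rng.random() < 0.3 else 0
    noise = 0.1 if g == "A" else 0.4
    p = min(1.0, max(0.0, 0.25 + 0.4 * y + rng.gauss(0, noise)))
    y_true.append(y)
    y_prob.append(p)
    groups.append(g)

results = metrics_by_group(y_true, y_prob, groups)
for g, m in results.items():
    print(f"group {g}: n={m['n']}, AUC={m['auc']:.2f}, Brier={m['brier']:.3f}")
```

Running the same per-group report on each candidate model's predictions would show whether a subgroup gap shrinks when the algorithm changes, which is the comparison the study describes.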
Why It Matters
Challenges the assumption that complex AI always outperforms traditional statistics for public health prediction tasks.