Research & Papers

Linear Models, Variable Selection, Artificial Intelligence

Pretrained ANN handles up to 100 predictors and beats traditional methods...

Deep Dive

Variable selection is a long-standing challenge in linear regression, and classic methods such as forward and backward selection, AIC, BIC, and LASSO each have limitations. Researchers from Virginia Commonwealth University and Qatar University now propose a different approach: training an artificial neural network (ANN) to assess the significance of each variable directly from ordinary least squares (OLS) estimates. In extensive simulations across varying sample sizes and variances, their ANN consistently outperformed the traditional techniques at selecting the correct set of predictors.
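To make "OLS estimates" concrete, here is a minimal sketch of the quantities a least-squares fit produces for each predictor: the coefficient, its standard error, and the resulting t-statistic. The summary does not say exactly which OLS-derived features the authors feed to their ANN, so the function name `ols_summary`, the simulated data, and the choice of outputs are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def ols_summary(X, y):
    """Fit OLS with an intercept; return coefficients, standard errors, t-statistics."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # least-squares coefficients
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - p - 1)           # unbiased residual variance
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)        # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Simulated data (hypothetical): only predictors 0 and 2 truly matter.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = 1.0 + X @ true_beta + rng.normal(scale=1.0, size=n)

beta_hat, se, t_stats = ols_summary(X, y)
for j in range(p):
    # index j + 1 skips the intercept
    print(f"x{j}: beta={beta_hat[j + 1]: .3f}  se={se[j + 1]:.3f}  t={t_stats[j + 1]: .2f}")
```

Any selection rule, whether a classical threshold or a learned model, then has to map per-variable summaries like these to a keep-or-drop decision.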

The team provides a pretrained ANN on GitHub that can handle up to 100 predictor variables, along with the original WHO Life Expectancy dataset used for illustration. The work bridges classical statistics and modern AI and offers practitioners a ready-to-use tool for model building. By automating the choice of predictors, a step that trades bias against variance, it aims to streamline workflows in fields from economics to bioinformatics.

Key Points
  • ANN trained on OLS estimates beats forward selection, backward selection, AIC, BIC, and LASSO in simulation studies
  • Pretrained model handles up to 100 predictor variables and is available on GitHub
  • Demonstrated on WHO Life Expectancy dataset, bridging classical statistics with modern AI
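For context on the baselines named in the key points, the following is a rough sketch of one of them, forward selection scored by BIC. It is a generic textbook version written with NumPy only, not the authors' benchmark code; the helper names `bic` and `forward_bic` and the simulated data are hypothetical.

```python
import numpy as np

def bic(X, y, cols):
    """BIC of an OLS fit using an intercept plus the columns listed in `cols`."""
    n = len(y)
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = float(np.sum((y - Xd @ beta) ** 2))
    k = Xd.shape[1]                      # number of fitted coefficients
    return n * np.log(rss / n) + k * np.log(n)

def forward_bic(X, y):
    """Greedily add the predictor that lowers BIC most; stop when none does."""
    selected, remaining = [], list(range(X.shape[1]))
    best = bic(X, y, selected)           # intercept-only model
    while remaining:
        score, j = min((bic(X, y, selected + [c]), c) for c in remaining)
        if score >= best:
            break                        # no candidate improves BIC
        best = score
        selected.append(j)
        remaining.remove(j)
    return sorted(selected)

# Simulated data (hypothetical): only predictors 0 and 3 are truly active.
rng = np.random.default_rng(1)
n, p = 200, 8
X = rng.normal(size=(n, p))
y = 0.5 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)

selected = forward_bic(X, y)
print("selected predictors:", selected)
```

Greedy procedures like this can admit a spurious predictor or miss a weak one, which is the kind of error the simulation studies compare across methods.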

Why It Matters

Automates variable selection with higher accuracy than traditional methods in the authors' simulations, saving data scientists time and reducing the risk of model misspecification.