Research & Papers

MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

New framework boosts clinical prediction accuracy by using LLMs to generate features based on model weaknesses.

Deep Dive

A research team led by Zizheng Zhang has introduced MedFeat, a novel framework that leverages Large Language Models (LLMs) to automate and enhance feature engineering for clinical tabular data prediction. Published on arXiv, the work addresses a critical gap in healthcare AI: while classical models with manual feature engineering often outperform neural networks on tabular data, existing automated approaches typically run a broad search over predefined transformations without considering the downstream model's characteristics. MedFeat innovates in two ways: it incorporates model awareness, prioritizing features that are difficult for the specific target model to learn directly, and it uses SHAP (SHapley Additive exPlanations) values both to provide explainability and to guide the LLM's reasoning with domain knowledge.
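The paper summary does not include code, but a minimal sketch of what the model-aware, SHAP-guided step could look like is below, assuming a gradient-boosted target model and the shap library. The ranking criterion here (surfacing low mean |SHAP| columns as candidates for the LLM to combine) is an assumption for illustration, not the paper's exact method:

```python
# Sketch: identify raw columns the fitted model barely uses, so an LLM
# can be prompted to combine or transform them with clinical knowledge.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical demo data standing in for a clinical table.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
names = [f"feat_{i}" for i in range(X.shape[1])]

model = GradientBoostingClassifier(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X)  # (n_samples, n_features) log-odds attributions

# One plausible "model weakness" signal: lowest mean absolute attribution.
mean_abs = np.abs(sv).mean(axis=0)
weakest = [names[i] for i in np.argsort(mean_abs)[:3]]
print("Low-attribution columns to surface to the LLM:", weakest)
```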

The framework operates in a feedback-driven loop, tracking both successful and failed feature proposals to iteratively improve discovery. In extensive testing across diverse clinical prediction tasks, MedFeat delivered stable performance improvements over a range of baselines. Crucially, the generated features were not only predictive but also clinically interpretable, and they generalized well under distribution shift, remaining robust across different years and when transferring from intensive care unit (ICU) cohorts to general hospitalized patients. This combination of model-aware design, explainability, and demonstrated robustness makes MedFeat a significant step toward deployable AI tools in sensitive healthcare environments, where trust and reliability are paramount. The code is slated for release pending dataset agreements.
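To make the loop concrete, here is a minimal sketch of one way such a feedback-driven loop could be wired up; it is not the authors' implementation. The `propose_feature` function is a hypothetical stand-in for the LLM call, and accepting a proposal only when cross-validated AUC improves is an assumed acceptance rule:

```python
# Sketch of a feedback-driven feature loop: propose, evaluate, keep or
# reject, and record both outcomes to inform the next proposal.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def propose_feature(history):
    """Hypothetical stand-in for the LLM call; a real system would send
    the task description, SHAP summary, and accept/reject history to a
    model API and parse the returned expression."""
    return "feat_0 / (feat_1 + 1e-6)"  # e.g. a clinically motivated ratio

def cv_auc(frame, labels):
    clf = GradientBoostingClassifier(random_state=0)
    return cross_val_score(clf, frame, labels, cv=5, scoring="roc_auc").mean()

def refine(frame, labels, rounds=3):
    history, best = [], cv_auc(frame, labels)
    for step in range(rounds):
        expr = propose_feature(history)
        candidate = frame.assign(**{f"llm_feat_{step}": frame.eval(expr)})
        score = cv_auc(candidate, labels)
        verdict = "accepted" if score > best else "rejected"
        if verdict == "accepted":  # keep the feature only if CV AUC improves
            frame, best = candidate, score
        history.append((verdict, expr, score))  # failures also shape the next prompt
    return frame, history

# Hypothetical demo table standing in for a clinical dataset.
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
df = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(4)])
engineered, log = refine(df, y)
```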

Key Points
  • Uses LLM reasoning with clinical domain knowledge to generate new predictive features, moving beyond simple transformation searches.
  • Incorporates model awareness and SHAP values for explainability, creating features that target specific model weaknesses.
  • Achieved stable performance gains and demonstrated robustness across clinical settings and time, generalizing from ICU to hospital-wide data (a sketch of this shift evaluation follows the list).
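
The distribution-shift claim can be checked with a train-on-one-cohort, test-on-another protocol. The sketch below is an assumed setup, not the paper's benchmark; `icu_df`, `ward_df`, and the column lists are placeholders:

```python
# Sketch: fit on one cohort or period, score on another, and compare
# AUC with and without the LLM-engineered features.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def shift_auc(train_X, train_y, test_X, test_y, cols):
    """Fit on the source cohort, evaluate on the shifted target cohort."""
    clf = GradientBoostingClassifier(random_state=0).fit(train_X[cols], train_y)
    return roc_auc_score(test_y, clf.predict_proba(test_X[cols])[:, 1])

# Hypothetical cohorts sharing a schema (placeholders, not paper objects):
# gain = (shift_auc(icu_df, icu_y, ward_df, ward_y, raw_cols + llm_cols)
#         - shift_auc(icu_df, icu_y, ward_df, ward_y, raw_cols))
```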

Why It Matters

Enables more accurate, interpretable, and robust AI models for critical healthcare predictions, accelerating real-world clinical deployment.