Imbalanced Classification under Capacity Constraints
Fixes rare disease detection when you can only test a few samples.
In real-world scenarios like rare disease screening or fraud detection, identifying a potential positive instance is only the first step—the costly follow-up action (e.g., medical imaging or transaction review) is limited by operational capacity. Traditional imbalanced learning methods like SMOTE address the data imbalance but do not account for the hard constraint on how many cases can be further investigated.
Fraiman and Fraiman tackle this gap by introducing a classification framework that explicitly controls the rate of positive predictions to stay within a user-defined budget. Their method maximizes true positive detection while ensuring no more than K instances out of N are flagged as positive. The approach works with any standard classifier, extends naturally to online settings where decisions must be made in real time, and outperforms resampling baselines that ignore selection rate constraints.
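In batch form, the core idea can be sketched as ranking any classifier's positive-class scores and flagging only the top K. This is an illustrative sketch, not the authors' exact algorithm; the function name and interface are assumptions.

```python
import numpy as np

def capacity_constrained_predict(scores, k):
    """Flag at most k instances as positive: the k highest-scoring ones.

    scores: positive-class scores from any base classifier.
    k: the follow-up capacity (e.g., 5% of N).
    Illustrative sketch only, not the paper's exact procedure.
    """
    scores = np.asarray(scores, dtype=float)
    k = min(k, len(scores))
    flagged = np.zeros(len(scores), dtype=bool)
    if k > 0:
        # Indices of the k largest scores (argpartition avoids a full sort).
        top_k = np.argpartition(-scores, k - 1)[:k]
        flagged[top_k] = True
    return flagged

scores = [0.9, 0.1, 0.8, 0.4, 0.05]
print(capacity_constrained_predict(scores, 2).tolist())
# [True, False, True, False, False]
```

Because the rule only consumes scores, the base classifier can be anything from logistic regression to a gradient-boosted ensemble, which matches the paper's classifier-agnostic framing.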
- Proposes a capacity-constrained classifier that limits positive predictions to a user-defined proportion (e.g., 5% of all cases).
- Outperforms SMOTE and other resampling techniques by directly controlling selection rate, not just balancing data.
- Supports online, sequential decision-making, which is critical for real-time applications like fraud monitoring.
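In the online setting, each instance must be accepted or rejected on arrival, before future scores are known. A minimal way to respect the capacity constraint is to combine a score threshold with a running budget. This is a simplified sketch under assumed semantics, not the paper's sequential rule; the class name and parameters are hypothetical.

```python
class OnlineBudgetedClassifier:
    """Illustrative sketch (not the paper's exact rule): flag an arriving
    instance if its score clears `threshold` and the remaining budget
    allows it, so total positives never exceed `budget`."""

    def __init__(self, threshold, budget):
        self.threshold = threshold  # minimum score to consider flagging
        self.budget = budget        # max positive predictions allowed
        self.flagged = 0            # positives issued so far

    def decide(self, score):
        # Flag only while budget remains; decisions are irrevocable.
        if self.flagged < self.budget and score >= self.threshold:
            self.flagged += 1
            return True
        return False

clf = OnlineBudgetedClassifier(threshold=0.7, budget=2)
decisions = [clf.decide(s) for s in [0.9, 0.3, 0.8, 0.95, 0.1]]
print(decisions)  # [True, False, True, False, False]
```

Note the trade-off this exposes: once the budget is spent, even a very high-scoring instance (0.95 above) is rejected, which is exactly why a principled sequential rule matters in capacity-constrained screening.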
Why It Matters
Brings operational reality into ML: your model must respect budget limits, not just maximize accuracy.