New ADS metric selects best continual learning models with few samples
A lightweight proxy that predicts logit shift across 175+ architectures...
Selecting the right pre-trained model for continual learning (CL) is critical to balancing plasticity and stability, but existing methods rely on measuring logit shift, a proxy that is computationally expensive to obtain. A new paper on arXiv introduces Architecture-driven Shift (ADS), a lightweight metric that estimates a model's logit shift tendency using only a few data samples. ADS decouples the shift into architecture-dependent and data-dependent components, derived from three mechanistic insights: how the spectral norm of weight-matrix gradients scales with layer width, the optimization path length on new tasks, and asymptotic task conflict in wide networks. This lets ADS approximate logit shift without the full computational cost, enabling rapid model selection across heterogeneous architectures with varying widths and depths.
Experiments across more than 175 diverse architectures show a strong monotonic correlation between ADS and actual logit shift, with the lowest reported Spearman's r_s at 0.731. In practice, ADS also serves as a proxy for expected calibration error (ECE), a widely used reliability metric for CL models, and was validated on three datasets across six CL scenarios. The findings suggest that ADS can drastically reduce the cost of model selection for continual learning, making it feasible to evaluate large pools of pre-trained networks without running expensive training loops.
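For readers unfamiliar with the statistic quoted above, here is a minimal sketch of Spearman's rank correlation, which measures how well one quantity preserves the ordering of another. It is not the paper's code: the function names and the five proxy and shift values are hypothetical, chosen only to illustrate the calculation.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(xs, ys):
    """Spearman's r_s: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical values for five candidate models (not from the paper).
proxy_scores = [0.12, 0.35, 0.20, 0.80, 0.55]    # cheap proxy, e.g. an ADS-like score
measured_shift = [1.1, 3.2, 2.0, 7.5, 3.0]       # expensive ground-truth measurement

print(spearman(proxy_scores, measured_shift))
```

A value near 1 means that ranking models by the cheap proxy gives almost the same order as ranking them by the expensive measurement, which is the property that matters for model selection.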
- ADS decouples logit shift into architecture and data dependencies, requiring only a few data samples for estimation.
- Tested on 175+ diverse architectures, ADS achieves a Spearman correlation of r_s >= 0.731 with actual logit shift.
- Serves as a lightweight proxy for expected calibration error, validated on three datasets across six continual learning scenarios.
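Expected calibration error, the reliability metric mentioned in the takeaways, can be sketched with the common equal-width binning estimator. The summary does not say which ECE variant or bin count the paper uses, so the function name, the `n_bins` default, and the sample predictions below are assumptions for illustration only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece

# Hypothetical predictions: top-class confidence and whether it was right.
confidences = [0.95, 0.92, 0.85, 0.65]
correct = [1, 1, 0, 1]

print(expected_calibration_error(confidences, correct))
```

Lower values indicate that a model's stated confidence tracks its actual accuracy more closely.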
Why It Matters
ADS enables efficient model selection for continual learning without running expensive training loops, saving time and compute.