New ADS metric selects best continual learning models with few samples

arXiv cs.LG May 28, 2026

Deep Dive

Selecting the right pre-trained model for continual learning (CL) is critical to balancing plasticity and stability, but existing methods rely on measuring logit shift, a computationally expensive proxy. A new paper posted to arXiv introduces Architecture-driven Shift (ADS), a lightweight metric that estimates a model's tendency toward logit shift using only a few data samples. ADS decouples the shift into architecture-dependent and data-dependent components, derived from three mechanistic insights: spectral norm scaling of weight-matrix gradients with layer width, optimization path length on new tasks, and asymptotic task conflict in wide networks. This lets ADS approximate logit shift without its full computational cost, enabling rapid model selection across heterogeneous architectures with varying widths and depths.
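To make the underlying quantity concrete: logit shift describes how much a model's outputs on earlier-task inputs move once it is trained on a new task. The article does not give the paper's exact definition, so the sketch below assumes one plausible version: the mean L2 norm of the change in logits on a handful of old-task samples after a few gradient steps on new-task data, using a toy linear softmax classifier in NumPy. The function names (`train_steps`, `logit_shift`) and all data are hypothetical. It is not ADS; it illustrates the direct measurement that ADS is meant to avoid, since every candidate model must actually be trained on the new task.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_steps(W, X, y, lr=0.1, steps=20):
    # Plain gradient descent on softmax cross-entropy for a linear classifier W (d x k)
    W = W.copy()
    n = X.shape[0]
    onehot = np.eye(W.shape[1])[y]
    for _ in range(steps):
        probs = softmax(X @ W)
        grad = X.T @ (probs - onehot) / n
        W -= lr * grad
    return W

def logit_shift(W_before, W_after, X_old):
    # Hypothetical definition: mean L2 norm of the logit change on old-task samples
    delta = X_old @ W_after - X_old @ W_before
    return float(np.linalg.norm(delta, axis=1).mean())

rng = np.random.default_rng(0)
d, k = 16, 4
W0 = rng.normal(scale=0.1, size=(d, k))
X_old = rng.normal(size=(8, d))           # a few old-task samples
X_new = rng.normal(size=(32, d)) + 1.0    # new-task inputs from a shifted distribution
y_new = rng.integers(0, k, size=32)

W1 = train_steps(W0, X_new, y_new)
print(f"logit shift: {logit_shift(W0, W1, X_old):.3f}")
```

Repeating this training-and-measuring loop for every candidate architecture is what makes direct logit-shift measurement expensive at the scale of 175+ models.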

Extensive experiments across more than 175 diverse architectures show a strong monotonic correlation between ADS and actual logit shift (lowest Spearman's r_s = 0.731). In practice, ADS also serves as a proxy for expected calibration error (ECE), a widely used reliability metric for CL models, and was validated on three datasets across six CL scenarios. The findings suggest that ADS can drastically reduce the cost of model selection for continual learning, making it feasible to evaluate large pools of pre-trained networks without running expensive training loops.
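The evaluation described above asks whether ADS orders models the same way as their measured logit shift, which is exactly what Spearman's rank correlation captures. A minimal NumPy sketch, assuming no tied values (with ties, average ranks are needed, as `scipy.stats.spearmanr` does). The scores are made-up placeholders, not results from the paper, and the selection rule at the end is an illustrative assumption rather than the paper's criterion.

```python
import numpy as np

def ranks(x):
    # Rank positions 1..n; assumes all values are distinct
    x = np.asarray(x)
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman_rs(a, b):
    # Spearman's r_s is the Pearson correlation of the two rank vectors
    ra, rb = ranks(a), ranks(b)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# Placeholder values for five hypothetical candidate models
proxy_scores = [0.12, 0.40, 0.33, 0.90, 0.05]    # cheap proxy (an ADS-like score)
measured_shift = [0.20, 0.55, 0.60, 1.10, 0.08]  # expensive ground truth

print("r_s =", round(spearman_rs(proxy_scores, measured_shift), 3))

# Illustrative selection rule only (assumption): prefer the lowest predicted shift
print("selected model index:", int(np.argmin(proxy_scores)))
```

Because Spearman's r_s depends only on ranks, a proxy can be on a completely different scale from the true logit shift and still score highly, which is what matters when the goal is to rank candidates rather than predict exact values.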

Key Points

ADS decouples logit shift into architecture and data dependencies, requiring only a few data samples for estimation.
Tested on 175+ diverse architectures, ADS achieves a Spearman's r_s of at least 0.731 with actual logit shift.
Serves as a lightweight proxy for expected calibration error, validated on three datasets across six continual learning scenarios.
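For reference, expected calibration error groups predictions into confidence bins and averages the gap between accuracy and mean confidence in each bin, weighted by the fraction of samples in that bin. The sketch below uses the common equal-width binning with 10 bins (an assumption; the article does not state the paper's binning choice) and computes only the metric ADS is meant to stand in for, not ADS itself. The toy probabilities are illustrative.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    # probs: (n, k) predicted class probabilities; labels: (n,) true class indices
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; max confidence is always > 0 for a probability vector
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in the bin
    return float(ece)

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]])
labels = np.array([0, 1, 0, 1])
print(f"ECE: {expected_calibration_error(probs, labels):.3f}")
```

Computing ECE this way requires predictions from a model that has already been trained through the CL scenario, which is the cost a cheap proxy such as ADS aims to sidestep during model selection.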

Why It Matters

Enables model selection for continual learning without measuring logit shift directly, cutting the compute and time needed to screen large pools of pre-trained models.
