MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning
A new quantization and expert-mixture method achieves state-of-the-art accuracy on vision-language tasks...
Parameter-efficient transfer learning (PETL) adapts large foundation models to new tasks with few trainable parameters, but it still incurs high memory overhead because gradients must be backpropagated through the backbone. Memory-efficient transfer learning (METL) avoids this overhead by training lightweight side networks that bypass backbone gradients, yet the tight memory budget limits side-network capacity and hurts performance. The MP-ISMoE framework tackles both issues: it first applies a Gaussian Noise Perturbed Iterative Quantization (GNP-IQ) scheme to quantize weights into lower-bit representations, reducing memory consumption. The memory saved is then reinvested into an Interactive Side Mixture-of-Experts (ISMoE) module, which scales up side-network capacity without exceeding the original memory budget.
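The announcement does not spell out GNP-IQ's inner loop, but the general idea of iterative, noise-perturbed quantization can be sketched roughly as below. The function name `gnp_iq_quantize`, the per-tensor scale, the `noise_std` value, and the iteration count are illustrative assumptions, not the authors' implementation.

```python
import torch

def gnp_iq_quantize(w: torch.Tensor, bits: int = 4, iters: int = 5, noise_std: float = 0.01):
    """Hypothetical sketch of Gaussian-noise-perturbed iterative quantization.

    Repeatedly quantizes `w` to `bits`-bit integers, perturbing the weights with
    small Gaussian noise each iteration and refining the per-tensor scale so the
    dequantized weights better reconstruct the originals.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax                      # initial per-tensor scale
    best_q, best_scale, best_err = None, scale, float("inf")
    for _ in range(iters):
        # Perturb before rounding to explore alternative rounding solutions.
        w_pert = w + noise_std * w.std() * torch.randn_like(w)
        q = torch.clamp(torch.round(w_pert / scale), -qmax - 1, qmax)
        err = (w - q * scale).pow(2).mean().item()
        if err < best_err:
            best_q, best_scale, best_err = q.to(torch.int8), scale, err
        # Least-squares refit of the scale against the original weights.
        denom = q.pow(2).sum()
        if denom > 0:
            scale = (w * q).sum() / denom
    return best_q, best_scale

# Example: quantize a frozen backbone weight matrix to 4 bits.
w = torch.randn(768, 768)
q, s = gnp_iq_quantize(w, bits=4)
w_hat = q.float() * s   # dequantized weights used in the frozen forward pass
```

Storing 4-bit codes plus a floating-point scale is what frees memory relative to 16-bit weights; the freed budget is what ISMoE then spends on extra expert capacity.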
Unlike traditional mixture-of-experts, ISMoE selects experts by interacting with salient features from the frozen backbone, which suppresses catastrophic forgetting and boosts accuracy. Extensive experiments on diverse vision-language and language-only tasks show that MP-ISMoE significantly improves accuracy over state-of-the-art METL approaches, while maintaining comparable parameter and memory efficiency. The paper has been accepted at AAAI 2026.
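To make the "interactive" routing concrete, here is a minimal sketch of a side-network MoE block whose router is conditioned on features from the frozen backbone. The class `InteractiveSideMoE`, the concatenation-based router, and the top-k gating are assumptions chosen for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class InteractiveSideMoE(nn.Module):
    """Side-network MoE block whose router sees frozen-backbone features (sketch)."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )
        # Router sees both the side-network state and the backbone feature.
        self.router = nn.Linear(2 * dim, num_experts)
        self.top_k = top_k

    def forward(self, side_x: torch.Tensor, backbone_feat: torch.Tensor):
        # Interaction: expert selection is guided by the pretrained (salient)
        # backbone representation, not by the side-network state alone.
        logits = self.router(torch.cat([side_x, backbone_feat], dim=-1))
        weights, idx = logits.softmax(dim=-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(side_x)
        for e, expert in enumerate(self.experts):
            # Gate for expert e per token; zero when e is not in the top-k set.
            gate = ((idx == e).float() * weights).sum(dim=-1, keepdim=True)
            out = out + gate * expert(side_x)
        return out

# Example: one batch of token states through the side MoE block.
block = InteractiveSideMoE(dim=64)
side_x = torch.randn(8, 64)           # side-network hidden states
backbone_feat = torch.randn(8, 64)    # features from the frozen backbone
y = block(side_x, backbone_feat.detach())   # backbone stays gradient-free
```

Detaching the backbone feature keeps the METL property that no gradients flow into the frozen model; only the side experts and router are trained.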
- GNP-IQ quantizes model weights to lower bits via iterative Gaussian-noise perturbation, reducing memory consumption during fine-tuning (see the budget sketch after this list).
- ISMoE scales up side-network capacity with multiple experts and selects among them using features from the frozen backbone, preventing knowledge loss.
- MP-ISMoE outperforms prior memory-efficient transfer learning methods on both vision-language and language-only benchmarks.
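As a rough illustration of the reinvestment idea referenced above, the following back-of-envelope calculation uses assumed sizes (a ~300M-parameter backbone, 10M-parameter experts, 16-bit vs. 4-bit storage); these numbers are illustrative, not from the paper.

```python
# Illustrative memory budget: quantization savings reinvested into experts.
backbone_params = 300e6                      # assumed frozen backbone size
fp16_bytes = backbone_params * 2             # 16-bit weight storage
int4_bytes = backbone_params * 0.5           # 4-bit weight storage after GNP-IQ
freed = fp16_bytes - int4_bytes              # memory released by quantization

side_expert_params = 10e6                    # assumed size of one side expert
expert_bytes = side_expert_params * 2        # fp16 expert weights
extra_experts = int(freed // expert_bytes)   # capacity bought back within budget

print(f"freed {freed / 1e6:.0f} MB -> room for ~{extra_experts} extra experts")
```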
Why It Matters
Enables fine-tuning large foundation models on consumer-grade hardware while improving accuracy over prior memory-efficient methods, democratizing advanced model adaptation.