Research & Papers

LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning

New method achieves expert specialization without separate adapters, reducing parameters while maintaining performance.

Deep Dive

Md Kowsher and six co-authors have introduced LiME (Lightweight Mixture of Experts), a novel approach that reimagines how mixture-of-experts architectures handle multimodal multi-task learning. Traditional MoE-PEFT methods require a separate parameter-efficient fine-tuning adapter for each expert, so trainable parameters scale linearly with expert count and architectural flexibility is limited. LiME breaks this pattern by using a single shared PEFT module and modulating its output with lightweight expert vectors, dramatically reducing parameter requirements while remaining compatible with any PEFT method.
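To make the shared-adapter idea concrete, here is a minimal sketch using LoRA as the PEFT method: one low-rank adapter is shared across all experts, and each expert contributes only a small modulation vector applied to the adapter output. The class, variable names, and the elementwise modulation shown are illustrative assumptions, not the authors' released code.

```python
# A minimal sketch of the shared-adapter idea, using LoRA as the PEFT method.
# One low-rank adapter is shared by all experts; each expert owns only a small
# modulation vector applied to the adapter output. All names are illustrative,
# not the authors' implementation.
import torch
import torch.nn as nn


class SharedLoRAWithExpertModulation(nn.Module):
    def __init__(self, d_model: int, rank: int, num_experts: int):
        super().__init__()
        # Single shared low-rank adapter: its size does not grow with expert count.
        self.lora_down = nn.Linear(d_model, rank, bias=False)
        self.lora_up = nn.Linear(rank, d_model, bias=False)
        # Lightweight per-expert vectors: only num_experts * d_model extra parameters.
        self.expert_vectors = nn.Parameter(torch.ones(num_experts, d_model))

    def forward(self, x: torch.Tensor, expert_weights: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); expert_weights: (batch, seq, num_experts), rows sum to 1.
        shared_update = self.lora_up(self.lora_down(x))       # one adapter pass for all experts
        # Mix the per-expert vectors with the routing weights, then modulate the
        # shared update elementwise to mimic expert-specific adapters.
        mixed_vector = expert_weights @ self.expert_vectors   # (batch, seq, d_model)
        return shared_update * mixed_vector
```

In this form, adding an expert costs only a d_model-sized vector rather than a full adapter copy, which is where the parameter savings the paper reports come from.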

LiME introduces several technical innovations, including zero-parameter routing that leverages existing frozen and adapted representations to eliminate the learned router parameters typically required at each layer. The method also incorporates n-gram windowed routing and adaptive expert selection (Auto Top-K) based on routing confidence. The researchers provide theoretical proofs showing that more experts preserve more task-relevant information and that modulation approximates full expert-specific PEFT with bounded error.
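The sketch below shows one plausible reading of router-free, confidence-adaptive expert selection: scores are derived from tensors the layer already has (here, similarity to the expert vectors) and the number of active experts is set by a cumulative-probability cutoff. Both the scoring rule and the cutoff are assumptions for illustration, not the paper's exact formulation.

```python
# A hedged sketch of parameter-free routing with confidence-based Auto Top-K.
# The scoring rule (cosine similarity between the hidden state and the existing
# expert vectors) and the cumulative-probability cutoff are illustrative
# assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F


def route_without_learned_router(hidden: torch.Tensor,
                                 expert_vectors: torch.Tensor,
                                 confidence: float = 0.9) -> torch.Tensor:
    """hidden: (batch, seq, d_model); expert_vectors: (num_experts, d_model)."""
    # Score experts from tensors the layer already has, so no router weights are learned.
    scores = F.cosine_similarity(hidden.unsqueeze(-2), expert_vectors, dim=-1)
    probs = scores.softmax(dim=-1)                            # (batch, seq, num_experts)

    # Auto Top-K: keep the fewest experts whose probability mass reaches the
    # confidence threshold, so confident tokens activate fewer experts.
    sorted_probs, order = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    keep_sorted = (cumulative - sorted_probs) < confidence    # always keeps the top expert
    keep = torch.zeros_like(probs, dtype=torch.bool).scatter(-1, order, keep_sorted)

    weights = torch.where(keep, probs, torch.zeros_like(probs))
    return weights / weights.sum(dim=-1, keepdim=True)        # renormalize the kept experts
```

Because the scores reuse existing representations, no router parameters are trained per layer, and confident tokens activate fewer experts, which is consistent with the training-speed gains the authors report.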

Experiments on the challenging MMT-47 benchmark, which includes 47 tasks spanning text, image, and video modalities, demonstrate LiME's effectiveness. The approach achieves competitive or superior performance while using up to 4x fewer trainable parameters and enabling up to 29% faster training compared to traditional MoE-PEFT baselines. This represents a significant efficiency breakthrough for adapting large multimodal models to diverse downstream tasks.

Key Points
  • Uses single shared PEFT module with lightweight expert modulation instead of separate adapters per expert
  • Achieves up to a 4x reduction in trainable parameters while maintaining performance on the 47-task MMT-47 benchmark
  • Introduces zero-parameter routing and adaptive expert selection, enabling up to 29% faster training

Why It Matters

Enables more efficient adaptation of large multimodal models to diverse real-world applications with significantly reduced computational costs.