Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs
A new paper quantifies the trade-off between communication cost and generalization in mixture-of-experts (MoE) architectures.
Researchers Ali Khalesi and Mohammad Reza Deylam Salehi published "Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs" (arXiv:2602.15091). They model MoE gating as a stochastic channel with a finite information rate and derive a rate-distortion characterization D(R_g), i.e., the minimum achievable distortion when the gating signal is conveyed at rate R_g. The analysis yields capacity-aware limits for communication-constrained MoE systems, and simulations confirm the predicted trade-offs among gating rate, expressivity, and generalization in multi-expert models.
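A concrete way to see this trade-off is to cap the number of bits used to represent a router's gate weights and watch the output distortion shrink as the bit budget grows. The sketch below is a minimal toy illustration under assumptions of ours, not the paper's stochastic-channel construction: it uses a deterministic uniform quantizer as a stand-in for a finite-rate channel, and the experts, router, and per-expert bit budget `rate_bits` are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def quantize(gates, rate_bits):
    """Uniformly quantize gate weights in [0, 1] to 2**rate_bits levels,
    then renormalize so each token's gates still sum to one."""
    levels = 2 ** rate_bits
    q = np.round(gates * (levels - 1)) / (levels - 1)
    s = q.sum(axis=-1, keepdims=True)
    # At very low rates every gate can round to zero; fall back to the
    # top-1 expert so the token is still routed somewhere.
    dead = (s == 0).squeeze(-1)
    q[dead] = np.eye(gates.shape[-1])[gates[dead].argmax(-1)]
    return q / q.sum(axis=-1, keepdims=True)

num_experts, dim, num_tokens = 8, 16, 2048
experts = rng.normal(size=(num_experts, dim, dim))  # toy linear experts
tokens = rng.normal(size=(num_tokens, dim))
router = rng.normal(size=(dim, num_experts))

gates = softmax(tokens @ router)  # full-precision gating
full = np.einsum('te,edk,td->tk', gates, experts, tokens)

# Sweep the gating rate and measure distortion of the mixed expert output.
for rate_bits in [1, 2, 3, 4, 6, 8]:
    q_gates = quantize(gates, rate_bits)
    out = np.einsum('te,edk,td->tk', q_gates, experts, tokens)
    distortion = np.mean((out - full) ** 2)
    print(f"R_g = {rate_bits} bits/expert: output MSE = {distortion:.6f}")
```

The distortion decreases monotonically as `rate_bits` grows, giving an empirical curve in the spirit of a D(R_g) trade-off, though the paper's characterization is information-theoretic rather than tied to any particular quantizer.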
Why It Matters
Provides a theoretical framework for optimizing large MoE models, such as those reportedly underlying GPT-4, balancing generalization performance against communication and computational cost.