Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations
A training-free inference framework boosts factual accuracy by 3.1% without increasing compute costs.
A research team from multiple institutions has introduced Counterfactual Routing (CoR), a novel, training-free inference framework designed to tackle a core weakness in Sparse Mixture-of-Experts (MoE) models: their tendency to hallucinate on long-tail or rare factual knowledge. The paper identifies that the standard Top-k routing mechanism, which selects a fixed number of 'expert' sub-networks for each input, creates a bias toward high-frequency patterns. This leaves specialized experts with critical, niche knowledge under-prioritized or 'dormant,' directly contributing to factual errors.
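For context, a standard Top-k router scores every expert for each token and keeps only the k highest-scoring ones, so frequently useful experts dominate. Below is a minimal sketch of Mixtral-style gating, where the softmax is renormalized over only the selected experts; the function name and values are illustrative, not the paper's code:

```python
import numpy as np

def top_k_route(router_logits: np.ndarray, k: int = 2):
    """Standard Top-k gating: keep the k highest-scoring experts per token.

    router_logits: (num_tokens, num_experts) scores from the router.
    Returns chosen expert indices and their renormalized gate weights.
    """
    # Indices of the k largest logits for each token.
    top_idx = np.argsort(router_logits, axis=-1)[:, -k:]
    top_logits = np.take_along_axis(router_logits, top_idx, axis=-1)
    # Softmax over only the selected experts (Mixtral-style renormalization).
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return top_idx, gates

# One token, eight experts: the two highest-scoring experts always win,
# regardless of whether a rarer expert holds the needed fact.
logits = np.array([[2.1, 0.3, 1.8, -0.5, 0.0, 0.9, -1.2, 0.4]])
idx, w = top_k_route(logits, k=2)
print(idx, w)  # experts 2 and 0 are selected; weights sum to 1
```

Because the selection is fixed at k and driven purely by these router scores, an expert that rarely wins the Top-k comparison never fires, which is exactly the "dormant expert" failure mode the paper targets.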
CoR addresses this with a layer-wise perturbation analysis and a new Counterfactual Expert Impact (CEI) metric. During inference, the system virtually 'ablates' (removes) experts to measure their causal importance, then dynamically reallocates computational resources from syntax-focused layers to knowledge-intensive ones. Crucially, it keeps the total number of activated experts, and hence the compute budget, unchanged, making it a zero-cost upgrade. On TruthfulQA, FACTOR, and TriviaQA, CoR improves average factual accuracy by 3.1%, establishing a better performance-efficiency trade-off than simply scaling the model size.
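The paper's exact CEI formula is not reproduced here, but the idea can be sketched: zero out one active expert's gate, renormalize the remaining gates, and measure how far the layer's output moves. Experts whose removal barely shifts the output are candidates to swap out for dormant ones. The sketch below is a hypothetical instantiation under those assumptions; `mixture`, `counterfactual_impact`, and all values are illustrative:

```python
import numpy as np

def mixture(expert_outputs: np.ndarray, gates: np.ndarray) -> np.ndarray:
    """Gate-weighted sum of expert outputs: (E, d) weighted by (E,) -> (d,)."""
    return gates @ expert_outputs

def counterfactual_impact(expert_outputs: np.ndarray, gates: np.ndarray) -> np.ndarray:
    """Hypothetical CEI-style score: the output shift caused by virtually
    ablating each active expert (gate zeroed, remaining gates renormalized).
    A small shift means the expert is causally unimportant for this token.
    Assumes at least two experts are active (k >= 2)."""
    base = mixture(expert_outputs, gates)
    impact = np.zeros_like(gates)
    for e in np.flatnonzero(gates):
        ablated = gates.copy()
        ablated[e] = 0.0
        ablated /= ablated.sum()  # renormalize so total gate mass is unchanged
        impact[e] = np.linalg.norm(base - mixture(expert_outputs, ablated))
    return impact

# Toy layer: 4 experts, hidden size 8, Top-3 routing already applied.
rng = np.random.default_rng(0)
outs = rng.normal(size=(4, 8))
gates = np.array([0.5, 0.3, 0.2, 0.0])
print(counterfactual_impact(outs, gates))  # per-expert causal impact scores
```

Under the paper's scheme, the per-token budget freed at layers where active experts score low impact (typically syntax-heavy layers) is spent activating additional experts at knowledge-intensive layers, so the total count of activated experts per token, and thus the inference cost, stays constant.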
- Identifies 'dormant expert' problem in MoE models where static Top-k routing misses rare factual knowledge.
- Proposes Counterfactual Routing (CoR), a training-free inference method using virtual ablation to measure expert impact.
- Boosts factual accuracy by 3.1% on key benchmarks without increasing the computational cost of inference.
Why It Matters
Enables more reliable, factual output from sparse MoE models such as Mixtral (and, reportedly, GPT-4) without expensive retraining or additional inference compute.