Graph-Based Belief Propagation Cuts Token Use 97% for Multi-LLM Aggregation
Combine expert LLMs without extra inference calls during refinement, cutting inference time from minutes to milliseconds.
A new arXiv paper introduces a different approach to combining specialized Large Language Models (LLMs). Current ensemble methods such as iterative re-prompting or cross-model refinement are slow and computationally expensive because they rely on repeated LLM calls. They can also lose accuracy when weaker models pull stronger ones toward wrong answers, a failure mode referred to as anchor corruption. The authors instead represent each LLM as a variable node in a bipartite factor graph, connected to check nodes that assess consistency across diverse epistemic criteria. A message-passing protocol inspired by error-recovery systems resolves disagreements, while an asymmetric damping mechanism protects high-reliability anchor nodes from being overridden by majority noise. Crucially, the framework operates only on the models' output distributions, so refinement requires zero additional LLM calls.
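The idea of damped message passing over output distributions can be illustrated with a small Python sketch. This is not the paper's algorithm: the update rule, the function names (`check_message`, `propagate`, `aggregate`), the reliability scores, and the step sizes are all assumptions chosen for illustration, and each check node is reduced to a simple reliability-weighted average rather than a real epistemic criterion.

```python
def normalize(p):
    """Scale a list of non-negative scores so it sums to 1."""
    total = sum(p)
    return [x / total for x in p]


def check_message(beliefs, reliability, members, target):
    """Message from one check node to `target`: the reliability-weighted
    average of the other members' beliefs (a stand-in consistency criterion)."""
    others = [m for m in members if m != target]
    n_options = len(beliefs[target])
    return normalize([
        sum(reliability[m] * beliefs[m][c] for m in others)
        for c in range(n_options)
    ])


def propagate(initial, reliability, checks, anchor_threshold=0.8,
              anchor_step=0.05, default_step=0.5, iterations=10):
    """Refine per-model answer distributions without calling any model again.

    Asymmetric damping (assumed form): anchors, i.e. models whose reliability
    is at least `anchor_threshold`, move toward incoming messages with the
    small `anchor_step`; all other models use the larger `default_step`.
    """
    beliefs = [normalize(p) for p in initial]
    for _ in range(iterations):
        updated = []
        for i, belief in enumerate(beliefs):
            incoming = [
                check_message(beliefs, reliability, members, i)
                for members in checks
                if i in members and len(members) > 1
            ]
            if not incoming:
                updated.append(belief)  # node is not attached to any check
                continue
            n_options = len(belief)
            mean_msg = [
                sum(msg[c] for msg in incoming) / len(incoming)
                for c in range(n_options)
            ]
            step = anchor_step if reliability[i] >= anchor_threshold else default_step
            updated.append(normalize([
                (1 - step) * belief[c] + step * mean_msg[c]
                for c in range(n_options)
            ]))
        beliefs = updated  # synchronous update: all nodes change together
    return beliefs


def aggregate(beliefs, reliability):
    """Index of the option with the highest reliability-weighted belief."""
    n_options = len(beliefs[0])
    scores = [
        sum(r * b[c] for r, b in zip(reliability, beliefs))
        for c in range(n_options)
    ]
    return max(range(n_options), key=scores.__getitem__)


# Toy example: model 0 is a reliable anchor that favors option 2;
# models 1 and 2 are weaker and both favor option 0.
initial = [
    [0.05, 0.05, 0.85, 0.05],
    [0.70, 0.10, 0.10, 0.10],
    [0.70, 0.10, 0.10, 0.10],
]
reliability = [0.9, 0.4, 0.4]  # hypothetical scores, e.g. held-out accuracy
checks = [[0, 1, 2]]           # one check node touching all three models

protected = propagate(initial, reliability, checks)
symmetric = propagate(initial, reliability, checks, anchor_step=0.5)
print(aggregate(protected, reliability))  # 2: the anchor's answer survives
print(aggregate(symmetric, reliability))  # 0: the majority overrides the anchor
```

In this toy setup the only difference between the two runs is the anchor's step size. With a small step the weaker models drift toward the anchor's answer; with the same step for every node the anchor is pulled to the majority answer, which is the anchor-corruption effect described above. Everything runs on the stored distributions, so no model is queried during refinement.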
Tested on four benchmarks (MMLU, MMLU-Pro, GPQA, and MedMCQA), the method achieves a 97% reduction in token usage and up to a 6x decrease in API calls compared to iterative re-prompting methods. Inference time drops from several minutes to milliseconds, while the method consistently outperforms leading multi-agent baselines. The results suggest that graph-based belief propagation offers a robust, fast, and scalable alternative to current multi-LLM systems. The full pipeline and code will be made public, which could enable real-time deployment of diverse expert models without costly re-prompting overhead.
- 97% reduction in token usage and up to 6x decrease in API calls compared to iterative re-prompting methods.
- Inference time reduced from several minutes to milliseconds on MMLU, MMLU-Pro, GPQA, and MedMCQA benchmarks.
- Asymmetric damping mechanism prevents anchor corruption, protecting high-accuracy models from weaker ensemble members.
Why It Matters
Could make multi-LLM ensembles practical for real-time applications by sharply cutting cost and latency.