Research & Papers

Residual Coupling lets frozen LLMs collaborate without retraining

Adding lightweight linear bridges between frozen models cuts perplexity on a medical benchmark by about 80%.

Deep Dive

Residual Coupling (RC) introduces a horizontal scaling paradigm for large language models. Instead of modifying base weights, RC connects frozen models in parallel through small, learned linear bridge projections. Each bridge reads hidden states from one model and injects an additive update into the residual stream of another at intermediate layers. In bilateral setups, bridges run in both directions at once, forming a feedback loop that stabilizes both streams. Because the bridges are purely linear maps, they can only map geometric relationships that already exist between the frozen representation spaces, which the authors argue prevents overfitting. The architecture thus decouples memorization from relational alignment: the base models act as memorizers, while the lightweight bridges handle cross-domain generalization. Crucially, keeping the base weights frozen eliminates catastrophic forgetting and maintains operational closure.
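A minimal sketch of one bilateral bridge step follows, written in plain Python so it needs no dependencies. It assumes each bridge is a single weight matrix plus a scalar gate; the names `LinearBridge`, `gate`, and `bilateral_step` are hypothetical, and the summary does not say how the paper's gates are actually computed. The short lists stand in for residual-stream vectors at one intermediate layer, and both return updates are computed from the pre-update states so that they act simultaneously.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class LinearBridge:
    """Hypothetical linear bridge: maps a source model's hidden state to an
    additive update for a target model's residual stream. Only W and gate
    would be trained; the base models themselves stay frozen."""

    def __init__(self, W: Matrix, gate: float = 1.0) -> None:
        self.W = W        # shape: (target_dim, source_dim)
        self.gate = gate  # assumed scalar gate; the real gating is unspecified

    def __call__(self, h_src: Vector) -> Vector:
        return [self.gate * v for v in matvec(self.W, h_src)]


def bilateral_step(
    h_a: Vector,
    h_b: Vector,
    a_to_b: Callable[[Vector], Vector],
    b_to_a: Callable[[Vector], Vector],
) -> Tuple[Vector, Vector]:
    """Apply both return bridges at one intermediate layer. Each update reads
    the other model's pre-update state, so the two bridges act simultaneously.
    New lists are returned; the input states are left untouched."""
    delta_b = a_to_b(h_a)
    delta_a = b_to_a(h_b)
    new_a = [x + d for x, d in zip(h_a, delta_a)]
    new_b = [x + d for x, d in zip(h_b, delta_b)]
    return new_a, new_b


if __name__ == "__main__":
    # Model A has hidden size 3, model B has hidden size 2.
    h_a, h_b = [1.0, 0.0, 2.0], [0.5, -1.0]
    a_to_b = LinearBridge([[0.1, 0.0, 0.2], [0.0, 0.3, 0.0]], gate=0.5)
    b_to_a = LinearBridge([[0.2, 0.0], [0.0, 0.1], [0.4, 0.4]])
    new_a, new_b = bilateral_step(h_a, h_b, a_to_b, b_to_a)
    print(new_a, new_b)
```

Because a bridge is just a rectangular matrix (2x3 one way, 3x2 the other in the demo), the coupled models can have different hidden sizes. Reading both sources before writing either update is one simple way to model "simultaneous" return bridges; the paper may schedule them differently.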

Evaluated against Mixture-of-Experts (MoE) routing over the same frozen models, RC delivers large improvements. On a medical benchmark, RC reduces perplexity to 11.02, compared with 56.80 for MoE and 57.08 for the frozen baseline, an 80.7% reduction relative to the baseline. On TruthfulQA Health (MC1), RC improves accuracy by 9.1 percentage points over the baseline. The proposed explanation is that independent models produce uncorrelated hallucinations, so bridge gates can amplify updates that are consistent across models while suppressing errors specific to any one model. In a coding test with mismatched tokenizers, RC reaches a perplexity of 5.91, versus 878 for MoE and about 7 million for the baseline. Latency remains bounded by the slowest single model, and specialists can be added or removed without retraining. The framework could replace multi-turn text prompting in agentic workflows with a single parallel forward pass, letting models run on separate nodes without a central bottleneck.
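The headline percentages can be checked with a few lines of arithmetic. The sketch below uses only the medical-benchmark figures reported above; the helper names are illustrative. It relies on the standard definition of perplexity as the exponential of the mean per-token cross-entropy, so the log of a perplexity ratio equals the per-token cross-entropy difference in nats.

```python
import math


def relative_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before


def cross_entropy_gap(ppl_before: float, ppl_after: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy in nats), so the
    difference of log-perplexities is the cross-entropy gap per token."""
    return math.log(ppl_before) - math.log(ppl_after)


# Medical-benchmark perplexities as reported in the summary.
BASELINE, MOE, RC = 57.08, 56.80, 11.02

print(f"RC vs frozen baseline: {relative_reduction(BASELINE, RC):.1f}% lower perplexity")
print(f"RC vs MoE:             {relative_reduction(MOE, RC):.1f}% lower perplexity")
print(f"Cross-entropy gap vs baseline: {cross_entropy_gap(BASELINE, RC):.2f} nats/token")
```

This confirms that the 80.7% figure is measured against the frozen baseline; against MoE the drop is about 80.6%, and the improvement over the baseline corresponds to roughly 1.64 nats per token.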

Key Points
  • RC uses learned linear bridge projections to connect frozen LLMs in parallel without modifying base weights.
  • On a medical benchmark, RC cuts perplexity by 80.7% relative to the frozen baseline (11.02 vs 57.08; MoE reaches 56.80) and raises TruthfulQA Health (MC1) accuracy by 9.1 points.
  • Enables horizontal scaling, eliminates catastrophic forgetting, and could replace multi-turn agent prompting with a single parallel pass.

Why It Matters

Enables efficient multi-model collaboration without retraining the base models, which could benefit edge deployments and specialized AI systems.