Where Should LoRA Go? Component-Type Placement in Hybrid Language Models
Placing adapters only on attention layers uses 5-10x fewer trainable parameters while delivering better results.
A new paper from researchers at the University of Valencia investigates where to apply LoRA (Low-Rank Adaptation) adapters in hybrid language models that combine attention mechanisms with recurrent components. The study, published on arXiv, tests two architectures: Qwen3.5-0.8B (a sequential hybrid using GatedDeltaNet + softmax attention) and Falcon-H1-0.5B (a parallel hybrid using a Mamba-2 SSM + attention). The models were fine-tuned across three domains and evaluated on five benchmarks, and the results reveal that placing LoRA adapters exclusively on the attention pathway consistently outperforms full-model adaptation while using 5-10x fewer trainable parameters.
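For readers who want to try attention-only placement, a minimal sketch with Hugging Face PEFT is shown below, restricting `target_modules` to the attention projections. The checkpoint id, projection-module names, and LoRA hyperparameters (rank, alpha, dropout) are assumptions for illustration, not values from the paper, and should be checked against the actual model.

```python
# Minimal sketch of attention-only LoRA placement with Hugging Face PEFT.
# Assumptions: the checkpoint id, projection-module names, and LoRA hyperparameters
# are illustrative; inspect model.named_modules() to confirm names for your model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon-H1-0.5B-Base")

attention_only = LoraConfig(
    r=16,                      # illustrative rank, not the paper's setting
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention pathway only
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, attention_only)
model.print_trainable_parameters()  # a small fraction of the full parameter count
```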
The findings highlight a critical dependency on hybrid topology. In sequential hybrids, adapting the recurrent backbone is actively destructive, causing a 14.8 percentage-point (pp) drop on GSM8K math reasoning. Conversely, in parallel hybrids, adapting the recurrent component yields an 8.6 pp improvement. The study also documents a transfer asymmetry: parallel hybrids exhibit positive cross-task transfer, while sequential hybrids suffer from catastrophic forgetting. These results establish that component-type placement is a necessary design dimension for efficient fine-tuning of hybrid architectures, with practical implications for deploying smaller, more efficient models.
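One way to act on this finding is to make adapter placement a function of topology. The sketch below is illustrative only, assuming placeholder module names for the recurrent path; it mirrors the pattern reported above rather than any code released with the paper.

```python
# Illustrative sketch only: one way to encode the reported topology dependence as a
# placement rule. The recurrent-path module names are placeholders, not names taken
# from the paper or from any released code.
ATTENTION_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]  # common attention projections
RECURRENT_TARGETS = ["in_proj", "out_proj"]  # hypothetical SSM/GatedDeltaNet projections

def lora_targets(topology: str) -> list[str]:
    """Pick LoRA target modules based on hybrid topology.

    Sequential hybrids: attention-only (adapting the recurrent backbone was
    reported to hurt, e.g. -14.8 pp on GSM8K).
    Parallel hybrids: attention plus the recurrent branch (reported +8.6 pp).
    """
    if topology == "sequential":
        return ATTENTION_TARGETS
    if topology == "parallel":
        return ATTENTION_TARGETS + RECURRENT_TARGETS
    raise ValueError(f"unknown topology: {topology!r}")

# Example: Falcon-H1 is a parallel hybrid, so both pathways would be listed.
print(lora_targets("parallel"))
```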
- Attention-only LoRA placement outperforms full-model fine-tuning with 5-10x fewer parameters
- Adapting recurrent components hurts in sequential hybrids (-14.8 pp on GSM8K) but helps in parallel hybrids (+8.6 pp)
- Parallel hybrids show positive cross-task transfer; sequential hybrids suffer catastrophic forgetting
Why It Matters
This research provides a practical guide for efficiently fine-tuning hybrid models, potentially reducing compute costs while improving performance.