ReLope: KL-Regularized LoRA Probes for Multimodal LLM Routing
Researchers address the visual-input problem that breaks existing probe-based routing between lightweight models and heavyweights like GPT-4V.
A research team has introduced ReLope (KL-Regularized LoRA Probes), a method that addresses a key failure mode in multimodal AI routing. Probe-based routing predicts whether a smaller model can handle a query on its own or should escalate it to a more powerful, expensive model such as GPT-4V, but it breaks down when visual inputs are involved: the researchers found that images and videos weaken the 'correctness signals' in a model's hidden states, making it hard to decide which queries to escalate. ReLope combines two components: an Attention Probe that aggregates hidden-state signals using attention scores from the previous layer, and a lightweight LoRA adapter trained with KL regularization to learn routing-aware representations.
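To make the two components concrete, here is a minimal PyTorch sketch of what an attention-pooled probe and a KL-regularized training objective could look like. The paper's exact architecture and code are not reproduced here; the pooling choice (using the attention the final token pays to each position, averaged over heads), the tensor shapes, the loss weight, and all names are assumptions for illustration only.

```python
# Hypothetical sketch of the two components described above; names, shapes,
# and loss weights are assumptions, not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionProbe(nn.Module):
    """Pools hidden states with attention scores taken from the previous layer,
    then predicts a routing logit for P(small model answers correctly)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states: torch.Tensor, prev_layer_attn: torch.Tensor) -> torch.Tensor:
        # hidden_states:   (batch, seq_len, hidden_dim) from layer L
        # prev_layer_attn: (batch, n_heads, seq_len, seq_len) from layer L-1
        # One plausible reading of "aggregate via previous-layer attention scores":
        # use the attention the last token pays to every position, averaged over heads,
        # as pooling weights over the hidden states.
        weights = prev_layer_attn[:, :, -1, :].mean(dim=1)            # (batch, seq_len)
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-8)
        pooled = torch.einsum("bs,bsd->bd", weights, hidden_states)   # (batch, hidden_dim)
        return self.classifier(pooled).squeeze(-1)                    # routing logit


def routing_loss_with_kl(probe_logits: torch.Tensor,
                         correct_labels: torch.Tensor,
                         adapted_logits: torch.Tensor,
                         base_logits: torch.Tensor,
                         kl_weight: float = 0.1) -> torch.Tensor:
    """Binary routing loss plus a KL term that keeps the LoRA-adapted model's
    output distribution close to the frozen base model's distribution."""
    route_loss = F.binary_cross_entropy_with_logits(probe_logits, correct_labels.float())
    kl = F.kl_div(
        F.log_softmax(adapted_logits, dim=-1),
        F.log_softmax(base_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    return route_loss + kl_weight * kl
```

The KL term is what keeps the LoRA adapter from drifting far from the base model while it learns routing-aware features; the probe itself stays a single linear layer, so the added inference cost is negligible.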
Comprehensive experiments show that ReLope consistently outperforms existing baseline methods. This matters for cost-effective multimodal AI systems that balance a lightweight model against a powerful but expensive one (such as Claude 3.5 Sonnet or GPT-4o). By routing only the most complex multimodal queries to the heavyweight model, a system can maintain high performance while significantly reducing computational cost and latency. The code is publicly available, paving the way for more scalable and affordable vision-language applications.
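At inference time, routing of this kind typically reduces to thresholding the probe's confidence. The sketch below illustrates that pattern under assumed interfaces: the `encode`/`generate` helpers, the threshold value, and the model objects are placeholders, not part of the paper.

```python
# Hypothetical threshold-based routing loop; interfaces and cutoff are assumptions.
import torch

ROUTE_THRESHOLD = 0.5  # assumed confidence cutoff; in practice tuned to a cost/accuracy target


@torch.no_grad()
def answer(query, small_vlm, probe, expensive_model):
    # Assumed helper: run the small vision-language model once and return the
    # hidden states and previous-layer attention the probe needs.
    hidden_states, prev_attn = small_vlm.encode(query)
    p_correct = torch.sigmoid(probe(hidden_states, prev_attn))
    if p_correct.item() >= ROUTE_THRESHOLD:
        return small_vlm.generate(query)       # cheap path: small model answers itself
    return expensive_model.generate(query)     # escalate to e.g. GPT-4V / GPT-4o
```

Raising the threshold sends more queries to the expensive model (higher accuracy, higher cost); lowering it does the opposite, which is the cost/quality dial such hybrid systems expose.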
- Solves the 'visual input degradation' problem where standard routing probes fail with images/videos
- Uses a novel Attention Probe and KL-regularized LoRA adapter to maintain routing accuracy
- Enables efficient hybrid AI systems that route only complex queries to expensive models like GPT-4V, cutting costs by ~40%
Why It Matters
Enables cheaper, faster multimodal AI applications by efficiently routing queries between lightweight and powerful models.