Research & Papers

Seeking Information with RAG-Assistants: Does Model Size Matter in Human-AI Collaborations?

A 112-participant study found that humans working with RAG assistants beat standalone models, with no difference in satisfaction across 3B–70B model sizes.

Deep Dive

A new study from Leiden University investigated whether underlying model size matters in Retrieval-Augmented Generation (RAG) assistants for real-world human-AI collaboration. In a multi-turn information-seeking scenario inspired by workplace compliance and sensitive data handling, 112 participants used RAG assistants built on models of 3B, 8B, and 70B parameters. The study compared human-assisted performance against LLM-only and LLM+RAG baselines, measuring both accuracy and user satisfaction.
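The core RAG mechanism behind such assistants can be sketched in a few lines: retrieve the passages most relevant to the user's question, then prompt the model with that context. The retriever, corpus, and prompt template below are illustrative stand-ins, not the study's actual system, and the generation step itself is omitted:

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# All names and the toy corpus here are illustrative, not from the paper.

def score(query: str, doc: str) -> float:
    """Crude relevance score: fraction of query terms found in the doc."""
    terms = set(query.lower().split())
    doc_terms = set(doc.lower().split())
    return len(terms & doc_terms) / max(len(terms), 1)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Ground the model's answer in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Toy compliance-style corpus, echoing the study's workplace scenario.
corpus = [
    "Personal data must be stored on approved internal servers.",
    "Expense reports are due by the fifth of each month.",
    "Sensitive data may not be shared with external vendors.",
]

prompt = build_prompt(
    "Where must personal data be stored?",
    retrieve("Where must personal data be stored?", corpus),
)
```

The prompt would then be sent to the underlying model (3B, 8B, or 70B in the study); production systems typically swap the keyword-overlap scorer for dense embedding search, but the retrieve-then-prompt shape is the same.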

Key findings reveal that human-AI collaboration consistently outperformed model-only baselines, with gains that did not depend on model size. Surprisingly, participants reported similar levels of perceived usability and satisfaction across all three model sizes, contradicting the assumption that larger models deliver better user experience. The authors argue that evaluating AI systems in realistic multi-turn interactions with real users—focusing on usability and satisfaction alongside accuracy—provides a more nuanced picture than benchmark performance alone. This suggests organizations deploying RAG assistants can often achieve strong results with smaller, more cost-efficient models.

Key Points
  • Human-AI collaboration with RAG assistants outperformed standalone LLMs by a significant margin, regardless of model size (3B, 8B, or 70B parameters).
  • User satisfaction and perceived usability showed negligible differences across the three model sizes, challenging the notion that bigger models improve the collaborative experience.
  • The study used a realistic multi-turn information-seeking task with 112 participants, emphasizing compliance and sensitive-data handling rather than abstract benchmarks.

Why It Matters

Smaller, cheaper models can power effective RAG assistants, reducing costs without sacrificing user satisfaction or performance.