ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models
New benchmark shows that AI models fail to request user help, a key collaborative skill, even when given hints.
A research team from the University of Trento and University of Montpellier has published a new benchmark called ProactiveBench, designed to measure a critical but overlooked skill in multimodal large language models (MLLMs): proactiveness. The core question is whether AI models can recognize when they need more information and ask a user for simple interventions, much like a human collaborator would. The benchmark was constructed from seven existing datasets repurposed to test scenarios like recognizing occluded objects, enhancing poor-quality images, and interpreting ambiguous sketches.
In a comprehensive evaluation of 22 popular MLLMs, the researchers made several key discoveries. First, current models overwhelmingly lack this proactive behavior. Surprisingly, a model's capacity (size and compute) showed no correlation with its proactiveness score. Even more counterintuitively, providing hints or supplying conversation histories for in-context learning often introduced negative bias and hurt performance rather than helping.
The study's most promising finding came from exploring a simple reinforcement learning (RL) fine-tuning strategy. The results demonstrated that proactiveness is a learnable skill; models trained with this method not only improved on the benchmark tasks but also showed signs of generalizing to unseen scenarios. The team has publicly released ProactiveBench, framing it as a foundational step toward building AI assistants that can engage in more natural, collaborative, and effective human-AI teamwork by knowing when to seek clarification.
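The article does not describe the RL fine-tuning setup in detail, but the core idea can be sketched as a reward signal that pays the model for asking exactly when an intervention is needed and penalizes it for guessing on ambiguous inputs or asking unnecessarily. The function names (`proactiveness_reward`, `update_score`) and the scalar "ask propensity" below are hypothetical illustrations, not the paper's implementation:

```python
def proactiveness_reward(needs_help: bool, model_action: str) -> float:
    """Scalar reward for proactiveness fine-tuning (hypothetical sketch).

    +1.0 when the model asks for help exactly when help is needed,
    -1.0 when it guesses on an ambiguous input or asks unnecessarily.
    """
    asked = model_action == "ask"
    return 1.0 if asked == needs_help else -1.0


def update_score(ask_propensity: float, reward: float, lr: float = 0.1) -> float:
    """Toy REINFORCE-style update: nudge a scalar 'ask propensity'
    up or down in proportion to the received reward."""
    return ask_propensity + lr * reward


# Toy rollout: an occluded image needs a user intervention; the model asks.
reward = proactiveness_reward(needs_help=True, model_action="ask")
propensity = update_score(0.0, reward)  # propensity rises toward asking
```

In a real setup the action would come from sampling the model's own output and the update would flow through a policy-gradient or preference-optimization objective, but the shape of the incentive is the same: asking is rewarded only when the input genuinely lacks the information needed to answer.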
- Benchmarked 22 MLLMs across 7 tasks, finding a universal lack of proactive 'help-seeking' behavior.
- Found no correlation between model size/capacity and proactiveness scores, challenging assumptions about scaling.
- Simple RL fine-tuning proved proactiveness is a learnable skill that can generalize to new situations.
Why It Matters
For practical AI deployment, assistants that know when to ask questions are safer, more reliable, and truly collaborative.