AIDABench: AI Data Analytics Benchmark
New benchmark with 600+ complex tasks shows even human experts need 1-2 hours per question.
A consortium of 27 researchers has launched AIDABench, a new benchmark designed to rigorously test AI systems on complex, end-to-end data analytics tasks. Unlike previous benchmarks that focus on isolated capabilities, AIDABench simulates real-world scenarios with 600+ diverse tasks across three core dimensions: question answering, data visualization, and file generation. The tasks are grounded in heterogeneous data sources such as spreadsheets, databases, and financial reports, reflecting analytical demands across a range of industries. The benchmark's difficulty is underscored by the fact that even human experts, assisted by AI tools, need 1-2 hours to complete a single question.
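To make the task structure concrete, here is a minimal sketch of what one AIDABench-style task record might look like. The schema and every field name are illustrative assumptions; the summary does not describe the benchmark's actual data format.

```python
from dataclasses import dataclass
from typing import List, Literal

# Hypothetical task record; field names are illustrative, not AIDABench's.
@dataclass
class AnalyticsTask:
    task_id: str
    category: Literal["question_answering", "data_visualization", "file_generation"]
    source_files: List[str]   # e.g. spreadsheets, database dumps, financial PDFs
    prompt: str               # the analytical question or instruction
    expected_artifact: str    # reference answer, chart spec, or output file path

# Example instance of a QA-style task over a spreadsheet:
task = AnalyticsTask(
    task_id="qa-0042",
    category="question_answering",
    source_files=["q3_revenue.xlsx"],
    prompt="Which region's quarter-over-quarter revenue growth exceeded 10%?",
    expected_artifact="EMEA",
)
```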
The team evaluated 11 state-of-the-art models, including proprietary systems such as Claude Sonnet 4.5 and Gemini 3 Pro Preview and open-source models such as Qwen3-Max-2026, and the results were sobering. The top-performing model achieved a pass@1 score of only 59.43%, revealing that current AI systems still struggle significantly with the integrated reasoning and execution that practical data analytics demands. The team provides a detailed analysis of failure modes and identifies key challenges for future research, positioning AIDABench as a critical tool for enterprise procurement and tool selection, and for guiding model development toward solving genuine business problems.
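For readers unfamiliar with the metric: pass@1 is the probability that a single sampled attempt solves a task, averaged over all tasks. The summary does not spell out AIDABench's exact scoring protocol, so the sketch below uses the standard unbiased pass@k estimator from Chen et al. (2021); the function name and the per-task tallies are hypothetical.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability
    that at least one of k samples passes, given n attempts of which
    c were correct."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct attempt
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Hypothetical per-task (attempts, correct) tallies; benchmark-level
# pass@1 is the mean of the per-task estimates.
results = [(5, 3), (5, 0), (5, 5)]
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"pass@1 = {score:.2%}")  # pass@1 = 53.33%
```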
- Benchmark includes 600+ complex tasks across QA, visualization, and file generation, all grounded in real-world documents.
- Even human experts with AI assistance need 1-2 hours per question, highlighting the benchmark's difficulty.
- The top-performing model (unspecified) scored only 59.43% pass@1, showing a major gap in AI's practical analytics capabilities.
Why It Matters
Provides a rigorous standard for enterprises to evaluate AI tools on real business analytics, not just academic tasks.