Research & Papers

DIAGRAMS: A Review Framework for Reasoning-Level Attribution in Diagram QA

New review framework automates evidence selection for diagram reasoning tasks

Deep Dive

Diagram question answering (Diagram QA) requires linking each question-answer pair to all visual regions needed for reasoning, not just the region containing the final answer. Creating this structured evidence across diagrams, charts, maps, circuits, and infographics is time-consuming, and existing annotation tools are tightly coupled to dataset-specific formats. Researchers including Anirudh Iyengar, Tampu Ravi Kumar, and Manan Suri, from institutions including the University of Maryland, have introduced DIAGRAMS, a review-first framework that decouples interface logic from dataset-specific JSON structures through an internal meta-schema and dataset adapters. Given an image and QA pair with optional candidate regions, the system performs QA-conditioned evidence selection and proposes the regions required for reasoning. When QA pairs or candidate regions are missing, it generates them and supports human verification and refinement.
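The adapter idea described above can be sketched as follows: each dataset-specific JSON record is normalized into a shared internal meta-schema, so the review interface never touches raw dataset formats. This is a minimal illustration, not the DIAGRAMS implementation; all class, field, and key names (`MetaRecord`, `Region`, `chartqa_adapter`, `imgname`, `bboxes`) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A candidate evidence region in diagram pixel coordinates."""
    x: float
    y: float
    width: float
    height: float
    label: str = ""


@dataclass
class MetaRecord:
    """Internal meta-schema: one image, one QA pair, candidate regions."""
    image_path: str
    question: str
    answer: str
    regions: list[Region] = field(default_factory=list)


def chartqa_adapter(raw: dict) -> MetaRecord:
    """Map one hypothetical dataset-specific record into the meta-schema."""
    return MetaRecord(
        image_path=raw["imgname"],
        question=raw["query"],
        answer=raw["label"],
        regions=[
            Region(b["x"], b["y"], b["w"], b["h"], b.get("text", ""))
            for b in raw.get("bboxes", [])
        ],
    )


# Illustrative record in an assumed dataset format:
record = chartqa_adapter({
    "imgname": "chart_001.png",
    "query": "Which year had the highest revenue?",
    "label": "2021",
    "bboxes": [{"x": 10, "y": 20, "w": 30, "h": 12, "text": "2021"}],
})
```

A new dataset then only requires a new adapter function; the reviewing interface consumes `MetaRecord` objects regardless of the source format.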

In experiments across six Diagram QA datasets, model-suggested evidence achieved 85.39% precision and 75.30% recall (micro-averaged) against reviewer-final selections. These results indicate that the review-first approach significantly reduces manual region creation while maintaining high agreement with final reasoning-level attributions. The authors have released a public demo and installable package to support dataset auditing, grounded supervision creation, and grounded evaluation. This framework could accelerate the development of more robust diagram understanding systems by providing a standardized way to create high-quality reasoning annotations.
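Micro-averaging, as used in the reported numbers, pools true positives, false positives, and false negatives across all QA pairs before computing precision and recall. A minimal sketch, with made-up region IDs rather than the paper's data:

```python
def micro_pr(suggested: list[set], final: list[set]) -> tuple[float, float]:
    """Micro-averaged precision/recall of suggested evidence regions
    against reviewer-final selections, matched by region ID."""
    tp = fp = fn = 0
    for sugg, gold in zip(suggested, final):
        tp += len(sugg & gold)   # suggested and kept by the reviewer
        fp += len(sugg - gold)   # suggested but rejected
        fn += len(gold - sugg)   # needed but not suggested
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Two QA pairs: model suggestions vs. reviewer-final region sets.
p, r = micro_pr(
    [{"r1", "r2"}, {"r3"}],
    [{"r1"}, {"r3", "r4"}],
)
# Here tp=2, fp=1, fn=1, so p = r = 2/3.
```

Unlike macro-averaging, this weights each region decision equally, so QA pairs requiring many evidence regions contribute proportionally more to the final score.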

Key Points
  • DIAGRAMS uses a meta-schema and dataset adapters to decouple interface logic from dataset-specific formats.
  • Model-suggested evidence achieves 85.39% precision and 75.30% recall across six Diagram QA datasets.
  • Framework auto-generates missing QA pairs or candidate regions and supports human verification.

Why It Matters

Standardizes diagram annotation creation, reducing manual effort while improving grounded evaluation of AI reasoning.