Research & Papers

Seeing Graphs Like Humans: Benchmarking Computational Measures and MLLMs for Similarity Assessment

New research shows that MLLMs such as GPT-5 align with human judgments of graph similarity better than traditional algorithms do.

Deep Dive

A research team from Seoul National University has published a groundbreaking study titled 'Seeing Graphs Like Humans,' which directly benchmarks automated graph similarity measures against human visual perception. The paper addresses a critical gap in visual analytics: traditional computational metrics often provide recommendations that conflict with analysts' intuitive judgments, potentially increasing confusion rather than reducing cognitive load. Through three interconnected experiments using a dataset of 1,881 node-link diagrams and judgments from 32 human participants, the study establishes a human baseline for graph similarity, revealing that people prioritize global shapes and edge densities over exact topological details.

The research then benchmarks 16 established computational measures against this human consensus, identifying Portrait divergence as the best-performing traditional metric, though with only moderate alignment. The most significant finding comes from evaluating three state-of-the-art Multimodal Large Language Models (MLLMs): GPT-5, Gemini 2.5 Pro, and Claude Sonnet 4.5. The results demonstrate that MLLMs, particularly GPT-5, significantly outperform all traditional measures in aligning with human perception of graph similarity. Furthermore, these models provide interpretable natural language rationales for their decisions, a key advantage over opaque algorithmic scores. Claude Sonnet 4.5 was noted for its superior computational efficiency. The findings suggest MLLMs hold significant promise as effective, explainable proxies for human perception and as intelligent guides that can uncover subtle visual nuances in complex graph data.
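To make the idea of a computational similarity measure concrete, here is a minimal, self-contained sketch in the spirit of portrait divergence. A network portrait counts, for each shortest-path distance l, how many nodes have exactly k other nodes at that distance; two graphs can then be compared by the Jensen-Shannon divergence between their portrait distributions. This is a simplified illustration, not the exact formulation benchmarked in the paper, and the example graphs are hypothetical.

```python
# Simplified "portrait"-style graph comparison (illustrative sketch only,
# not the exact portrait divergence evaluated in the study).
from collections import deque, Counter
import math

def bfs_distances(adj, src):
    """Shortest-path hop counts from src over an adjacency dict."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def portrait(adj):
    """B[(l, k)] = number of nodes with exactly k nodes at distance l."""
    B = Counter()
    for src in adj:
        shells = Counter(d for d in bfs_distances(adj, src).values() if d > 0)
        for l, k in shells.items():
            B[(l, k)] += 1
    return B

def js_divergence(p, q):
    """Jensen-Shannon divergence between two sparse count distributions."""
    sp, sq = sum(p.values()), sum(q.values())
    m = {x: 0.5 * (p.get(x, 0) / sp + q.get(x, 0) / sq) for x in set(p) | set(q)}
    def kl(a, sa):
        return sum((a[x] / sa) * math.log2((a[x] / sa) / m[x]) for x in a if a[x])
    return 0.5 * kl(p, sp) + 0.5 * kl(q, sq)

# Two small example graphs as adjacency dicts: a 4-cycle vs. a 4-node path.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(js_divergence(portrait(cycle), portrait(path)))
```

A score of 0 means identical portraits; larger values mean the graphs' distance-profile structures diverge more. Note the mismatch the paper highlights: such scores are single opaque numbers, whereas an MLLM can also explain *why* two drawings look alike.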

Key Points
  • GPT-5 outperformed 16 traditional computational metrics in aligning with human judgments of graph similarity.
  • The study used a dataset of 1,881 node-link diagrams and consensus data from 32 human participants.
  • MLLMs provide interpretable natural language rationales, a key advantage over traditional opaque algorithmic scores.

Why It Matters

Enables more intuitive, explainable AI assistants for data analysts working with complex network visualizations.