Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement
AI still struggles to read charts. A new survey exposes critical gaps.
This comprehensive survey analyzes how Multimodal Large Language Models (MLLMs) understand charts, a task that requires fusing visual and textual information. It traces the field's evolution from classical deep learning pipelines to state-of-the-art MLLM paradigms, categorizes the core tasks and datasets, and critically examines the perceptual and reasoning deficits of current models. The paper also identifies promising future directions, including advanced alignment techniques and reinforcement learning, for building more robust chart-analysis systems.
Why It Matters
This roadmap is crucial for building reliable AI that can accurately interpret business, scientific, and financial data visualizations.