
New PlotChain benchmark reveals top AI models fail at reading engineering plots

arXiv cs.AI February 17, 2026

⚡ Gemini 2.5 Pro narrowly beats GPT-4.1, but all models struggle with frequency-domain tasks.

Deep Dive

A new deterministic benchmark, PlotChain, tests multimodal LLMs on extracting quantitative values from 450 engineering plots, such as Bode plots and stress-strain curves. Models must answer in strict JSON at temperature zero, so results are repeatable and machine-checkable. Overall scores are tightly clustered: Gemini 2.5 Pro leads with an 80.42% pass rate, narrowly ahead of GPT-4.1 (79.84%) and Claude Sonnet 4.5 (78.21%). On tasks like bandpass response, however, pass rates plummet to 23% or lower, so even the best models misread most of these plots.
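To make the idea of a strict-JSON pass rate concrete, here is a minimal Python sketch of how such scoring could work. It is an illustration under stated assumptions, not PlotChain's actual scoring code: the field names (`cutoff_hz`, `peak_gain_db`), the 5% relative tolerance, and the helper names are all hypothetical, since the summary does not describe the benchmark's schema or tolerances.

```python
import json
import math


def parse_strict(raw):
    """Return the parsed dict if raw is a JSON object, otherwise None."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def field_passes(predicted, expected, rel_tol=0.05):
    """A field passes if it is numeric and within rel_tol of the expected value.

    The 5% tolerance is an assumption for illustration only.
    """
    if isinstance(predicted, bool) or not isinstance(predicted, (int, float)):
        return False
    return math.isclose(predicted, expected, rel_tol=rel_tol)


def pass_rate(raw_output, ground_truth, rel_tol=0.05):
    """Fraction of ground-truth fields the model output gets right."""
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one field")
    parsed = parse_strict(raw_output)
    if parsed is None:
        # Output that is not a JSON object fails every field.
        return 0.0
    passed = sum(
        field_passes(parsed.get(name), expected, rel_tol)
        for name, expected in ground_truth.items()
    )
    return passed / len(ground_truth)


# Hypothetical ground truth for one plot and two candidate model outputs.
truth = {"cutoff_hz": 1000.0, "peak_gain_db": -3.0}
good_json = '{"cutoff_hz": 1020.0, "peak_gain_db": -4.0}'
free_text = "The cutoff is about 1 kHz."

print(pass_rate(good_json, truth))  # 0.5: cutoff within 5%, gain is not
print(pass_rate(free_text, truth))  # 0.0: not valid JSON
```

Requiring JSON rather than free text is what makes the check mechanical: an answer either parses and lands within tolerance or it does not, with no human judgment involved.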

Why It Matters

This exposes a major blind spot for AI in science and engineering, where accurate data extraction from graphs is essential.
