[R] Higher effort settings reduce deep research accuracy for GPT-5 and Gemini 3 Flash
New benchmark reveals paying more for 'thinking time' can hurt AI accuracy.
Deep Dive
A new Deep Research Bench study of 22 model configurations shows that for some top models, higher 'effort' settings reduce accuracy while increasing cost. GPT-5's score dropped from 0.496 to 0.481, while its cost per query rose by roughly 55%, from $0.25 to $0.39. Gemini 3 Flash also lost accuracy at higher effort, with a reported 5-point decline. The finding contradicts the assumption that more computational 'thinking' always yields better results for complex research tasks.
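To make the GPT-5 trade-off concrete, here is a minimal sketch that recomputes the changes from the figures quoted above. The variable and function names are illustrative and not taken from the benchmark. Note that the rounded dollar figures give about 56%, close to the quoted 55%; the gap is presumably rounding in the published numbers.

```python
def relative_change(before: float, after: float) -> float:
    """Return the fractional change going from `before` to `after`."""
    return (after - before) / before


# Figures quoted in the summary for GPT-5 (lower effort -> higher effort).
score_low, score_high = 0.496, 0.481
cost_low, cost_high = 0.25, 0.39  # USD per query

# Absolute change in benchmark score (negative means accuracy fell).
score_delta = score_high - score_low

# Relative change in cost per query.
cost_change = relative_change(cost_low, cost_high)

print(f"score change: {score_delta:+.3f}")
print(f"cost change:  {cost_change:+.0%}")
```

On these numbers, the higher effort setting costs more per query and scores lower, so it is worse on both axes rather than a trade-off between them.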
Why It Matters
Developers and businesses that default to the highest effort setting may be paying more for less accurate results. Testing lower effort settings on their own research workloads could cut costs without sacrificing accuracy.