
GPT-5 and Gemini 3 Flash Get Worse, Cost 55% More at 'High Effort'

New benchmark reveals paying more for 'thinking time' can hurt AI accuracy.

Deep Dive

A new Deep Research Bench study of 22 model configurations shows that, for top models, higher 'effort' settings can reduce accuracy while increasing cost. GPT-5's score dropped from 0.496 to 0.481, a loss of 1.5 points, while its cost per query rose about 55%, from $0.25 to $0.39. Gemini 3 Flash also lost accuracy, with a 5-point decline. The finding contradicts the assumption that more computational 'thinking' always yields better results on complex research tasks.
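For readers who want to check the arithmetic, here is a minimal sketch that recomputes the GPT-5 changes from the rounded figures quoted above. The function and variable names are illustrative, not part of the benchmark, and "baseline" simply means the lower-effort configuration the article compares against.

```python
def relative_change(before: float, after: float) -> float:
    """Return the percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


# GPT-5 figures as quoted in the article (rounded values).
# "baseline" is an assumed label for the lower-effort configuration.
score_baseline, score_high = 0.496, 0.481
cost_baseline, cost_high = 0.25, 0.39

# Absolute score change, expressed in points on a 0-100 scale.
score_delta_points = (score_high - score_baseline) * 100
# Relative changes in score and in cost per query.
score_change_pct = relative_change(score_baseline, score_high)
cost_change_pct = relative_change(cost_baseline, cost_high)

print(f"score change: {score_delta_points:+.1f} points ({score_change_pct:+.1f}%)")
print(f"cost change:  {cost_change_pct:+.1f}%")
```

With these rounded inputs the cost increase works out to roughly 56%, so the reported 55% was presumably computed from unrounded per-query costs.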

Why It Matters

Developers and businesses that default to the highest effort setting for AI-powered research may be paying more for worse results. Testing lower-effort configurations before budgeting could cut costs without sacrificing accuracy.
