GPT-5 and Gemini 3 Flash Get Worse, Cost 55% More at 'High Effort'

r/MachineLearning February 14, 2026

⚡ New benchmark reveals that paying more for 'thinking time' can hurt AI accuracy.

Deep Dive

A new Deep Research Bench study of 22 model configurations finds that, for top models such as GPT-5 and Gemini 3 Flash, a higher 'effort' setting lowers accuracy while raising cost. GPT-5's score fell from 0.496 to 0.481, while its cost per query rose about 55%, from $0.25 to $0.39. Gemini 3 Flash also lost accuracy at higher effort, with a reported decline of 5 points. The finding contradicts the assumption that more computational 'thinking' always yields better results on complex research tasks.
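The GPT-5 figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper `relative_change` and the variable names are hypothetical, and the inputs are the rounded values from this summary, not the benchmark's raw data.

```python
def relative_change(before: float, after: float) -> float:
    """Return the fractional change from `before` to `after`."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before


# Rounded GPT-5 figures as quoted in the summary (lower vs. higher effort).
score_low, score_high = 0.496, 0.481
cost_low, cost_high = 0.25, 0.39  # USD per query

score_delta = score_high - score_low
score_change = relative_change(score_low, score_high)
cost_change = relative_change(cost_low, cost_high)

print(f"score delta:           {score_delta:+.3f}")
print(f"relative score change: {score_change:+.1%}")
print(f"relative cost change:  {cost_change:+.1%}")
```

With these rounded inputs the cost increase works out to roughly 56%, slightly above the reported 55%. That gap is consistent with the published dollar amounts having been rounded, though that explanation is an assumption. The score falls by 0.015, about 3% in relative terms.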

Why It Matters

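The result gives developers and businesses reason to check whether high-effort settings earn their extra cost before they configure and budget for AI-powered research. Where a lower setting scores as well or better, choosing it could cut spending significantly.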
