Can LLMs Perceive Time? An Empirical Investigation
AI models predict tasks will take minutes when they actually finish in seconds, with major implications for scheduling.
A new research paper titled "Can LLMs Perceive Time? An Empirical Investigation" by Aniketh Garikaparthi reveals a fundamental weakness in current large language models. The study, which tested 68 tasks across four major model families, found that models like GPT-5 cannot accurately estimate how long their own computational tasks will take. Pre-task predictions consistently overshot actual durations by a factor of 4-7, with models predicting human-scale minutes for tasks that completed in seconds. The failure persisted even in relative comparisons: models scored at or below chance on counter-intuitive task pairs.
The research demonstrates that LLMs possess propositional knowledge about time from their training data but lack experiential grounding in their own inference processes. This disconnect is particularly problematic in multi-step agentic settings, where time estimation errors ballooned to 5-10 times the actual duration. The findings have immediate practical implications for AI agent scheduling, automated planning systems, and any time-critical application that depends on accurate duration prediction for effective operation and resource allocation.
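The core measurement is simple to reproduce in spirit: ask a model for a pre-task duration estimate, time the task's actual execution, and take the ratio. Below is a minimal sketch of that comparison; the function name and the specific numbers are invented for illustration, not taken from the paper.

```python
import time

def overestimation_factor(predicted_seconds, task):
    """Ratio of a predicted duration to the measured wall-clock duration.

    `predicted_seconds` stands in for a model's pre-task estimate;
    `task` is any zero-argument callable whose runtime we measure.
    """
    start = time.perf_counter()
    task()
    actual = time.perf_counter() - start
    return predicted_seconds / actual

# Hypothetical illustration: a prediction roughly five times the true
# runtime, mirroring the 4-7x overshoot the study reports (values made up).
factor = overestimation_factor(0.25, lambda: time.sleep(0.05))
```

A factor near 1 would indicate a well-calibrated estimate; the paper's finding is that this ratio sits far above 1 across models and tasks.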
- Models overestimated task duration by 4-7x, predicting minutes for second-long tasks
- GPT-5 scored only 18% on counter-intuitive task pairs, performing at or below chance
- Errors persisted in multi-step agentic settings with 5-10x miscalculations
Why It Matters
This fundamental limitation affects AI agent reliability in scheduling, planning, and any application where time estimation is critical.
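To make the scheduling impact concrete, consider a naive scheduler that reserves time based on the model's own per-step estimates. The sketch below uses invented per-step numbers (not from the paper) to show how per-step overestimates compound into a heavily inflated reservation for a multi-step pipeline:

```python
# Hypothetical per-step predictions vs. actual runtimes (seconds) for a
# three-step agent pipeline; all numbers are invented for illustration.
predicted = [90, 120, 60]  # model's pre-task estimates
actual = [15, 20, 10]      # measured wall-clock durations

scheduled_total = sum(predicted)          # seconds reserved by a naive scheduler
real_total = sum(actual)                  # seconds actually used
inflation = scheduled_total / real_total  # how badly the schedule over-reserves
```

With these made-up numbers the scheduler reserves 270 seconds for 45 seconds of real work, a 6x over-reservation, squarely within the 5-10x range the study reports for agentic settings. Over many concurrent agents, that gap translates directly into idle capacity and misordered task queues.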