New test reveals AI assistants struggle to remember tasks on phones
AI phone helpers have a serious memory problem, failing most tasks that require remembering past actions.
Researchers created a new benchmark to test how well AI agents remember information across different phone app sessions. They found that current systems have significant memory deficits, failing 89.8% of tasks that require remembering past actions. The study evaluated 11 different AI agents, identified five key failure modes, and proposed five design improvements. All code and results from the benchmark will be fully open-sourced for public use.
Why It Matters
The findings expose a critical weakness in AI assistants, one that keeps them from being truly helpful for complex, multi-step tasks.