Research & Papers

Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

Researchers create the first real-web benchmark requiring agents to infer preferences from long-term user history.

Deep Dive

Serin Kim, Sangam Lee, and Dongha Lee built Persona2Web, the first benchmark for evaluating personalized web agents on the real open web. It follows a 'clarify-to-personalize' principle: agents must resolve ambiguous queries by inferring implicit user preferences from long-term history. The benchmark has three parts: user histories, ambiguous queries, and a reasoning-aware evaluation framework. Code and datasets are publicly available for reproducibility.
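
To make the task concrete, here is a minimal Python sketch of what resolving an ambiguous query from user history could look like. Everything in it is an assumption for illustration: the class and field names, the sample data, and the simple frequency heuristic are hypothetical and are not the Persona2Web data schema or its evaluation method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryEvent:
    """One past user interaction (hypothetical fields, not the benchmark schema)."""
    category: str  # topic of the interaction, e.g. "flight_seat"
    choice: str    # what the user picked, e.g. "aisle"


@dataclass
class TaskInstance:
    """An ambiguous query paired with the user's long-term history (hypothetical)."""
    history: List[HistoryEvent]
    ambiguous_query: str
    category: str  # which part of the history the query depends on


def infer_preference(task: TaskInstance) -> Optional[str]:
    """Return the user's most frequent past choice in the query's category.

    A deliberately naive stand-in for the reasoning a real agent would need;
    returns None when the history has no relevant events.
    """
    counts = Counter(
        event.choice for event in task.history if event.category == task.category
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


task = TaskInstance(
    history=[
        HistoryEvent("flight_seat", "aisle"),
        HistoryEvent("flight_seat", "window"),
        HistoryEvent("flight_seat", "aisle"),
        HistoryEvent("hotel", "breakfast included"),
    ],
    ambiguous_query="Book me a seat on tomorrow's flight.",
    category="flight_seat",
)
print(infer_preference(task))
```

The query never says which seat the user wants; the answer has to come from the history. A real agent would need far more than a frequency count, which is the gap the benchmark is designed to measure.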

Why It Matters

The benchmark pushes AI assistants beyond explicit commands toward personalized, context-aware agents that can infer a user's implicit needs from their history.