Qwen/WebWorld 32B/14B/8B (Qwen3 finetune)
WebWorld trains on 1M+ real web interactions to build world models that outperform GPT-5 at simulating the web for agents.
Alibaba's Qwen team has released WebWorld, a family of open-source world models (8B, 14B, and 32B parameters) designed specifically for training and evaluating web agents. The models are built from a scalable hierarchical data pipeline that processes over 1 million real-world web interaction trajectories. WebWorld supports long-horizon simulation of 30+ steps and can represent web states in multiple formats: Accessibility Tree, HTML, XML, Markdown, and natural language. It also features chain-of-thought (CoT)-activated reasoning for transition prediction, enabling agents to plan ahead. The training data spans diverse real web tasks, and the approach generalizes across domains to code, GUI, and game environments, making it a versatile foundation for autonomous web navigation.
Agents trained on WebWorld-synthesized trajectories achieved a 9.9% improvement on MiniWob++ and a 10.9% gain on WebArena, two major benchmarks for web agent performance. Even more strikingly, when used for inference-time lookahead search, WebWorld outperformed GPT-5 as a world model, predicting future web states more accurately. This suggests that specialized, open-source world models can rival, and even beat, closed-source giants on agentic tasks. With all three model sizes available on Hugging Face under permissive licenses, WebWorld lowers the barrier for researchers and developers to build more capable web agents without relying on expensive proprietary APIs.
- Trained on 1M+ real-world web interaction trajectories via a scalable hierarchical pipeline
- Agents using WebWorld achieve +9.9% on MiniWob++ and +10.9% on WebArena benchmarks
- Outperforms GPT-5 as a world model for inference-time lookahead search
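The lookahead-search use case above can be sketched as follows. This is a minimal illustration of the general technique, not the actual WebWorld API: the `predict` interface, the `value_fn` scoring heuristic, and the toy environment are all hypothetical stand-ins.

```python
# Hypothetical sketch: inference-time lookahead search with a world model.
# The agent simulates candidate actions via the world model's state-transition
# prediction and picks the first action whose rollout scores best.

def lookahead_search(state, candidate_actions, world_model, value_fn, depth=2):
    """Return the action whose simulated rollout yields the highest value."""

    def rollout_value(s, d):
        # Score a predicted state by recursively expanding remaining depth.
        if d == 0:
            return value_fn(s)
        return max(
            rollout_value(world_model.predict(s, a), d - 1)
            for a in candidate_actions
        )

    return max(
        candidate_actions,
        key=lambda a: rollout_value(world_model.predict(state, a), depth - 1),
    )


class ToyWorldModel:
    """Stand-in world model: states are ints, actions add to the state."""

    def predict(self, state, action):
        return state + action


if __name__ == "__main__":
    wm = ToyWorldModel()
    # From state 0, with actions [1, -2, 3] and value = final state,
    # a depth-2 search should pick the action leading to the best rollout.
    best = lookahead_search(0, [1, -2, 3], wm, value_fn=lambda s: s, depth=2)
    print(best)
```

In a real web-agent setting the states would be serialized page representations (e.g. an accessibility tree) and `value_fn` would estimate task progress; the search structure stays the same.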
Why It Matters
Open-source world models that beat GPT-5 could accelerate web automation and AI agent development.