Open Source

MolmoWeb 4B/8B

Open-source AI agents outperform larger closed models like GPT-4o on complex web navigation tasks.

Deep Dive

AllenAI has launched the MolmoWeb family, a new series of fully open-source multimodal web agents that are challenging the dominance of large, closed models. The standout model, MolmoWeb-8B, achieves state-of-the-art results, outperforming similar-scale open-weight models like Fara-7B and, remarkably, surpassing Set-of-Marks (SoM) agents built on much larger closed frontier models like OpenAI's GPT-4o. This demonstrates that specialized, efficient open models can compete with giants in specific domains like web interaction.

A key technical innovation is test-time scaling via parallel rollouts with best-of-N selection: the model generates multiple candidate action sequences in parallel and keeps the best one, yielding large performance gains. On the WebVoyager benchmark, accuracy rises from 78.2% pass@1 to 94.7% pass@4; on Online-Mind2Web, it jumps from 35.3% to 60.5%. The family includes 4B and 8B parameter versions built on the Molmo2 architecture, which pairs a Qwen3 language model with a SigLIP 2 vision backbone. All models are available on Hugging Face.
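The best-of-N rollout scheme can be sketched as follows. Note this is an illustrative sketch, not MolmoWeb's actual implementation: the release does not specify how trajectories are scored, so the `rollout` and `score` functions below are hypothetical stubs standing in for a real browser-driving agent and a real verifier.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: a real agent would drive a browser session, and a
# learned verifier (or reward model) would score the resulting trajectory.
def rollout(task: str, seed: int) -> list[str]:
    """Sample one candidate action sequence for a task (stubbed with a seeded RNG)."""
    rng = random.Random(seed)
    return [f"action_{rng.randint(0, 9)}" for _ in range(3)]

def score(task: str, trajectory: list[str]) -> float:
    """Score a trajectory; a real system might use a learned verifier (stubbed)."""
    return sum(int(a.split("_")[1]) for a in trajectory)

def best_of_n(task: str, n: int = 4) -> list[str]:
    """Run n independent rollouts in parallel and keep the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        trajectories = list(pool.map(lambda s: rollout(task, s), range(n)))
    return max(trajectories, key=lambda t: score(task, t))
```

The design point is that the N rollouts are independent, so they parallelize trivially; the cost of the gain from pass@1 to pass@N is N times the inference compute plus a selection step.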

Key Points
  • MolmoWeb-8B outperforms Set-of-Marks (SoM) agents built on GPT-4o on web-agent benchmarks.
  • Test-time scaling boosts WebVoyager performance to 94.7% pass@4, a significant jump from 78.2% pass@1.
  • The models are fully open-source and available in 4B and 8B parameter sizes on Hugging Face.
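Assuming the pass@k figures above follow the standard convention from the code-generation literature, they can be computed with the usual unbiased estimator: given n sampled rollouts of which c succeed, the probability that at least one of k drawn samples succeeds is 1 − C(n−c, k)/C(n, k). A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n rollouts (c of them successful),
    succeeds on the task."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 10 rollouts of which 5 succeed, `pass_at_k(10, 5, 1)` gives 0.5, matching the intuitive per-sample success rate for k = 1.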

Why It Matters

Provides a powerful, transparent, and efficient open-source alternative to closed AI for automating complex web tasks and research.