ReVision cuts visual token usage by 46% for computer-use AI agents

arXiv cs.CL May 13, 2026

⚡New technique slashes token costs by 46% while boosting agent performance

Deep Dive

Computer-use agents (CUAs) that navigate graphical user interfaces process a screenshot at every step, and each screenshot produces a large number of visual tokens. As interaction trajectories lengthen, token costs grow quickly, forcing models to drop historical context, which leads to performance plateaus. A team led by Amirhossein Abaskohi (with co-authors from multiple institutions) introduces ReVision, a training method in which the model learns to select and remove redundant visual patches between consecutive screenshots while preserving spatial structure. By comparing patch representations across frames, the model drops temporally redundant information without losing critical task cues.
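The idea of comparing patch representations across consecutive frames can be illustrated with a minimal sketch. This is not the authors' method: ReVision learns which patches to drop during training, whereas the sketch below swaps in a fixed cosine-similarity threshold. The function names, the (rows, cols, dim) patch layout, and the 0.95 threshold are all assumptions made for illustration.

```python
import numpy as np


def keep_mask(prev_patches, curr_patches, threshold=0.95):
    """Return a boolean grid that is True where the current patch differs
    enough from the patch at the same position in the previous screenshot.

    Both inputs are assumed to have shape (rows, cols, dim). The threshold
    is a hypothetical hand-picked value, not a number from the paper.
    """
    dots = np.sum(prev_patches * curr_patches, axis=-1)
    norms = (np.linalg.norm(prev_patches, axis=-1)
             * np.linalg.norm(curr_patches, axis=-1))
    similarity = dots / np.maximum(norms, 1e-8)  # guard against zero vectors
    return similarity < threshold


def drop_redundant(curr_patches, mask):
    """Keep only the changed patches, together with their (row, col)
    positions so the spatial layout can still be recovered."""
    rows, cols = np.nonzero(mask)
    positions = np.stack([rows, cols], axis=1)
    return curr_patches[mask], positions


# Two 4x4 grids of 8-dimensional patch embeddings; only one patch changes.
rng = np.random.default_rng(0)
prev = rng.normal(size=(4, 4, 8))
curr = prev.copy()
curr[1, 2] = -prev[1, 2]

mask = keep_mask(prev, curr)
kept, positions = drop_redundant(curr, mask)
print(kept.shape[0], "of", mask.size, "patches kept at", positions.tolist())
```

Returning the positions alongside the surviving patches is one simple way to preserve spatial structure after dropping; how ReVision itself encodes position is not described in this summary.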

Tested on OSWorld, WebTailBench, and AgentNetBench using Qwen2.5-VL-7B with five history screenshots, ReVision reduces token usage by 46% on average and raises success rate by 3% over the no-drop baseline. More importantly, it shows, reportedly for the first time, that longer history (20 or more screenshots) continues to improve performance once redundancy is removed, challenging the common belief that the benefit of visual history saturates. In practice, this suggests CUAs can scale to longer, more complex tasks without a matching growth in compute and cost.

Key Points

ReVision reduces visual token usage by ~46% on average across three benchmarks with Qwen2.5-VL-7B
Success rate improves by 3% over no-drop baseline while using far fewer tokens
Enables agents to benefit from longer history (5+ screenshots) without performance saturation

Why It Matters

Unlocks cost-efficient AI agents that leverage rich visual history, critical for automating complex desktop and web workflows.
