DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
A 4B-parameter AI agent trained on only 10,000 open-source examples outperforms prior agentic models under 9 billion parameters and narrows the gap to 30B-class systems.
A research team has introduced DR-Venus, a frontier 4-billion-parameter AI agent designed for 'edge-scale' deployment on local devices. The key breakthrough is that this capable 'deep research agent'—an AI that can autonomously plan and execute multi-step research tasks—was trained on a remarkably small dataset of just 10,000 open-source examples. Using a novel two-stage training recipe, the team first applied 'agentic supervised fine-tuning' with strict data cleaning and resampling of long-horizon tasks to build foundational skills.
In the second stage, they employed a specialized 'agentic reinforcement learning' (RL) method to boost reliability on complex research chains. To make RL effective for such a small model, they built on the IGPO framework and designed turn-level rewards based on 'information gain' and format-aware regularization. This improved supervision density and credit assignment for each step in a multi-turn task. The result is a model that significantly outperforms previous agentic models with under 9 billion parameters on multiple benchmarks and narrows the performance gap with much larger 30-billion-parameter class systems.
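To make the turn-level reward idea concrete, here is a minimal sketch of how an information-gain reward with format-aware regularization might be computed. It assumes the training loop can track a belief distribution over candidate final answers before and after each agent turn; all function names, the penalty weight, and this belief-tracking setup are illustrative assumptions, not the authors' actual IGPO-based implementation.

```python
import math

def entropy(probs):
    """Shannon entropy of a belief distribution over candidate answers."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def turn_level_rewards(belief_trajectory, turns, format_ok, lam=0.1):
    """Assign a reward to each turn of a multi-turn research task.

    belief_trajectory: list of dicts mapping candidate answer -> probability;
        one belief state before the first turn, then one after each turn.
    turns: the agent's turns (len == len(belief_trajectory) - 1).
    format_ok: per-turn booleans, whether the turn followed the required
        tool-call/answer format (the format-aware regularization).
    lam: illustrative weight for the format penalty.
    """
    rewards = []
    for t in range(len(turns)):
        # Information gain: how much this turn reduced uncertainty
        # about the final answer (entropy before minus entropy after).
        gain = entropy(belief_trajectory[t]) - entropy(belief_trajectory[t + 1])
        # Penalize malformed turns so credit assignment also enforces format.
        penalty = 0.0 if format_ok[t] else lam
        rewards.append(gain - penalty)
    return rewards
```

Because every turn gets its own scalar reward rather than a single end-of-episode signal, supervision density increases and credit assignment per step becomes tractable, which is the property the paper attributes to its turn-level design.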
The work challenges the assumption that massive datasets are always necessary for advanced AI capabilities. The team's analysis suggests that 4B-parameter agents already possess strong latent performance potential, highlighting the value of sophisticated training techniques and 'test-time scaling.' By releasing the model, code, and recipes, they aim to support reproducible research into efficient, small-scale agents. This development points toward a future where powerful, private AI research assistants can run directly on personal devices without relying on cloud-based giants.
- DR-Venus is a 4B-parameter deep research agent trained on only 10,000 open data points using a novel two-stage recipe.
- It outperforms prior agentic models under 9B parameters and narrows the gap to 30B-class systems on research benchmarks.
- The method uses agentic SFT to build foundational capability, followed by a specialized RL stage with turn-level information-gain rewards to improve reliability on long-horizon tasks.
Why It Matters
Enables powerful, private AI research assistants to run locally on devices, reducing cost, latency, and dependency on cloud APIs.