ZwZ 8B/7B/4B
This new vision model sees fine details in one glance, eliminating slow tool calls.
Deep Dive
The new ZwZ-8B multimodal model achieves state-of-the-art fine-grained visual understanding in a single forward pass, removing the need for slow inference-time zooming or tool calling. Built on Qwen3-VL-8B and trained with Region-to-Image Distillation and reinforcement learning, it outperforms open-source models of comparable size on the reported benchmarks. It also generalizes well to visual reasoning, GUI agent, and AIGC (AI-generated content) detection tasks. Smaller 7B and 4B versions are available as well.
Why It Matters
It makes detailed visual AI analysis faster and more efficient in real-world applications, because fine details are resolved in one pass instead of through repeated zooming or tool calls.