Open Source

Quick Qwen-35B-A3B Test

Open-source model analyzes images and executes Linux commands to solve visual tasks on consumer GPUs.

Deep Dive

A viral test demonstrates the advanced capabilities of Alibaba's open-source Qwen-35B-A3B model, showcasing its ability to perform complex, multi-step reasoning tasks that combine vision and action. Using the new open-terminal feature in the Open WebUI interface, a user provided the model with a low-quality image and asked it to locate a ring. The model analyzed the image, identified the ring's coordinates, and then autonomously executed a Linux terminal command to draw a circle at that location on the image. This test highlights a significant leap for open-source AI, moving beyond simple chat to genuine tool-use and environmental interaction.
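The source doesn't show the exact command the model ran, but the task — drawing a circle at known pixel coordinates from the shell — is the kind of thing ImageMagick's `convert -draw` handles. The sketch below is a hypothetical reconstruction of such a step: the helper name, colors, and file paths are illustrative, not from the test.

```python
def circle_annotation_cmd(image, x, y, radius, out):
    """Build an ImageMagick command that circles a point on an image.

    ImageMagick's -draw primitive "circle cx,cy px,py" takes the center
    (cx,cy) and any point (px,py) on the perimeter, so we offset the
    center by the radius along the x axis.
    """
    draw = f"circle {x},{y} {x + radius},{y}"
    return [
        "convert", image,
        "-fill", "none",          # outline only, don't fill the circle
        "-stroke", "red",
        "-strokewidth", "4",
        "-draw", draw,
        out,
    ]

# Example: circle a point at (320, 240) with a 40px radius.
cmd = circle_annotation_cmd("photo.jpg", 320, 240, 40, "annotated.jpg")
```

An agent would pass a command like this to its terminal tool (e.g. via `subprocess.run(cmd)`), producing an annotated copy of the image without any image-editing code inside the model itself.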

The technical achievement is notable for its speed and accessibility, running at approximately 100 tokens per second on consumer-grade hardware like an NVIDIA GeForce RTX 3090. This performance makes sophisticated agentic workflows—where an AI perceives, plans, and acts—viable outside of expensive, proprietary cloud APIs. The integration of robust vision capabilities with reliable tool-calling in a single, efficient model suggests a shift toward more practical and autonomous AI assistants. It raises the bar for what developers can build locally, enabling new applications in data analysis, automation, and interactive problem-solving without relying on closed-source models.
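Local servers that expose such models (Open WebUI, llama.cpp, vLLM, and similar) generally speak the OpenAI-compatible chat-completions format, where an image is sent as a base64 data URL and the terminal is offered as a tool the model may call. The sketch below shows only the request payload for one perceive-then-act turn; the tool name `run_terminal`, the model id, and the prompt are assumptions for illustration, not details from the test.

```python
# Tool schema advertising a terminal to the model (assumed name/shape,
# following the common OpenAI-compatible function-calling convention).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_terminal",
        "description": "Execute a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

def build_request(image_b64, prompt):
    """Build one multimodal chat request: a text instruction plus an
    inline base64 image, with the terminal tool made available."""
    return {
        "model": "qwen-35b-a3b",  # placeholder local model id
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        "tools": TOOLS,
    }

payload = build_request("aGVsbG8=", "Find the ring and circle it.")
```

A full agent loop would POST this payload to the local endpoint, read any `tool_calls` in the reply, execute the requested command, and feed the output back as a `tool` message until the model answers in plain text.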

Key Points
  • Qwen-35B-A3B analyzed a low-quality image to find and mark a ring's location using terminal commands.
  • Achieves ~100 tokens/second inference speed on consumer hardware (NVIDIA RTX 3090).
  • Combines vision understanding and precise tool-calling in one open-source model for agentic workflows.

Why It Matters

Enables developers to build fast, autonomous AI agents that see and act using affordable local hardware.