Agent Frameworks

VLM-CAD: VLM-Optimized Collaborative Agent Design Workflow for Analog Circuit Sizing

New workflow combines VLMs with symbolic reasoning to overcome AI's spatial blindness in engineering.

Deep Dive

A research team led by Guanyuan Pan has introduced VLM-CAD (Vision Language Model-Optimized Collaborative Agent Design Workflow), an AI system that tackles the notoriously difficult task of analog circuit sizing. The work addresses a critical weakness of current Vision Language Models (VLMs) such as GPT-4V: their "spatial blindness" and tendency toward logical hallucinations when interpreting dense, structured engineering schematics. VLM-CAD bridges this gap with a multi-agent workflow in which specialized modules handle distinct reasoning steps, each anchored in deterministic facts rather than pure statistical inference.

At its core, VLM-CAD uses a neuro-symbolic parsing module called Image2Net, which transforms raw circuit-diagram pixels into explicit topological graphs and structured JSON representations, giving the VLM a factual, unambiguous foundation to reason over. To ensure reliability for high-stakes engineering decisions, the system employs ExTuRBO (Explainable Trust Region Bayesian Optimization), an optimizer that uses agent-generated "semantic seeds" to warm-start its searches and, through Automatic Relevance Determination, attaches quantified evidence to every AI-generated design choice.
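The paper does not publish Image2Net's exact schema, but the idea of grounding a VLM in an explicit topology can be sketched as follows. The current-mirror example, the field names, and the `to_topology` helper are illustrative assumptions, not the authors' implementation:

```python
import json

# Assumed example: devices extracted from a schematic of a simple
# NMOS current mirror (hypothetical values, illustrative only).
devices = [
    {"id": "M1", "type": "nmos", "pins": {"g": "n_bias", "d": "n_bias", "s": "gnd"}},
    {"id": "M2", "type": "nmos", "pins": {"g": "n_bias", "d": "n_out", "s": "gnd"}},
]

def to_topology(devices):
    """Build a net -> [(device, pin), ...] adjacency map from the device list."""
    nets = {}
    for dev in devices:
        for pin, net in dev["pins"].items():
            nets.setdefault(net, []).append((dev["id"], pin))
    return nets

topology = to_topology(devices)

# A structured JSON document like this, rather than raw pixels, is what
# the downstream VLM agents would reason over.
print(json.dumps({"devices": devices, "nets": topology}, indent=2))
```

Because connectivity is stated explicitly (e.g. which pins share net `n_bias`), the VLM no longer has to infer wiring from pixel geometry, which is where spatial hallucinations arise.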

The experimental results, submitted to ACM Multimedia 2026, demonstrate significant gains. On two complex circuit benchmarks, VLM-CAD substantially improved spatial reasoning accuracy while preserving physics-based explainability, a prerequisite for engineering trust. Crucially, the AI-driven workflow consistently met complex performance specifications while optimizing for low power consumption, completing the entire design process in under 66 minutes. This marks a major step toward deploying robust, explainable multimodal AI in specialized technical domains where precision is non-negotiable.

Key Points
  • Integrates Image2Net, a neuro-symbolic parser that converts circuit schematics into topological graphs and JSON to ground VLM reasoning.
  • Uses ExTuRBO, an explainable Bayesian optimizer that provides quantified evidence for AI decisions via Automatic Relevance Determination.
  • Achieved high accuracy on complex benchmarks, satisfying specs with low power in under 66 minutes total runtime.
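To make the warm-start idea above concrete, here is a minimal sketch of a trust-region search seeded with agent-proposed candidates. This is not ExTuRBO itself: the toy objective, the seed values, and the shrink schedule are assumptions for illustration, and the Bayesian surrogate and ARD evidence are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Toy stand-in for a circuit cost (e.g. power plus spec penalties).
    return (x[0] - 0.3) ** 2 + 10 * (x[1] - 0.7) ** 2

# "Semantic seeds": sizings an agent might propose from the parsed
# topology (hypothetical values), used to warm-start the search.
seeds = np.array([[0.25, 0.65], [0.4, 0.8]])

def trust_region_search(objective, seeds, iters=50, radius=0.2):
    """Sample around the incumbent; shrink the trust region on failure."""
    best = min(seeds, key=objective)
    best_val = objective(best)
    for _ in range(iters):
        cand = np.clip(best + rng.uniform(-radius, radius, size=best.shape), 0, 1)
        val = objective(cand)
        if val < best_val:
            best, best_val = cand, val  # success: move the incumbent
        else:
            radius *= 0.95              # failure: shrink the trust region
    return best, best_val

best, best_val = trust_region_search(objective, seeds)
```

Starting from informed seeds rather than random points is what lets this style of search converge quickly, which is consistent with the short end-to-end runtimes the paper reports.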

Why It Matters

Automates a complex, weeks-long engineering task in about an hour with explainable AI, potentially accelerating chip and hardware design.