VeGAS boosts embodied AI agents by up to 36% with verifier-guided action selection

arXiv cs.AI May 14, 2026

⚡ Think twice: a new method checks candidate actions before acting, improving performance by up to 36% (relative).

Deep Dive

A team of researchers (Nishad Singhi et al.) from multiple institutions introduced VeGAS (Verifier-Guided Action Selection) at CVPR 2026 to address the fragility of multimodal large language model (MLLM)-based embodied agents in out-of-distribution scenarios. Traditional chain-of-thought (CoT) reasoning often fails when the agent encounters unexpected obstacles. VeGAS operates at inference time: it samples multiple candidate actions from the model, then uses a dedicated generative verifier to select the most reliable one, all without modifying the underlying policy. This 'think twice, act once' approach is intended to make the agent more robust in exactly those out-of-distribution cases.
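The selection step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the names `select_action`, `sample_action`, `score_action`, and `num_candidates` are hypothetical, and the toy policy and verifier stand in for the MLLM policy and the trained generative verifier.

```python
import random
from typing import Callable, List


def select_action(
    observation: str,
    sample_action: Callable[[str], str],
    score_action: Callable[[str, str], float],
    num_candidates: int = 5,
) -> str:
    """Sample candidates from a frozen policy and return the verifier's top pick.

    Assumption: the verifier exposes a scalar score per (observation, action)
    pair. The policy itself is only called, never updated.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    # Draw several candidate actions from the unchanged base policy.
    candidates: List[str] = [sample_action(observation) for _ in range(num_candidates)]
    # Drop repeated candidates while preserving the order they were sampled in.
    unique_candidates = list(dict.fromkeys(candidates))
    # Keep the candidate the verifier scores highest (first one wins on ties).
    return max(unique_candidates, key=lambda action: score_action(observation, action))


if __name__ == "__main__":
    # Toy stand-ins (hypothetical): a random policy and a lookup-table verifier.
    rng = random.Random(0)
    toy_actions = ["open drawer", "pick up mug", "walk to sink"]
    toy_scores = {"open drawer": 0.2, "pick up mug": 0.9, "walk to sink": 0.5}

    def toy_policy(observation: str) -> str:
        return rng.choice(toy_actions)

    def toy_verifier(observation: str, action: str) -> float:
        return toy_scores[action]

    print(select_action("mug is on the counter", toy_policy, toy_verifier))
```

Because the policy is only ever called through `sample_action`, swapping in a different verifier or changing the number of candidates requires no change to the policy, which mirrors the inference-time framing in the summary.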

To train the verifier effectively, the authors developed an LLM-driven data synthesis strategy that automatically builds a diverse curriculum of failure cases, exposing the verifier to a rich distribution of potential errors. Without this synthetic training, using an off-the-shelf MLLM as a verifier yielded no improvement. Across challenging embodied reasoning benchmarks in Habitat and ALFRED environments, VeGAS achieved up to a 36% relative performance gain over strong CoT baselines, especially on long-horizon, multi-object tasks. The paper was accepted as a CVPR 2026 Finding.
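To make the "relative" qualifier concrete, here is a small sketch of how a relative gain is computed. The success rates used are invented purely for illustration and are not figures from the paper; `relative_gain` is a hypothetical helper name.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return the improvement as a fraction of the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical numbers, NOT from the paper: a baseline success rate of 0.25
    # rising to 0.34 is a 9-point absolute gain but roughly a 36% relative gain.
    gain = relative_gain(baseline=0.25, improved=0.34)
    print(f"relative gain: {gain:.0%}")
```

The distinction matters when reading the headline figure: a 36% relative gain over a low baseline corresponds to a much smaller absolute change in success rate.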

Key Points

VeGAS samples an ensemble of candidate actions at inference time and uses a generative verifier to select the best, without modifying the base policy.
An LLM-driven data synthesis strategy automatically creates a curriculum of failure cases to train the verifier—critical for effectiveness.
Achieves up to 36% relative improvement on long-horizon, multi-object tasks in Habitat and ALFRED benchmarks over strong CoT baselines.

Why It Matters

Makes embodied AI agents more reliable in complex, real-world scenarios by verifying actions before execution.
