Developer Tools

RIVA's multi-agent system boosts IaC drift detection accuracy by 83%

arXiv cs.SE March 04, 2026

⚡ New AI system recovers task accuracy from 27.3% to 50% when tools return erroneous outputs, helping prevent cloud misconfigurations.

Deep Dive

A research team from EPFL and TU Delft has introduced RIVA (Robust Infrastructure by Verification Agents), a multi-agent AI system designed to address a critical problem in cloud infrastructure management: reliable detection of configuration drift. Infrastructure as Code (IaC) tools like Terraform and Ansible automate cloud provisioning, but verifying that deployed systems still match their specifications remains difficult, because bugs, manual changes, and updates can all cause the live state to diverge. Existing LLM-based agentic systems are vulnerable because they implicitly trust tool outputs, so a failing tool can lead to missed drift or false alarms. RIVA targets this reliability gap in autonomous infrastructure verification.

RIVA's architecture employs two specialized agents, a verifier agent and a tool generation agent, that collaborate through iterative cross-validation, multi-perspective verification, and comprehensive tracking of tool call history. This design lets the system distinguish genuine infrastructure problems from broken tool responses. On the AIOpsLab benchmark, in the presence of erroneous tool responses, RIVA raised average task accuracy from 27.3% with a baseline ReAct agent to 50.0%, an 83% relative improvement. Even without tool errors, RIVA raised accuracy from 28% to 43.8%. Validating diverse tool calls through agent collaboration is a step toward reliable autonomous operations in production cloud environments, where configuration drift can lead to security vulnerabilities, compliance violations, and service outages.
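To make the idea of cross-validating tool outputs concrete, here is a minimal Python sketch. It is not RIVA's implementation: the names `ToolCall` and `cross_validate`, the simple majority rule, and the example tools are all hypothetical, chosen only to illustrate how several independent checks of one property plus a call history can expose a broken tool instead of trusting it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    """One entry in the tool call history (hypothetical structure)."""
    tool: str
    result: str


def cross_validate(checks: Dict[str, Callable[[], str]],
                   history: List[ToolCall]) -> str:
    """Run independent checks of the same property and compare the results.

    Every call is appended to `history`, so later steps can see which tools
    disagreed with the others.
    """
    if not checks:
        raise ValueError("at least one check is required")

    results = []
    for name, check in checks.items():
        outcome = check()
        history.append(ToolCall(name, outcome))
        results.append(outcome)

    value, votes = Counter(results).most_common(1)[0]
    if votes == len(results):
        return f"consistent:{value}"    # all perspectives agree
    if votes > len(results) / 2:
        return f"majority:{value}"      # some tool is likely faulty
    return "inconclusive"               # no agreement; needs more checks


# Illustrative data: three hypothetical ways to read a replica count,
# one of which returns a wrong value.
history: List[ToolCall] = []
verdict = cross_validate(
    {
        "cli_query": lambda: "3",
        "api_query": lambda: "3",
        "faulty_tool": lambda: "0",
    },
    history,
)
suspects = [call.tool for call in history if call.result != "3"]
print(verdict, suspects)
```

In this toy run the two agreeing checks outvote the faulty one, and the history identifies which tool disagreed. A single-tool agent would have had to take "0" at face value and report drift that does not exist.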

Key Points

RIVA recovers task accuracy from 27.3% to 50% when tools produce erroneous outputs, an 83% relative improvement
The system uses two specialized agents (verifier and tool generation) that collaborate through cross-validation and history tracking
Even without tool errors, RIVA improves accuracy from 28% to 43.8% on the AIOpsLab benchmark
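The 83% figure is a relative gain, not a gain in percentage points. The short Python check below reproduces it from the reported accuracies. The helper name `relative_improvement` is ours, and the relative figure for the error-free case is derived here from the reported numbers rather than quoted from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Accuracies reported in the article (percent).
with_errors = relative_improvement(27.3, 50.0)
without_errors = relative_improvement(28.0, 43.8)

print(f"With tool errors:    +{50.0 - 27.3:.1f} points, {with_errors:.1f}% relative")
print(f"Without tool errors: +{43.8 - 28.0:.1f} points, {without_errors:.1f}% relative")
# With tool errors:    +22.7 points, 83.2% relative
# Without tool errors: +15.8 points, 56.4% relative
```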

Why It Matters

Prevents costly cloud misconfigurations and security vulnerabilities by making AI-powered infrastructure verification reliable when tools fail.
