AI Safety

The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

New research finds most AI agent developers share little safety or impact data, creating policy blind spots.

Deep Dive

A research team from MIT, Harvard, and other institutions has released the 2025 AI Agent Index, the first comprehensive analysis of 30 deployed AI agent systems. The index documents the technical specifications, capabilities, and safety features of leading agents, including OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and specialized agents from companies such as Adept and Cognition. The researchers collected data through public documentation and direct correspondence with developers, revealing significant transparency gaps in the rapidly growing agent ecosystem.

The study found that while agents are increasingly capable of performing complex professional tasks with minimal human oversight, most developers share alarmingly little information about safety protocols, evaluation methodologies, and potential societal impacts. Only 40% of the documented agents provided basic safety testing information, and fewer than 25% disclosed details about their training data sources or deployment constraints. The research team developed a standardized framework for comparing agents across five categories: origins, design, capabilities, ecosystem integration, and safety features.
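The five-category comparison framework can be pictured as a simple documentation schema. The sketch below is purely illustrative: the class, field names, and `disclosure_rate` metric are hypothetical stand-ins, not the index's actual data format.

```python
from dataclasses import dataclass, field

@dataclass
class AgentIndexEntry:
    """Hypothetical record mirroring the index's five comparison categories."""
    name: str
    origins: dict = field(default_factory=dict)       # e.g. developer, release date
    design: dict = field(default_factory=dict)        # e.g. base model, architecture
    capabilities: dict = field(default_factory=dict)  # e.g. benchmarked tasks
    ecosystem: dict = field(default_factory=dict)     # e.g. integrations, deployment
    safety: dict = field(default_factory=dict)        # e.g. safety testing disclosures

    def disclosure_rate(self) -> float:
        """Fraction of the five categories with any public documentation."""
        cats = [self.origins, self.design, self.capabilities,
                self.ecosystem, self.safety]
        return sum(bool(c) for c in cats) / len(cats)

# An agent that has only disclosed one of the five categories:
entry = AgentIndexEntry(name="ExampleAgent",
                        safety={"red_teaming": "undisclosed"})
print(entry.disclosure_rate())  # → 0.2
```

A per-category record like this makes the study's headline gaps easy to quantify: an entry with no populated `safety` field is exactly the kind of omission the researchers flagged in a majority of documented agents.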

This transparency deficit creates significant challenges for policymakers and researchers trying to understand the implications of increasingly autonomous AI systems. With the AI agent market projected to reach $2 billion by 2025 and systems becoming capable of handling tasks ranging from software development to financial analysis, the lack of standardized reporting makes risk assessment difficult. The index serves as both a benchmarking tool and a call for greater transparency, highlighting the need for industry-wide standards as agentic AI becomes more integrated into professional workflows.

Key Points
  • Analyzes 30 state-of-the-art AI agents including GPT-4o and Claude 3.5 with standardized technical documentation
  • Reveals only 40% of developers disclose safety testing data, creating major transparency gaps for policymakers
  • Provides framework for tracking $2B agent ecosystem where systems autonomously perform professional tasks

Why It Matters

Creates an essential transparency framework for policymakers and businesses evaluating increasingly autonomous AI systems in professional environments.