AI Safety

Your AI travel agent would book you a bullfight: benchmarking implicit animal compassion in agentic AI

New study shows frontier AI models consistently choose bullfights and dolphin shows over ethical alternatives.

Deep Dive

Researchers from Compassion Aligned Machine Learning (CaML) have developed the Travel Agent Compassion (TAC) benchmark, a novel test that reveals how AI agents make implicit decisions about animal welfare. Unlike traditional question-answer benchmarks, TAC places models in realistic travel-agent scenarios where users request experiences like "I love elephants!" or "swim with dolphins!" without mentioning animal welfare. The benchmark includes 12 hand-crafted scenarios spanning six exploitation categories (among them captive marine shows, animal riding, racing, fighting, and wildlife exploitation), with each scenario expanded into 4 variants to control for price, rating, and ordering biases, yielding 48 test cases in total.
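The variant-expansion scheme described above can be sketched in a few lines. This is a minimal illustration, not the TAC codebase: the scenario fields and the choice of which two attributes to cross are assumptions for the sake of the example.

```python
from itertools import product

def expand_variants(scenario):
    """Expand one hand-crafted scenario into 4 variants that flip which
    option is cheaper and which is listed first, so a model's pick can't
    be explained away by price or ordering alone (illustrative controls;
    the real benchmark also varies rating)."""
    variants = []
    for ethical_cheaper, ethical_first in product([True, False], repeat=2):
        variants.append({
            "request": scenario["request"],    # e.g. "I love elephants!"
            "harmful": scenario["harmful"],    # e.g. an elephant ride
            "ethical": scenario["ethical"],    # e.g. a sanctuary visit
            "ethical_is_cheaper": ethical_cheaper,
            "ethical_listed_first": ethical_first,
        })
    return variants

# 12 hand-crafted scenarios (placeholder contents)
scenarios = [{"request": f"scenario {i}", "harmful": "h", "ethical": "e"}
             for i in range(12)]
all_cases = [v for s in scenarios for v in expand_variants(s)]
print(len(all_cases))  # 12 scenarios x 4 variants = 48 test cases
```

Crossing two binary controls per scenario is one simple way to reach the paper's 4 variants; whichever option a model favors, at least one variant removes each superficial advantage.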

In testing, every frontier AI model examined consistently booked harmful animal experiences more often than ethical alternatives. For example, when a user requested "the most exciting traditional experience" in Seville, most models selected bullfight tickets over flamenco shows or garden tours, despite the widely recognized cruelty of bullfighting. In harder scenarios like family trips to Orlando, models overwhelmingly chose SeaWorld's orca shows over rescue-focused aquariums, demonstrating that agents prioritize keyword matching and user enthusiasm over implicit welfare considerations. The researchers found this occurred even when ethical alternatives were equally priced, highly rated, or presented first in search results.

The TAC benchmark specifically avoids eval-awareness by including spelling and grammar errors in prompts, making it more reflective of real-world agent behavior. Researchers tested models across 144 scored samples (48 test cases × 3 epochs at temperature 0.7), finding consistent patterns of welfare-blind decision-making. This work highlights a critical gap in current AI alignment: while models can discuss ethics in abstract conversations, they fail to apply compassionate reasoning when acting as agents making real-world bookings on users' behalf.
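The scoring arithmetic above can be made concrete with a short sketch. The agent call here is a random stand-in (an assumption for illustration); a real run would query the model at temperature 0.7, which is why each case is sampled over multiple epochs.

```python
import random

def run_benchmark(cases, epochs=3, seed=0):
    """Score each test case over several epochs and report the fraction
    of samples in which the agent booked the harmful option."""
    rng = random.Random(seed)
    samples = []
    for _ in range(epochs):
        for case in cases:
            # Placeholder for an actual model call: sampling at a nonzero
            # temperature is stochastic, hence repeated epochs per case.
            booked_harmful = rng.random() < 0.6  # illustrative only
            samples.append(booked_harmful)
    harm_rate = sum(samples) / len(samples)
    return len(samples), harm_rate

n_samples, rate = run_benchmark([{"id": i} for i in range(48)])
print(n_samples)  # 48 test cases x 3 epochs = 144 scored samples
```

Aggregating over epochs rather than a single pass distinguishes a model that reliably books the harmful option from one that merely does so under an unlucky sample.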

Key Points
  • TAC benchmark tests 12 scenarios across 6 animal exploitation categories with 48 total variants controlling for price/rating/order bias
  • Frontier AI models booked harmful options (bullfights, captive shows) 60% of the time despite ethical alternatives being available
  • Models prioritize keyword matching over implicit welfare considerations even when users express enthusiasm without mentioning harm

Why It Matters

As AI agents handle real-world bookings, their implicit ethical blind spots could inadvertently promote animal exploitation at scale.