Agent Frameworks

TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size

A single policy coordinates teams of two to eight simulated humanoid agents that cooperatively lift and carry objects of varied shapes.

Deep Dive

Researchers Stefan Lionar and Gim Hee Lee have introduced TeamHOI, a framework that addresses a persistent challenge in physics-based simulation: scalable cooperative behavior among multiple humanoid agents. While single-agent control has progressed rapidly, coordinating teams of humanoids for tasks such as carrying large objects has remained difficult. TeamHOI's central idea is a single decentralized policy, built on a Transformer-based network, that handles teams of two to eight cooperating agents. Each agent acts on its own local observations and coordinates by attending to 'teammate tokens', so the same network can run with a different team size without retraining.
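To make the scaling argument concrete, here is a minimal sketch of why attention over teammate tokens is indifferent to team size. It is not the authors' network: the function name attend_to_teammates, the 4-dimensional tokens, and the absence of learned query/key/value projections are all simplifying assumptions. It shows only that a single attention step turns any number of teammate tokens into a context vector of fixed size.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_to_teammates(self_token, teammate_tokens):
    """Single-head scaled dot-product attention (hypothetical helper).

    The agent's own token is the query; teammate tokens serve as both
    keys and values. A real Transformer would apply learned projections
    first; they are omitted here to keep the sketch short.
    """
    dim = len(self_token)
    if not teammate_tokens:
        # No teammates: return a zero context vector of the same size.
        return [0.0] * dim
    scale = math.sqrt(dim)
    weights = softmax([dot(self_token, t) / scale for t in teammate_tokens])
    # Weighted sum of teammate tokens; output size depends only on dim.
    return [sum(w * t[i] for w, t in zip(weights, teammate_tokens))
            for i in range(dim)]


if __name__ == "__main__":
    me = [1.0, 0.0, 0.5, -0.5]
    for team_size in (2, 5, 8):
        # Made-up teammate tokens; team_size - 1 of them per agent.
        mates = [[0.1 * k, 0.2, -0.1 * k, 0.3] for k in range(1, team_size)]
        context = attend_to_teammates(me, mates)
        print(team_size, len(context))
```

Because the context vector has the same length for one teammate or seven, the layers that follow it need no change when the team grows or shrinks.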

To overcome the scarcity of motion data for multi-person object interactions, the team developed a training strategy called masked Adversarial Motion Prior (AMP). It uses readily available single-human motion data as the reference for realism but masks the body parts that interact with the object. Because the motion prior does not constrain the masked parts, their movement is shaped purely by task rewards, which lets the policy discover diverse, physically plausible cooperative grips and carries. The framework was evaluated on a cooperative carrying task with objects of varied geometries, and training was further stabilized by a formation reward that is independent of team size and object shape. The result is a single policy that shows coherent, successful teamwork across these configurations, a step toward more complex multi-agent simulations for robotics, gaming, and animation.
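The masking idea can be sketched in a few lines. The following is an illustration under stated assumptions, not the published implementation: the body-part names, the choice of arms and hands as the masked set, the zero-fill masking, and the equal reward weights are all hypothetical. It shows the general pattern of hiding object-contacting parts from the style discriminator so that only the task reward influences them.

```python
# Hypothetical set of parts assumed to touch the carried object.
INTERACTING_PARTS = frozenset(
    {"left_arm", "right_arm", "left_hand", "right_hand"}
)


def mask_for_discriminator(features_by_part, masked_parts=INTERACTING_PARTS):
    """Zero the features of masked parts before the style discriminator.

    The discriminator then judges realism only on unmasked parts (for
    example legs and torso), so single-human reference clips remain
    usable even though they contain no object-carrying poses.
    """
    return {
        part: [0.0] * len(values) if part in masked_parts else list(values)
        for part, values in features_by_part.items()
    }


def combined_reward(style_reward, task_reward, w_style=0.5, w_task=0.5):
    """Weighted sum of style and task rewards (weights are assumptions)."""
    return w_style * style_reward + w_task * task_reward


if __name__ == "__main__":
    # Made-up per-part features for one simulated frame.
    frame = {
        "torso": [0.3, 0.1],
        "left_leg": [0.7, -0.2],
        "left_hand": [0.9, 0.8],
    }
    print(mask_for_discriminator(frame))
    print(combined_reward(style_reward=0.8, task_reward=0.4))
```

In this sketch the hands still receive gradient signal through the task term of combined_reward, which is the sense in which masked regions are "shaped purely by task rewards".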

Key Points
  • Uses a single Transformer-based policy with 'teammate tokens' to coordinate 2 to 8 agents in object-carrying tasks.
  • Employs a masked Adversarial Motion Prior (AMP) to learn realistic cooperative motions from single-human reference data.
  • Achieves high success rates, aided by a formation reward that is agnostic to team size and object shape, enabling stable carrying of varied object geometries.
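The summary above does not say how the formation reward is computed, so the following is only one plausible construction, offered as a sketch. The function formation_reward and its formula are assumptions: it scores how evenly agents are spread around the object by averaging the unit vectors that point from the object to each agent. Averaging, rather than summing, is what keeps the score comparable across team sizes, and using only directions keeps it independent of the object's shape.

```python
import math


def formation_reward(object_xy, agent_xys):
    """Hypothetical formation score in [0, 1].

    Near 1.0 when agents surround the object evenly, near 0.0 when they
    all stand on the same side. Uses the mean resultant length of the
    object-to-agent directions in the ground plane.
    """
    count = len(agent_xys)
    if count == 0:
        return 0.0
    obj_x, obj_y = object_xy
    sum_x = 0.0
    sum_y = 0.0
    for agent_x, agent_y in agent_xys:
        dx, dy = agent_x - obj_x, agent_y - obj_y
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            # An agent exactly at the object centre has no direction.
            continue
        sum_x += dx / dist
        sum_y += dy / dist
    resultant = math.hypot(sum_x, sum_y) / count
    return 1.0 - resultant


if __name__ == "__main__":
    centre = (0.0, 0.0)
    print(formation_reward(centre, [(1.0, 0.0), (-1.0, 0.0)]))
    print(formation_reward(centre, [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]))
```

Two agents on opposite sides and four agents at the corners of a square both score close to 1.0 under this formula, while a team bunched on one side scores close to 0.0.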

Why It Matters

A single policy that generalizes across team sizes could make multi-agent simulations for robotics, video game NPCs, and animated films more realistic and scalable, reducing reliance on scripted behaviors.