Image & Video

MTT-Bench: Predicting Social Dominance in Mice via Multimodal Large Language Models

AI decodes mouse social hierarchies from raw video, with no labels needed at inference.

Deep Dive

Researchers Yunquan Chen and Haoyu Chen have introduced MTT-Bench, a benchmark that uses multimodal large language models (MLLMs) to predict social dominance hierarchies in mice from raw behavioral video. The paper, submitted to arXiv on April 24, 2026, focuses on the Mouse Tube Test, a standard assay in which two mice enter a tube from opposite ends and the one that forces the other out is deemed dominant. By fine-tuning existing MLLM architectures on annotated videos of these pairwise interactions, the models can perform zero-shot inference on unseen behavioral sequences, predicting dominance without explicit labels at test time. The framework shows high agreement with traditional tube test rankings, demonstrating that foundation models can capture nuanced social behaviors from visual data alone.
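To make the inference setup concrete, here is a minimal sketch of how a zero-shot dominance query to an MLLM might be assembled: uniformly sample frames from a tube-test clip and pair them with a text prompt asking which mouse retreats. All names here (`sample_frames`, `build_dominance_prompt`) are illustrative assumptions, not the paper's actual pipeline, and the actual model call is omitted.

```python
# Hypothetical sketch of a zero-shot MLLM query for the Mouse Tube Test.
# Function names and prompt wording are assumptions, not from the paper.

def sample_frames(total_frames: int, k: int = 8) -> list[int]:
    """Pick k evenly spaced frame indices to summarize a clip."""
    if k >= total_frames:
        return list(range(total_frames))
    step = (total_frames - 1) / (k - 1)
    return [round(i * step) for i in range(k)]

def build_dominance_prompt(frame_indices: list[int]) -> str:
    """Compose the text half of a multimodal query; the sampled
    frames would be attached as images alongside this prompt."""
    return (
        f"You are shown {len(frame_indices)} frames from a Mouse Tube Test. "
        "Two mice enter a narrow tube from opposite ends. "
        "Answer 'left' or 'right': which mouse forces the other out "
        "and is therefore dominant?"
    )

indices = sample_frames(300, k=8)   # e.g. a 300-frame clip
prompt = build_dominance_prompt(indices)
```

In practice the sampled frames plus this prompt would be sent to a video-capable MLLM (the article mentions models like GPT-4V), and the model's "left"/"right" answer recorded as the predicted winner of that pairwise bout.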

This work marks a significant step in applying general-purpose AI to ethology and social behavior analysis, potentially reducing the need for labor-intensive manual coding or domain-specific models. The MTT-Bench benchmark provides a standardized dataset for further research, enabling comparisons across different MLLM approaches. While the study focuses on mice, the methodology could extend to other species or social contexts, offering a scalable tool for behavioral neuroscience. The authors note that their approach leverages the multimodal capabilities of models like GPT-4V and similar architectures, which can process video frames and temporal dynamics simultaneously. This opens new directions for using large language models in fields traditionally reliant on specialized computer vision systems.

Key Points
  • MTT-Bench uses MLLMs to predict mouse social dominance from raw video, with no manual labels at inference time.
  • Fine-tuned models achieve zero-shot inference on unseen Mouse Tube Test interactions.
  • High agreement with traditional rankings suggests foundation models can replace domain-specific designs.
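The "high agreement with traditional rankings" in the points above is the kind of claim typically quantified with a rank-correlation statistic. As an illustrative sketch (not the paper's code), the snippet below compares an MLLM-predicted dominance ranking against the tube-test ranking using Kendall's tau over all mouse pairs; the mouse IDs and rankings are made up for the example.

```python
# Illustrative: agreement between a predicted dominance ranking and the
# traditional tube-test ranking via Kendall's tau (no ties assumed).
from itertools import combinations

def kendall_tau(rank_a: list[str], rank_b: list[str]) -> float:
    """Kendall's tau between two rankings of the same individuals."""
    pos_a = {m: i for i, m in enumerate(rank_a)}
    pos_b = {m: i for i, m in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # Concordant if the pair is ordered the same way in both rankings.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

tube_test  = ["m1", "m2", "m3", "m4"]   # hypothetical ground-truth hierarchy
model_pred = ["m1", "m3", "m2", "m4"]   # hypothetical MLLM-derived hierarchy
print(kendall_tau(tube_test, model_pred))  # one swapped pair out of six
```

A tau of 1.0 means the model reproduces the tube-test hierarchy exactly; values near 0 mean no agreement beyond chance.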

Why It Matters

AI interprets animal behavior automatically, accelerating neuroscience and ethology research without custom models.