BEAST3D uses Gaussian splatting for animal 3D pose and neural encoding
Self-supervised model works with just four camera views and needs no manual labeling.
BEAST3D is a self-supervised pretraining framework for analyzing animal behavior from multi-view video. It uses a vision transformer to predict 3D Gaussian splats, then reconstructs held-out views through differentiable rendering while segmenting the animal from the background. General-purpose 3D models need dense, overlapping views to estimate camera geometry; BEAST3D instead conditions directly on known camera parameters, which enables robust 3D reconstruction with as few as four camera views. This makes it practical for typical lab setups where camera coverage is sparse. The model was validated across four species, including mice and flies, and produces viewpoint-invariant features that transfer to three tasks: novel view synthesis (a check on 3D quality), multi-view pose estimation (sparse keypoint trajectories for behavioral analysis), and neural encoding (relating 3D behavioral features to simultaneously recorded neural activity). The paper is on arXiv (2606.02937) and presents a versatile, annotation-free approach to linking behavior and brain dynamics.
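To make "conditioning on known camera parameters" concrete, here is a minimal sketch of the standard pinhole projection that maps 3D points, such as Gaussian splat centers, into the image plane of a calibrated camera. It is not taken from the BEAST3D code. The function name project_points, the intrinsics K, the extrinsics R and t, and the example points are all hypothetical values chosen for illustration, and a full splatting renderer would also handle covariance, opacity, and color.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project N x 3 world points to pixel coordinates with a pinhole camera.

    Assumed convention: x_cam = R @ x_world + t, then pixel = (K @ x_cam) / z.
    """
    points_cam = points_world @ R.T + t          # world frame -> camera frame
    depth = points_cam[:, 2]                     # z coordinate in camera frame
    pixels_h = points_cam @ K.T                  # homogeneous pixel coordinates
    pixels = pixels_h[:, :2] / depth[:, None]    # perspective divide
    return pixels, depth

# Hypothetical calibration for one held-out camera (illustrative values only).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                      # camera axes aligned with the world axes
t = np.array([0.0, 0.0, 2.0])      # world origin sits 2 units in front of the camera

# Two hypothetical Gaussian centers in world coordinates.
centers = np.array([[0.0, 0.0, 0.0],
                    [0.1, -0.05, 0.5]])

pixels, depth = project_points(centers, K, R, t)
print(pixels)  # [[320. 240.] [340. 230.]]
print(depth)   # [2.  2.5]
```

Because the calibration is supplied rather than estimated, the same 3D points can be projected into any camera in the rig, including a view held out during training, without needing overlapping imagery to recover the geometry.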
- Self-supervised learning from unlabeled multi-view video, with no manual pose annotation required.
- Works with as few as four calibrated camera views using 3D Gaussian splatting and differentiable rendering.
- Validated on four species; transfers to pose estimation and neural encoding of behavior.
Why It Matters
BEAST3D links 3D animal movement to neural activity without manual labeling, which could speed up neuroscience research that depends on behavioral tracking.