Research & Papers

Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Individual-Level Livestock Monitoring and Longitudinal Visual Analytics

446M parameters shrunk to about 40M, and pig tracking now runs on a Jetson Orin NX.

Deep Dive

Yang and Hostens tackle the GPU memory bottleneck of foundation-model pipelines in precision livestock farming. Their method compresses SAM 3's Perception Encoder (446M parameters, ViT-L+) into a 40.66M-parameter multi-scale student via a Feature Pyramid Network built on TinyViT-21M-512, combined with a four-term direction-then-scale distillation loss and sliding-window session pruning. DINOv3 ViT-S/16 (21.6M parameters) serves as the per-individual embedder. On the Edinburgh Pig dataset, the compressed pipeline loses only 1.68 MOTA and 0.84 IDF1 points relative to the full teacher while achieving a 7.77x parameter reduction and a 3.01x VRAM reduction (19.52GB → 6.49GB). It also reaches 97.34% top-1 accuracy and 91.67% macro-F1 on nine-class pig behavior classification.
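The summary does not spell out the four loss terms. One plausible reading of "direction-then-scale" is that the student first learns to match the orientation of the teacher's feature vectors and only later their magnitude. The sketch below illustrates that idea with two terms on plain Python lists. It is an assumption-laden illustration, not the authors' loss: the function names, the linear ramp, and the `warmup_steps` and `scale_weight` parameters are all hypothetical.

```python
import math

def _norm(vec):
    """Euclidean norm of a feature vector."""
    return math.sqrt(sum(v * v for v in vec))

def direction_loss(student, teacher, eps=1e-8):
    """1 - cosine similarity: penalizes angular mismatch, ignores magnitude."""
    dot = sum(s * t for s, t in zip(student, teacher))
    return 1.0 - dot / (_norm(student) * _norm(teacher) + eps)

def scale_loss(student, teacher, eps=1e-8):
    """Squared log-ratio of norms: penalizes magnitude mismatch, ignores angle."""
    return math.log((_norm(student) + eps) / (_norm(teacher) + eps)) ** 2

def direction_then_scale(student, teacher, step, warmup_steps=1000, scale_weight=1.0):
    """Hypothetical schedule: the direction term is always active, while the
    scale term ramps in linearly over the first `warmup_steps` steps."""
    ramp = min(1.0, step / warmup_steps)
    return direction_loss(student, teacher) + scale_weight * ramp * scale_loss(student, teacher)

# A student feature that points the right way but is twice as long:
# no penalty at step 0, a magnitude penalty once the scale term is fully on.
early = direction_then_scale([2.0, 0.0], [1.0, 0.0], step=0)
late = direction_then_scale([2.0, 0.0], [1.0, 0.0], step=1000)
```

Separating the two terms lets a much smaller student settle the geometry of the feature space before being asked to reproduce the teacher's activation magnitudes. A real implementation would operate on batched tensors at each pyramid level.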

The entire system runs within the 16GB envelope of an NVIDIA Jetson Orin NX, leaving 4.9GB of headroom. The authors also propose an on-device embedding-pool re-identification mechanism that stores roughly 94MB per animal per year, enabling longitudinal visual records that can be correlated with disease, lameness, and growth outcomes. This component has not yet been empirically validated. The work shows that state-of-the-art vision models can be deployed on edge hardware for continuous, individual-level livestock monitoring, opening the door to scalable, real-time analytics in precision agriculture without cloud dependence.
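To make the embedding-pool idea concrete, here is a minimal sketch of how such a pool might match a new detection to a known animal, plus a back-of-envelope check on the storage figure. Everything in it is assumed for illustration: the `EmbeddingPool` class, the cosine threshold, the 384-dimensional embedding width (the usual ViT-S width), and float32 storage are not taken from the paper.

```python
import math

EMBED_DIM = 384        # assumption: ViT-S/16 embedding width
BYTES_PER_FLOAT = 4    # assumption: embeddings stored as float32

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)

class EmbeddingPool:
    """Hypothetical per-animal store of timestamped embeddings."""

    def __init__(self):
        self.pools = {}  # animal_id -> list of (timestamp, embedding)

    def add(self, animal_id, timestamp, embedding):
        self.pools.setdefault(animal_id, []).append((timestamp, list(embedding)))

    def identify(self, embedding, threshold=0.7):
        """Return the animal whose pool holds the most similar embedding,
        or None if no similarity exceeds the threshold."""
        best_id, best_score = None, threshold
        for animal_id, pool in self.pools.items():
            score = max(cosine(embedding, stored) for _, stored in pool)
            if score > best_score:
                best_id, best_score = animal_id, score
        return best_id

def embeddings_per_budget(budget_bytes, dim=EMBED_DIM, bytes_per_float=BYTES_PER_FLOAT):
    """How many raw embeddings fit in a storage budget."""
    return budget_bytes // (dim * bytes_per_float)

pool = EmbeddingPool()
pool.add("pig_a", 0, [1.0, 0.0, 0.0])
pool.add("pig_b", 0, [0.0, 1.0, 0.0])
match = pool.identify([0.9, 0.1, 0.0])
unknown = pool.identify([0.0, 0.0, 1.0])
per_year = embeddings_per_budget(94_000_000)  # 94MB read as 94 * 10^6 bytes
```

Under these assumptions, the 94MB annual budget corresponds to a few hundred stored embeddings per animal per day. The paper's actual sampling rate and storage format may differ.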

Key Points
  • SAM 3's 446M-parameter backbone is distilled into a 40.66M-parameter TinyViT student using a Feature Pyramid Network and a four-term distillation loss.
  • The compressed pipeline achieves 92.29% MOTA and 96.15% IDF1 on the Edinburgh Pig dataset with a 7.77x parameter reduction and a 3.01x VRAM reduction.
  • Fits entirely on NVIDIA Jetson Orin NX 16GB with 4.9GB headroom; proposed re-ID mechanism stores ~94MB/animal/year.

Why It Matters

The pipeline makes real-time, on-device monitoring of individual animals feasible without cloud reliance, which could improve welfare and farm efficiency.