Research & Papers

ResNet50 and ResNet34 beat deeper backbones in RT-DETR for robotics vision

Intermediate-depth models offer near-perfect accuracy under lighting or background variation.

Deep Dive

A new study from Brazilian researchers systematically evaluates how ResNet backbone depth and dropout regularization affect RT-DETR, a real-time transformer-based detector, under environmental conditions relevant to competitive robotics. Testing four backbones (ResNet18, ResNet34, ResNet50, and ResNet101) on round-object detection, the researchers varied illumination and background contrast while measuring accuracy, prediction confidence, and inference latency. All models were trained identically and tested across multiple environmental conditions.
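The evaluation protocol described above can be sketched as a small harness that runs one detector over the images for one environmental condition and reports the three measured quantities. This is a minimal illustration, not the authors' code. The detector is a hypothetical stub standing in for an RT-DETR model with a given ResNet backbone. The label "ball", the condition names, and the metric definitions (top-detection accuracy, mean top-detection confidence, mean wall-clock latency) are all assumptions.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence

# Assumed interfaces: a detection is (label, confidence),
# and a detector maps one image to a list of detections.
Detection = tuple[str, float]
Detector = Callable[[object], list[Detection]]


@dataclass
class ConditionResult:
    accuracy: float         # assumed: fraction of images whose top detection has the expected label
    mean_confidence: float  # assumed: mean confidence of the top detection (0.0 if nothing detected)
    mean_latency_ms: float  # mean wall-clock inference time per image, in milliseconds


def evaluate(detector: Detector, images: Sequence[object], expected_label: str) -> ConditionResult:
    """Run the detector on every image of one environmental condition."""
    if not images:
        raise ValueError("need at least one image per condition")
    correct, confidences, latencies = 0, [], []
    for image in images:
        start = time.perf_counter()
        detections = detector(image)
        latencies.append((time.perf_counter() - start) * 1000.0)
        if not detections:
            confidences.append(0.0)
            continue
        # Keep only the highest-confidence detection for scoring.
        label, conf = max(detections, key=lambda d: d[1])
        confidences.append(conf)
        if label == expected_label:
            correct += 1
    return ConditionResult(correct / len(images), mean(confidences), mean(latencies))


def stub_detector(image: object) -> list[Detection]:
    # Hypothetical stand-in for an RT-DETR model; always returns fixed detections.
    return [("ball", 0.87), ("background", 0.05)]


# Hypothetical conditions; None stands in for real images.
conditions = {"low_light": [None] * 10, "high_contrast_bg": [None] * 10}
results = {name: evaluate(stub_detector, imgs, "ball") for name, imgs in conditions.items()}
for name, r in results.items():
    print(name, round(r.accuracy, 2), round(r.mean_confidence, 3), round(r.mean_latency_ms, 4))
```

Repeating this loop for each backbone yields a backbone × condition table like the one the study reports. With a real GPU model, timing usually requires synchronizing the device before reading the clock, because GPU kernels run asynchronously.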

The key finding is that environmental changes mainly affect prediction confidence, not latency or classification accuracy, which stayed near 1.00 in most cases. Under illumination variation, ResNet50 delivered the best trade-off: near-perfect accuracy, confidence up to 0.869, and latency of 0.058 to 0.059 ms. Under background variation, ResNet34 led with confidence up to 0.887 and comparable accuracy. The results suggest that intermediate-depth backbones (ResNet34 and ResNet50) are the best choice, and that the deeper ResNet101 offers no benefit under these conditions. The paper has been accepted at the DATA 2026 conference.

Key Points
  • ResNet50 best under illumination variation: accuracy near 1.00, confidence up to 0.869, latency ~0.058 ms
  • ResNet34 best under background variation: confidence up to 0.887, also near-perfect accuracy
  • Deeper ResNet101 offered no advantage; inference latency was unaffected by the environment

Why It Matters

Guides roboticists in selecting efficient backbones for real-time detection under real-world environmental shifts.