Research & Papers

WildDet3D: Scaling Promptable 3D Detection in the Wild

A new architecture accepts text, point, and box prompts, reaching 34.2 AP3D on Omni3D and gaining an average +20.7 AP from depth cues.

Deep Dive

A research team led by Weikai Huang and including contributors from the University of Washington, Apple, and the Allen Institute for AI has published WildDet3D, a breakthrough in monocular 3D object detection. The work tackles two major bottlenecks: existing methods are limited to a single prompt type and lack mechanisms to incorporate geometric cues such as depth, and current datasets are narrow in scope. WildDet3D introduces a unified, geometry-aware architecture that natively accepts diverse prompts—text, points, and bounding boxes—and can optionally integrate auxiliary depth signals at inference time for substantial performance gains.

To train this powerful model, the team created WildDet3D-Data, the largest open 3D detection dataset to date. It was constructed by generating candidate 3D boxes from existing 2D annotations and retaining only human-verified ones, resulting in over 1 million images across a staggering 13,500 categories in diverse real-world scenes. This scale enables unprecedented open-world generalization. The model establishes a new state-of-the-art, achieving 34.2/36.4 AP3D on Omni3D with text and box prompts and 40.3/48.9 ODS in zero-shot evaluation on Argoverse 2 and ScanNet. Notably, incorporating depth cues at inference yields an average gain of +20.7 AP across settings.
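The data pipeline described above—generate candidate 3D boxes from 2D annotations, then keep only those a human verifies—can be sketched as a simple filter. The record fields below are hypothetical; the paper does not specify its annotation schema.

```python
# Illustrative sketch of the verification filter: candidate 3D boxes lifted
# from 2D annotations are retained only if a human marked them correct.
# Field names ("human_verified", "box3d", etc.) are assumptions.
def filter_verified(candidates):
    """Keep only candidate 3D boxes that passed human verification."""
    return [c for c in candidates if c.get("human_verified")]

candidates = [
    {"image_id": 1, "category": "chair", "box3d": [0.1, 0.2, 1.5, 0.5, 0.9, 0.5, 0.0], "human_verified": True},
    {"image_id": 1, "category": "lamp",  "box3d": [1.0, 0.1, 2.0, 0.2, 0.6, 0.2, 0.0], "human_verified": False},
]
kept = filter_verified(candidates)  # only the verified chair survives
```

Scaling a human-in-the-loop filter like this across 1M+ images is what makes the 13.5K-category coverage trustworthy rather than noisy.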

Key Points
  • Unified architecture accepts text, point, and box prompts natively and can incorporate depth signals, gaining +20.7 AP on average.
  • Trained on WildDet3D-Data, a new 1M-image dataset spanning 13.5K object categories for robust open-world performance.
  • Sets new SOTA, achieving up to 48.9 ODS in zero-shot evaluation and 36.4 AP3D on the Omni3D benchmark.

Why It Matters

Enables AI systems to understand and interact with the physical 3D world from a single image, critical for robotics, AR/VR, and autonomous systems.