Research & Papers

Shape: A Self-Supervised 3D Geometry Foundation Model for Industrial CAD Analysis

98.1% retrieval accuracy on 2,983 CAD meshes with near-zero overfitting

Deep Dive

Shape is a self-supervised 3D geometry foundation model designed specifically for industrial CAD analysis, developed by Bayangmbe Mounmo, Sam Chien, and Mile Mitrovic. The model converts surface meshes into dense per-token embeddings using a structured 3D latent grid, a multi-scale geometry-aware tokenizer called MAGNO with cross-attention, and a transformer processor that leverages grouped-query attention and RMSNorm. A learned reconstruction prior enables per-region attribution for explainable predictions, which is crucial for industrial applications where interpretability matters. The entire backbone has just 10.9 million parameters, making it lightweight yet powerful.
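Two of the backbone components named above, RMSNorm and grouped-query attention, can be illustrated generically. This is a minimal numpy sketch of the standard techniques, not Shape's released code; shapes, head counts, and epsilon values are illustrative assumptions:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square of the features,
    # with no mean-centering (cheaper than LayerNorm).
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def grouped_query_attention(q, k, v):
    """Grouped-query attention: several query heads share one KV head,
    shrinking the key/value projections and cache.
    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)."""
    n_q_heads, seq, d = q.shape
    group = n_q_heads // k.shape[0]
    k = np.repeat(k, group, axis=0)  # broadcast shared KV heads to query heads
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v
```

With 8 query heads sharing 2 KV heads, the key/value tensors are a quarter the size of standard multi-head attention while the output shape is unchanged.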

Pretrained on 61,052 CAD meshes from Thingi10K, MFCAD, and Fusion360, Shape achieves remarkable results on a held-out split of 2,983 meshes: reconstruction R² of 0.729 and 98.1% top-1 retrieval under the Wang-Isola protocol, with near-zero reconstruction train/val gap. A 2×2 ablation study revealed that per-dimension normalization is critical—without it, performance collapses (R² < 0.14, top-1 < 88%), but with it, both loss types succeed (R² > 0.70, top-1 > 96%). Smooth-L1 loss provides secondary stability. The model's code, embeddings, and an interactive demo are publicly released, enabling widespread adoption for industrial CAD workflows that require robust, generalizable, and explainable 3D geometric representations.
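The two ablated factors, per-dimension normalization and Smooth-L1 loss, are both standard techniques and can be sketched generically. The sketch below is an assumption about their usual form (batch-wise standardization of each latent dimension; Huber-style loss), not Shape's exact implementation:

```python
import numpy as np

def per_dim_normalize(x, eps=1e-6):
    # Standardize each latent dimension independently across the batch,
    # so no single large-scale dimension dominates the reconstruction loss.
    mu = x.mean(axis=0, keepdims=True)
    sd = x.std(axis=0, keepdims=True)
    return (x - mu) / (sd + eps)

def smooth_l1(pred, target, beta=1.0):
    # Smooth-L1 (Huber-style) loss: quadratic for small errors,
    # linear in the tails, damping the influence of outlier regions.
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).mean()
```

The quadratic-near-zero, linear-in-the-tails shape is a plausible source of the "secondary stability" noted above: large residuals contribute bounded gradients instead of exploding ones.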

Key Points
  • Shape achieves 98.1% top-1 retrieval accuracy on 2,983 held-out CAD meshes
  • Only 10.9M parameters with per-region attribution for explainable predictions
  • Per-dimension normalization is critical—performance collapses without it (R²<0.14 vs >0.70)
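The top-1 retrieval figure quoted above is, at its core, a nearest-neighbor check between embeddings. The following is a generic sketch of such a metric under a cosine-similarity assumption; the paper's exact Wang-Isola protocol details are not given here:

```python
import numpy as np

def top1_retrieval_accuracy(queries, gallery):
    # Query i counts as retrieved correctly when its most cosine-similar
    # gallery embedding is the one at the same index (its true match).
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    nearest = (q @ g.T).argmax(axis=1)
    return float((nearest == np.arange(len(q))).mean())
```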

Why It Matters

Enables accurate, explainable 3D CAD analysis with a lightweight model, transforming industrial design workflows.