Research & Papers

PhysX-Omni unifies simulation-ready 3D generation for rigid, deformable, and articulated objects

New framework generates physically accurate 3D assets, whether rigid, deformable, or articulated, from text or images.

Deep Dive

Existing 3D generation methods typically ignore physical properties or handle only one asset type (rigid, deformable, or articulated). PhysX-Omni addresses this with a unified framework that generates all three categories directly from text or images. Its key innovation is a geometry representation designed for Vision-Language Models (VLMs) that encodes high-resolution 3D structures without compression, enabling the model to produce physically accurate meshes with proper kinematics and material properties. The framework supports both generation and understanding tasks, and its outputs are assets intended to drop directly into physics simulators.

To train and evaluate the model, the authors built PhysXVerse, which they describe as the first general-purpose simulation-ready 3D dataset, covering diverse indoor and outdoor scenes. They also propose PhysX-Bench, a benchmark that tests six attributes: geometry, absolute scale, material, affordance, kinematics, and function description. The authors report that PhysX-Omni outperforms prior methods in both generation quality and physical accuracy. The work has clear implications for robotics (policy learning in simulated environments) and immersive simulations, where on-demand, physically plausible assets could substantially reduce manual modeling effort.
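To make the six evaluated attributes concrete, here is a minimal sketch of how a per-asset annotation record might look, along with a check for attributes that are still missing. Everything in it is a hypothetical illustration: the class `AssetAnnotation`, its field names and types, and the helper `missing_attributes` are assumptions, not the actual PhysXVerse format or a PhysX-Bench API.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical record for one generated asset, with one field per attribute
# that PhysX-Bench evaluates. Names and types are illustrative assumptions,
# not the PhysXVerse / PhysX-Bench schema.
@dataclass
class AssetAnnotation:
    geometry: Optional[str] = None              # e.g. path to a mesh file
    absolute_scale_m: Optional[float] = None    # overall size in metres
    material: Optional[str] = None              # e.g. "wood", "rubber"
    affordance: Optional[str] = None            # e.g. "can be opened"
    kinematics: Optional[str] = None            # e.g. "revolute", "prismatic", "fixed"
    function_description: Optional[str] = None  # free-text description of purpose


def missing_attributes(asset: AssetAnnotation) -> list[str]:
    """Return the names of attributes left unset, in declaration order."""
    return [f.name for f in fields(asset) if getattr(asset, f.name) is None]


# Example: a cabinet annotated with everything except affordance and function.
cabinet = AssetAnnotation(
    geometry="cabinet.obj",
    absolute_scale_m=1.2,
    material="wood",
    kinematics="revolute",
)
print(missing_attributes(cabinet))  # ['affordance', 'function_description']
```

A simulator import step could run a check like this to flag assets that lack, say, kinematics or scale before they are loaded.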

Key Points
  • Generates rigid, deformable, and articulated 3D objects in a single unified framework
  • Novel VLM-compatible geometry representation encodes high-res 3D structures without compression
  • Introduces the PhysXVerse dataset and the PhysX-Bench benchmark, which evaluates six attributes

Why It Matters

Could accelerate robotics and simulation work by generating physically accurate 3D assets on demand from text or images.