Research & Papers

Tamaththul3D: High-Fidelity 3D Saudi Sign Language Avatars from Monocular Video

Up to 32% better hand accuracy across 500 Saudi Sign Language signs – no mocap needed

Deep Dive

Researchers from Saudi Arabia developed Tamaththul3D, a specialized reconstruction pipeline for generating high-fidelity 3D avatars of Saudi Sign Language (SSL) from a single monocular video. The system combines three state-of-the-art modules: SMPLer-X for robust body pose estimation, WiLoR for detailed hand refinement (including automatic localization and mirroring), and MediaPipe for 2D pose supervision. A kinematic-chain-based wrist alignment with hybrid swing-twist decomposition, plus 2D-supervised joint optimization, delivers up to 32% better hand accuracy than previous methods while maintaining competitive body-pose accuracy.
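The paper's exact wrist-alignment code is not reproduced here, but the standard swing-twist decomposition it builds on splits a rotation into a "twist" about a chosen axis (e.g. the forearm direction in the kinematic chain) and a residual "swing" about a perpendicular axis. A minimal sketch with quaternions in (w, x, y, z) order (helper names and conventions are our own, not the authors'):

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    aw, av = a[0], a[1:]
    bw, bv = b[0], b[1:]
    w = aw * bw - np.dot(av, bv)
    v = aw * bv + bw * av + np.cross(av, bv)
    return np.concatenate(([w], v))

def quat_conj(q):
    """Conjugate (inverse for unit quaternions)."""
    return np.concatenate(([q[0]], -q[1:]))

def swing_twist(q, axis):
    """Split a unit quaternion q into q = swing * twist, where `twist`
    rotates about `axis` and `swing` about a perpendicular axis."""
    axis = axis / np.linalg.norm(axis)
    proj = np.dot(q[1:], axis) * axis          # vector part along the twist axis
    twist = np.concatenate(([q[0]], proj))
    norm = np.linalg.norm(twist)
    if norm < 1e-9:                            # 180-degree swing: twist undefined
        twist = np.array([1.0, 0.0, 0.0, 0.0])
    else:
        twist = twist / norm
    swing = quat_mul(q, quat_conj(twist))
    return swing, twist
```

For a pure rotation about the twist axis itself, the swing component comes out as the identity quaternion, which is what makes the split useful for transferring only the twist (or only the swing) of a wrist rotation along the chain.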

Beyond the pipeline itself, the team released the first high-quality 3D parametric annotations (SMPL-X parameters) for the Ishara-500 dataset, covering 500 culturally authentic SSL signs. This dual contribution — accurate annotations and a specialized reconstruction method — addresses a critical gap for Deaf communities across the Arabic-speaking world of ~400 million people, who are served by Arabic Sign Language (ArSL) and its regional dialects. The work enables new accessibility technologies (real-time sign-to-speech avatars, virtual interpreters) and supports cultural preservation of Arab Deaf heritage.

Key Points
  • Tamaththul3D integrates SMPLer-X, WiLoR, and MediaPipe for body+hand pose estimation from a single video
  • Achieves up to a 32% improvement in hand accuracy over prior methods via kinematic-chain wrist alignment with swing-twist decomposition
  • First release of high-quality SMPL-X annotations for 500 Saudi Sign Language signs (Ishara-500 dataset)
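The 2D-supervised joint optimization mentioned above typically minimizes a confidence-weighted reprojection error between the model's projected 3D joints and detected 2D keypoints (here, from MediaPipe). A minimal pinhole-camera sketch (function and parameter names are illustrative, not the paper's code):

```python
import numpy as np

def reprojection_loss(joints3d, K, keypoints2d, conf):
    """Confidence-weighted squared pixel error between projected 3D
    joints and detected 2D keypoints.

    joints3d    : (J, 3) joints in camera coordinates
    K           : (3, 3) pinhole camera intrinsics
    keypoints2d : (J, 2) detected pixel coordinates
    conf        : (J,)   per-keypoint detector confidence
    """
    proj = joints3d @ K.T                # (J, 3) homogeneous image points
    uv = proj[:, :2] / proj[:, 2:3]      # perspective divide to pixels
    residual = uv - keypoints2d          # (J, 2) pixel error
    return float(np.sum(conf * np.sum(residual ** 2, axis=1)))
```

In practice such a term is summed over body and hand joints and minimized with a gradient-based optimizer over the SMPL-X pose parameters, with the confidence weights down-weighting occluded or poorly detected keypoints.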

Why It Matters

Brings high-fidelity 3D avatar reconstruction to Arabic Sign Language, bridging a major accessibility gap for the Arabic-speaking world of 400M+ people