Robotics

RLDX-1 Technical Report

New robotic policy more than doubles success on dexterous tasks vs. top competitors

Deep Dive

Researchers have introduced RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT) architecture. MSAT unifies heterogeneous modalities (vision, language, motion, memory, and physical sensing) by giving each modality its own stream and coupling the streams through cross-modal joint self-attention. Beyond the architecture, the system adds three system-level components: synthetic training data for rare manipulation scenarios, learning procedures specialized for human-like movement, and inference optimizations for real-time deployment. Together, these allow RLDX-1 to handle complex, contact-rich tasks that demand capabilities beyond those of typical vision-language-action (VLA) models.
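To make "modality-specific streams with cross-modal joint self-attention" concrete, here is a minimal NumPy sketch of one common way to realize that idea. It is not the RLDX-1 implementation: the summary above does not specify MSAT's internals, so the single attention head, the token counts, the model width, and names such as `joint_self_attention` and `params` are all illustrative assumptions. Each modality keeps its own projection weights, but attention is computed once over the concatenated token sequence.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_self_attention(streams, params):
    """Single-head joint self-attention over several modality streams.

    streams: dict of modality name -> (n_tokens, d_model) array.
    params:  dict of modality name -> dict with "Wq", "Wk", "Wv", "Wo",
             each a (d_model, d_model) array. These per-modality weights
             are the hypothetical "modality-specific" part of the sketch.
    """
    names = list(streams)

    # Modality-specific projections, then concatenate along the token axis.
    q = np.concatenate([streams[n] @ params[n]["Wq"] for n in names])
    k = np.concatenate([streams[n] @ params[n]["Wk"] for n in names])
    v = np.concatenate([streams[n] @ params[n]["Wv"] for n in names])

    # Joint attention: every token attends to tokens from every modality.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    mixed = attn @ v

    # Split back into streams and apply each modality's output projection.
    out, start = {}, 0
    for n in names:
        length = streams[n].shape[0]
        out[n] = mixed[start:start + length] @ params[n]["Wo"]
        start += length
    return out, attn


# Illustrative sizes only; the report summary gives no token counts or widths.
rng = np.random.default_rng(0)
d_model = 16
token_counts = {"vision": 8, "language": 5, "motion": 4, "memory": 3, "physical": 2}

streams = {n: rng.normal(size=(t, d_model)) for n, t in token_counts.items()}
params = {
    n: {w: rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
        for w in ("Wq", "Wk", "Wv", "Wo")}
    for n in token_counts
}

out, attn = joint_self_attention(streams, params)
print(attn.shape)  # one row and one column per token across all modalities
```

The design point this illustrates is the split of responsibilities: weights stay separate per modality, so each stream can learn its own representation, while the shared attention matrix lets, for example, a physical-sensing token draw directly on vision or memory tokens.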

In empirical evaluations, RLDX-1 consistently outperformed recent frontier VLAs such as π0.5 and GR00T N1.6 on both simulation benchmarks and real-world tasks. A standout result came on ALLEX humanoid tasks, where RLDX-1 achieved an 86.8% success rate against roughly 40% for its competitors, a more than 2x improvement. These results position RLDX-1 as a promising step toward reliable robotic policies for real-world dexterous manipulation, which demands adaptability and real-time control.

Key Points
  • RLDX-1 uses the Multi-Stream Action Transformer (MSAT) with modality-specific streams and cross-modal joint self-attention
  • Achieves 86.8% success on ALLEX humanoid tasks, more than double the roughly 40% of π0.5 and GR00T N1.6
  • Integrates motion awareness, memory-aware decision making, and physical sensing for complex real-world tasks

Why It Matters

RLDX-1 narrows the gap between generalist VLAs and real-world dexterous manipulation, a step toward reliable robot control in dynamic environments.