E²DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation
Robots learn 2x faster by cherry-picking the most valuable training moments.
In robotic manipulation, Decision Transformers (DTs) struggle with sample efficiency because they replay stored experience uniformly and have no mechanism for active exploration. The E²DT framework, accepted at ICRA 2026, addresses this by letting the model shape its own training data. It uses a k-Determinantal Point Process (k-DPP) sampler to select the most informative trajectory windows. Quality is scored with a composite metric that combines return-to-go (RTG) quantiles, predictive uncertainty, and inverse-frequency stage coverage. Diversity is measured on the DT's internal latent embeddings. Combining the two in a joint quality-diversity kernel gives the robot a balanced set of experiences to learn from, so it neither overfits to common paths nor wastes time on irrelevant ones.
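To make the quality-diversity kernel concrete, here is a minimal sketch of how such a selection step could look. It is not the authors' implementation: it assumes the common DPP decomposition L = diag(q) · S · diag(q) with cosine similarity for S, and it swaps exact k-DPP sampling for a simple greedy log-determinant approximation. The function names `build_kernel` and `greedy_kdpp` are hypothetical.

```python
import numpy as np

def build_kernel(quality, embeddings):
    """Assumed kernel form: L = diag(q) @ S @ diag(q), S = cosine similarity."""
    quality = np.asarray(quality, dtype=float)
    embeddings = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T
    return quality[:, None] * similarity * quality[None, :]

def greedy_kdpp(kernel, k):
    """Greedy approximation (not exact sampling): repeatedly add the item
    that maximizes the log-determinant of the selected submatrix."""
    n = kernel.shape[0]
    selected = []
    for _ in range(min(k, n)):
        best_item, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        if best_item is None:  # every remaining item is redundant
            break
        selected.append(best_item)
    return selected

# Toy buffer: windows 0 and 1 have identical embeddings, window 2 differs.
quality = [1.0, 0.9, 0.8]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
chosen = greedy_kdpp(build_kernel(quality, embeddings), k=2)
print(chosen)
```

In this toy case the second-highest-quality window is skipped because it duplicates the first in embedding space, which is the behavior a quality-diversity kernel is meant to produce.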
E²DT was evaluated on challenging manipulation benchmarks in both simulated environments and real-robot setups. Results consistently showed improvements over baseline DTs and prior RL methods, particularly in long-horizon tasks where exploration and sample efficiency are critical. The method is designed to avoid two failure modes: getting stuck in local optima when exploration is insufficient, and converging inefficiently when exploration is excessive. By coupling policy learning with intelligent experience selection, E²DT offers a principled path toward robust, data-efficient robotic learning, a key step for deploying manipulation skills in real-world settings like manufacturing and logistics.
- E²DT uses a k-Determinantal Point Process to actively select experience windows based on quality (RTG, uncertainty, coverage) and diversity (latent embeddings).
- The composite quality metric combines return-to-go quantiles, predictive uncertainty, and inverse-frequency stage coverage.
- Accepted at ICRA 2026, E²DT outperforms prior methods on both simulation and real-robot manipulation benchmarks.
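As an illustration of the composite quality metric named in the takeaways, the sketch below scores each trajectory window from the three listed signals. The summary does not say how the terms are combined, so the equal-weight sum, the rank-based normalization, and the names `rank_quantile`, `inverse_frequency`, and `composite_quality` are all assumptions made for this example.

```python
from collections import Counter
import numpy as np

def rank_quantile(values):
    """Map each value to its empirical quantile in [0, 1] within the buffer."""
    values = np.asarray(values, dtype=float)
    if len(values) == 1:
        return np.array([1.0])
    ranks = values.argsort().argsort()
    return ranks / (len(values) - 1)

def inverse_frequency(stage_ids):
    """Score windows from rarely seen task stages higher (max scaled to 1)."""
    counts = Counter(stage_ids)
    inv = np.array([1.0 / counts[s] for s in stage_ids])
    return inv / inv.max()

def composite_quality(rtg, uncertainty, stage_ids, weights=(1/3, 1/3, 1/3)):
    """Assumed combination: weighted sum of the three normalized terms."""
    w_rtg, w_unc, w_cov = weights
    return (w_rtg * rank_quantile(rtg)
            + w_unc * rank_quantile(uncertainty)
            + w_cov * inverse_frequency(stage_ids))

# Three toy windows: return-to-go, model uncertainty, and task stage label.
scores = composite_quality(
    rtg=[10.0, 30.0, 20.0],
    uncertainty=[0.2, 0.1, 0.9],
    stage_ids=["reach", "reach", "grasp"],
)
print(scores)
```

Here the third window scores highest: its return-to-go is only mid-range, but it is the most uncertain and comes from the least-covered stage.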
Why It Matters
More efficient robot learning means faster adaptation to new tasks, reducing training time and cost in industrial automation.