BEACON framework trains robots with 10x fewer real-world demos
Researchers blend simulation and real data to slash training costs for robot policies.
BEACON, by Zhang, Qi, and Yang, is a framework for training generative robot policies using abundant source and limited target demonstrations. It formulates cross-domain co-training as a discrepancy-aware importance-reweighting problem, jointly learning a diffusion-based visuomotor policy and per-sample source weights. The framework includes scalable instance-level discrepancy estimators, stochastic alternating updates, and a multi-source extension. In sim-to-sim, sim-to-real, and multi-source manipulation tasks, BEACON improves robustness and data efficiency over target-only, fixed-ratio co-training, and feature-alignment baselines. Notably, it achieves feature alignment as an implicit result of its discrepancy-aware co-training, without an explicit alignment objective.
- BEACON uses a diffusion-based visuomotor policy trained jointly with per-sample source weights to minimize target-domain generalization error.
- It demonstrates up to 50% improvement in data efficiency over fixed-ratio co-training and feature-alignment baselines across multiple manipulation domains.
- Feature alignment emerges implicitly from the discrepancy-aware reweighting, eliminating the need for explicit domain alignment objectives.
Why It Matters
BEACON slashes real-world data needs for robot training, accelerating safe and affordable deployment of adaptive robotic systems.