ZAPS-DA slashes robot jitter by 21x with zero-phase smoothing
14–21x less steering jitter on MetaDrive, 0.7% failure rate on Webots, and no inference-time filter needed.
Off-policy reinforcement learning policies for continuous control often produce high-frequency action jitter that makes them unsuitable for physical robots. The traditional remedies each have a cost: post-hoc filtering introduces phase lag, while smoothness penalties embedded in the actor’s loss conflate reward optimization with smoothing, and either can degrade performance. ZAPS-DA (Zero-Phase Action Policy Smoothing with Decoupled Actor) instead leaves the original main actor untouched and trains a separate decoupled actor by supervised imitation of zero-phase filtered actions drawn from the replay buffer. The decoupled actor is a feed-forward map from observation to smooth action, with no action-history input and no inference-time filter, a mechanism the authors call “causal distillation of a non-causal filter.” A magnitude-matched MSE loss is reported to eliminate hyperparameter tuning across optimizers.
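The distillation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic episode, the filter settings (`window_length=21`, `polyorder=3`), the polynomial regressor standing in for a neural decoupled actor, the plain MSE fit (rather than the magnitude-matched loss, whose definition is not given here), and the `jitter` measure are all assumptions made for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)

# Hypothetical stand-in for one replay-buffer episode: 1-D observations and
# jittery actions from the main actor (smooth trend plus high-frequency noise).
T = 400
obs = np.linspace(-1.0, 1.0, T).reshape(-1, 1)
raw_actions = np.sin(2.0 * obs[:, 0]) + 0.2 * rng.standard_normal(T)

# Zero-phase targets: the Savitzky-Golay window is centred on each sample, so
# it uses future as well as past actions (non-causal) and adds no phase lag.
# window_length and polyorder are illustrative values, not the paper's settings.
targets = savgol_filter(raw_actions, window_length=21, polyorder=3)

def features(o):
    """Cubic polynomial features of the observation (assumed model class)."""
    return np.hstack([o ** k for k in range(4)])

# Decoupled actor stand-in: least-squares fit, i.e. plain MSE regression of
# the filtered targets on the observation features.
weights, *_ = np.linalg.lstsq(features(obs), targets, rcond=None)

def decoupled_actor(o):
    """Feed-forward map from observation to action: no history, no filter."""
    return features(o) @ weights

def jitter(a):
    """Illustrative jitter measure: mean absolute step-to-step change."""
    return float(np.mean(np.abs(np.diff(a))))

smooth_actions = decoupled_actor(obs)
print(f"raw jitter:       {jitter(raw_actions):.4f}")
print(f"distilled jitter: {jitter(smooth_actions):.4f}")
```

The point of the sketch is the division of labour: the non-causal filter is applied only offline to stored trajectories, while the deployed function sees nothing but the current observation.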
The method was validated with Soft Actor-Critic and a Savitzky–Golay filter on two driving simulators, MetaDrive and Webots, using paired evaluations with n=150. On MetaDrive, steering jitter is reduced by 14–21x and throttle jitter by 3–5x (all p < 10⁻⁴, Bonferroni-corrected), with task-completion rates statistically indistinguishable from the baseline (p=0.28 success, p=0.31 crash) at a 6.3% reward cost. In the Webots adaptive cruise control environment, the same configuration yields a Pareto improvement: reward parity (p=0.121), 8–45x steering jitter reduction, and a total task-failure rate cut from 2.0% to 0.7%. The paper runs 7 pages with 5 figures and has been submitted to IEEE RA-L.
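As a rough illustration of how a paired, Bonferroni-corrected comparison of this kind might be computed, the sketch below uses entirely synthetic per-episode jitter values. The summary does not say which paired test the authors used, so the Wilcoxon signed-rank test is an assumption, as are the `bonferroni` helper, the per-episode jitter values, and the arbitrary divisors 10 and 4.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
n_episodes = 150  # paired protocol size from the text; all data is synthetic

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Synthetic per-episode jitter for a baseline and a smoothed policy evaluated
# on the same episodes. The divisors 10 and 4 are arbitrary, not reported
# results.
baseline = {
    "steering": rng.gamma(9.0, 0.02, n_episodes),
    "throttle": rng.gamma(9.0, 0.01, n_episodes),
}
smoothed = {
    "steering": baseline["steering"] / 10.0 * rng.uniform(0.8, 1.2, n_episodes),
    "throttle": baseline["throttle"] / 4.0 * rng.uniform(0.8, 1.2, n_episodes),
}

channels = list(baseline)

# Reduction factor: ratio of mean baseline jitter to mean smoothed jitter.
reduction = {c: float(baseline[c].mean() / smoothed[c].mean()) for c in channels}

# Paired test per action channel (assumed: Wilcoxon signed-rank), then
# Bonferroni correction across the channels tested.
raw_p = [float(wilcoxon(baseline[c], smoothed[c]).pvalue) for c in channels]
adjusted_p = dict(zip(channels, bonferroni(raw_p)))

for c in channels:
    print(f"{c}: {reduction[c]:.1f}x less jitter, corrected p = {adjusted_p[c]:.2e}")
```

Pairing matters here because both policies are scored on the same episodes, so the test works on per-episode differences rather than on two independent samples.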
- Steering jitter reduced by 14–21x on MetaDrive and 8–45x on Webots adaptive cruise control
- No inference-time filter or action-history input; applicable in principle to any off-policy RL algorithm (tested so far with Soft Actor-Critic)
- Total task-failure rate dropped from 2.0% to 0.7% in Webots at reward parity; the reward cost on MetaDrive was 6.3%
Why It Matters
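Could make RL-trained policies practical to deploy directly on physical actuators: jitter drops without an inference-time filter or its phase lag, and task-completion rates hold in the reported simulations at a reward cost of at most 6.3%.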