MHE-based training for ReLU networks guarantees convergence with observability
A control-theoretic approach ensures locally observable weights via persistent excitation.
In a paper submitted to arXiv on May 27, 2026, researchers Yi Yang, Victor G. Lopez, and Matthias A. Müller (Leibniz University Hannover) introduce a moving horizon estimation (MHE) approach for training feedforward neural networks (FNNs) with rectified linear unit (ReLU) activations. Their method frames the network as a discrete-time dynamical system in which the weights are unknown states, enabling rigorous convergence analysis from a control-theoretic perspective. The team first examines the local observability of this system. For two-layer FNNs with fixed output weights, they derive a sufficient condition, based on the observability rank condition, under which the state (the weights) can be locally distinguished from neighboring states. A key contribution is a persistently exciting (PE) input design strategy that guarantees this local observability, which in turn is what allows the authors to prove that the MHE weight estimates converge to their true values.
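A small numerical check makes the observability idea concrete. If the weights are modeled as constant states (x_{k+1} = x_k) and each output is y_k = h(x_k, u_k), then a standard sufficient test for local observability over a window of N inputs is that the stacked Jacobian of [h(x, u_1), ..., h(x, u_N)] with respect to x has full column rank. The sketch below applies that generic test, not the authors' exact condition, to a scalar-output two-layer ReLU network with fixed output weights c and trainable hidden weights and biases. The constant-state model, the example network, the hand-picked input windows (standing in for a PE design), and all function names are assumptions for illustration.

```python
# Hypothetical sketch: generic Jacobian-rank observability test for a two-layer
# ReLU network with FIXED output weights c. Not the paper's exact condition.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(z: float) -> float:
    return max(z, 0.0)


def pre_activation(w: Vector, b: float, u: Vector) -> float:
    return sum(wi * ui for wi, ui in zip(w, u)) + b


def output(W: Matrix, b: Vector, c: Vector, u: Vector) -> float:
    """Scalar output y = sum_j c_j * relu(w_j . u + b_j); c is not trained."""
    return sum(cj * relu(pre_activation(wj, bj, u)) for wj, bj, cj in zip(W, b, c))


def jacobian_row(W: Matrix, b: Vector, c: Vector, u: Vector) -> Vector:
    """dy/d(state) for one input, with the state ordered [w_1, b_1, w_2, b_2, ...].

    The ReLU derivative is taken as 0 at the kink (pre-activation exactly 0).
    """
    row: Vector = []
    for wj, bj, cj in zip(W, b, c):
        active = 1.0 if pre_activation(wj, bj, u) > 0 else 0.0
        row.extend(cj * active * ui for ui in u)  # dy/dw_j
        row.append(cj * active)                   # dy/db_j
    return row


def rank(M: Matrix, tol: float = 1e-9) -> int:
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n_rows, n_cols = len(A), (len(A[0]) if A else 0)
    r = 0
    for col in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < tol:
            continue  # no usable pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, n_rows):
            f = A[i][col] / A[r][col]
            for k in range(col, n_cols):
                A[i][k] -= f * A[r][k]
        r += 1
    return r


def locally_observable(W: Matrix, b: Vector, c: Vector, inputs: List[Vector]) -> bool:
    """True if the stacked output Jacobian over the input window has full column rank."""
    J = [jacobian_row(W, b, c, u) for u in inputs]
    n_states = len(W) * (len(W[0]) + 1)  # hidden weights plus one bias per unit
    return rank(J) == n_states


if __name__ == "__main__":
    # Hypothetical network: y = relu(u) + relu(-u) = |u| for scalar u.
    W, b, c = [[1.0], [-1.0]], [0.0, 0.0], [1.0, 1.0]
    exciting = [[1.0], [2.0], [-1.0], [-2.0]]  # activates both hidden units
    one_sided = [[1.0], [2.0], [3.0], [4.0]]   # never activates the second unit
    print(locally_observable(W, b, c, exciting))   # True
    print(locally_observable(W, b, c, one_sided))  # False
```

With the window {1, 2, -1, -2}, each hidden unit is active on some samples and the Jacobian reaches rank 4, the number of states; the one-sided window {1, 2, 3, 4} never activates the second unit, so its weight and bias leave no trace in the outputs and the rank drops to 2. This is the intuition behind designing exciting inputs, although the paper's actual PE design may impose different or stronger conditions than this toy check.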
Interestingly, the paper shows that multi-layer FNNs with ReLU activations generally fail the observability rank condition, so the two-layer argument cannot be applied directly and convergence guarantees become harder to obtain. The authors address this by restricting training to the projection of the state onto the observable subspace: the MHE updates only this projection, using a fixed-length window of input-output data. With this restriction they can still prove convergence of the MHE-based training, a guarantee that standard gradient-based training methods generally lack. Numerical examples validate the approach and show accurate weight recovery. The work bridges control theory and deep learning and offers a principled alternative for safety-critical applications, such as autonomous systems or industrial control, where guaranteed convergence is essential.
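To see how an extra trainable layer can break the rank condition, and how an update confined to the observable subspace still makes progress, consider a deliberately tiny network y = v·ReLU(a·u) in which both a and v are trainable states. Because ReLU is positively homogeneous, (a, v) and (αa, v/α) give identical outputs for any α > 0, so every Jacobian row is orthogonal to the direction (a, −v) and the rank can never be full. This rescaling symmetry is one well-known mechanism; it is not claimed to be the paper's argument. The sketch then slides a fixed-length window over a data stream, using a damped minimum-norm Gauss-Newton step as a stand-in for the authors' MHE optimization: since the step has the form Jᵀz, it lies in the row space of the Jacobian (the orthogonal complement of its null space), so it never moves along the locally unobservable direction. The network, data stream, horizon, damping λ, and all function names are hypothetical.

```python
# Hypothetical sketch: moving-window estimation on a toy two-trainable-layer ReLU
# network. The damped Gauss-Newton step is a stand-in, NOT the paper's MHE scheme.
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[float, float]  # (input u, measured output y)


def relu(z: float) -> float:
    return max(z, 0.0)


def predict(x: Vector, u: float) -> float:
    """Toy network y = v * relu(a * u); both a and v are trainable states."""
    a, v = x
    return v * relu(a * u)


def jacobian_row(x: Vector, u: float) -> Vector:
    """[dy/da, dy/dv]; every such row is orthogonal to the rescaling direction (a, -v)."""
    a, v = x
    active = 1.0 if a * u > 0 else 0.0
    return [v * active * u, relu(a * u)]


def solve(A: List[Vector], b: Vector) -> Vector:
    """Solve A z = b for square nonsingular A (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[p] = M[p], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for k in range(col, n + 1):
                M[i][k] -= f * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z


def mhe_update(x: Vector, window: List[Sample], lam: float = 1e-6) -> Tuple[Vector, Vector]:
    """One damped minimum-norm Gauss-Newton step on the windowed squared output error.

    delta = J^T (J J^T + lam I)^{-1} r has the form J^T z, so it lies in the row
    space of J: the orthogonal complement of the (locally unobservable) null space.
    """
    J = [jacobian_row(x, u) for u, _ in window]
    r = [y - predict(x, u) for u, y in window]
    n = len(window)
    G = [[sum(p * q for p, q in zip(J[i], J[j])) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    z = solve(G, r)
    delta = [sum(J[i][k] * z[i] for i in range(n)) for k in range(len(x))]
    return [xi + di for xi, di in zip(x, delta)], delta


def run_mhe(x0: Vector, stream: List[Sample], horizon: int) -> Vector:
    """Slide a fixed-length window over the stream, warm-starting each update."""
    x = list(x0)
    for k in range(horizon, len(stream) + 1):
        x, _ = mhe_update(x, stream[k - horizon:k])
    return x


if __name__ == "__main__":
    true_x = [2.0, 1.5]  # hypothetical "true" (a, v); only the product a*v = 3 is observable
    us = [1.0, 2.0, 0.5, 1.5, 3.0, 0.8, 2.5, 1.2]
    stream = [(u, predict(true_x, u)) for u in us]
    a, v = run_mhe([1.0, 1.0], stream, horizon=3)
    print(f"a={a:.4f}  v={v:.4f}  a*v={a * v:.4f}")
```

Starting from (a, v) = (1, 1), the updates drive the windowed residual to zero, so a·v converges to 3, yet a and v each settle at √3 ≈ 1.732 (the start is symmetric and every step changes both equally) rather than at the hypothetical true values 2 and 1.5: only the observable combination of the weights is recovered. The damping λ > 0 is needed because J Jᵀ is singular for this network; it is an illustrative choice, not part of the authors' algorithm.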
- FNN with ReLU activations reformulated as a dynamical system with weights as unknown states, enabling control-theoretic analysis.
- Sufficient condition for local observability, based on the observability rank condition, derived for two-layer FNNs with fixed output weights; multi-layer networks generally fail the rank condition.
- Persistently exciting input design guarantees local observability and convergence of the MHE-based training algorithm.
Why It Matters
Control-theoretic training offers guaranteed weight convergence, a key advantage for safety-critical AI systems.