(241b) Improving Robustness of Machine Learning Modeling of Nonlinear Processes Using Lipschitz-Constrained Neural Networks

Authors 

Xiao, M. - Presenter, National University of Singapore
Tan Gian Yion, W., National University of Singapore
Neural networks (NNs) have become a popular method for modeling complex nonlinear processes in model predictive control. Despite their reported success, one pertinent issue is that they can be sensitive to small input disturbances, rendering them unsuitable for performance-critical applications. Moreover, adversarial input disturbances can be generated to intentionally cause drastic changes in the NN output (Szegedy et al., 2013). Another prominent issue in the development of NNs is overfitting, where the network performs well on the training data but fails to generalize to the entire domain of application. This is especially likely in the presence of data noise, where the trained NN inadvertently learns the noise itself (Ying, 2019). Other factors that can trigger overfitting include an insufficient number of data points and an excessively large hypothesis class, e.g., large weights, many neurons, and deep architectures. Current techniques to mitigate overfitting include regularization (Moore and DeNero, 2011), dropout (Baldi and Sadowski, 2014), and terminating training once a predetermined stopping criterion is met (Baldi and Sadowski, 2013). However, these methods usually rely on predefined hyperparameters and do not provide provably correct guarantees on the training error itself.
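To make the sensitivity issue concrete, the following sketch (not part of the abstract; the model, layer sizes, and disturbance magnitude are hypothetical placeholders) perturbs the input of a plain feedforward NN along the sign of the output gradient and measures how much the prediction moves for a small input change.

import torch
import torch.nn as nn

torch.manual_seed(0)

# A plain feedforward NN standing in for a learned process model (hypothetical).
model = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

x = torch.randn(1, 4, requires_grad=True)   # nominal input, e.g., a process state
y = model(x)
y.backward(torch.ones_like(y))              # gradient of the output w.r.t. the input

eps = 1e-2                                  # small disturbance magnitude
x_adv = x + eps * x.grad.sign()             # perturb along the locally worst-case direction

with torch.no_grad():
    dy = (model(x_adv) - model(x)).abs().item()
print(f"input perturbation (inf-norm): {eps:.0e}, output change: {dy:.4f}")

The ratio of output change to input perturbation is governed by the network's (local) Lipschitz constant, which is exactly the quantity the LCNNs below keep small.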


To address these issues, we propose the use of Lipschitz-Constrained NNs (LCNNs) to model nonlinear processes. LCNNs have attracted recent attention because they reduce input sensitivity by enforcing a small Lipschitz constant and mitigate overfitting by restricting hypothesis complexity. In this work, we first prove a universal approximation theorem for LCNNs built from SpectralDense layers (Serrurier et al., 2021), showing that, despite their reduced hypothesis complexity, they can approximate any 1-Lipschitz continuous function. Next, we develop a probabilistic bound on their generalization error by deriving a size-dependent upper bound on their empirical Rademacher complexity (ERC). Subsequently, we incorporate the LCNNs into a model predictive control (MPC) scheme and use a chemical process example to demonstrate that the LCNN-based MPC outperforms MPC using conventional feedforward NNs in the presence of training data noise. Furthermore, owing to the improved robustness of LCNNs, we investigate their integration with autoencoders for model order reduction, effectively learning a low-dimensional representation of data embedded in a high-dimensional space even in the presence of data noise.
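As a minimal sketch of the Lipschitz-constraint idea (not the authors' implementation, which uses the SpectralDense layers of Serrurier et al. (2021)), one can rescale each dense layer's weight matrix by its largest singular value so that every layer is 1-Lipschitz; composing such layers with 1-Lipschitz activations keeps the whole network 1-Lipschitz. The class name SpectralNormDense and the layer sizes below are illustrative assumptions.

import torch
import torch.nn as nn

class SpectralNormDense(nn.Module):
    """Dense layer whose weight matrix is divided by its spectral norm,
    making the layer 1-Lipschitz."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        sigma = torch.linalg.matrix_norm(self.weight, ord=2)  # largest singular value
        return x @ (self.weight / sigma).T + self.bias

# A 1-Lipschitz network: constrained layers composed with a 1-Lipschitz activation.
lcnn = nn.Sequential(
    SpectralNormDense(4, 64), nn.ReLU(),
    SpectralNormDense(64, 64), nn.ReLU(),
    SpectralNormDense(64, 1),
)

# Empirical check of the Lipschitz bound ||f(x1) - f(x2)|| <= ||x1 - x2||.
x1, x2 = torch.randn(1, 4), torch.randn(1, 4)
print((lcnn(x1) - lcnn(x2)).norm().item(), "<=", (x1 - x2).norm().item())

Because the bound ||f(x1) - f(x2)|| <= ||x1 - x2|| holds by construction, small input disturbances (including adversarial ones) can only produce proportionally small output changes, which is the robustness property exploited in the LCNN-based MPC and autoencoder studies above.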

References

Baldi, P., Sadowski, P., 2014. The dropout learning algorithm. Artificial Intelligence 210, 78–122.

Baldi, P., Sadowski, P.J., 2013. Understanding dropout. Advances in Neural Information Processing Systems 26.

Moore, R.C., DeNero, J., 2011. L1 and L2 regularization for multiclass hinge loss models, in: Symposium on Machine Learning in Speech and Natural Language Processing.

Serrurier, M., Mamalet, F., González-Sanz, A., Boissin, T., Loubes, J.M., Del Barrio, E., 2021. Achieving robustness in classification using optimal transport with hinge regularization, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 505–514.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R., 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.

Ying, X., 2019. An overview of overfitting and its solutions. Journal of Physics: Conference Series 1168, 022022.