Hybrid Series/Parallel All-Nonlinear Dynamic-Static Stochastic Neural Networks: Development, Training and Application to Chemical Processes

Authors

Mukherjee, A. - Presenter, West Virginia University
Bhattacharyya, D., West Virginia University
Developing accurate first-principles models for complex nonlinear stochastic dynamic systems can be time-consuming, computationally expensive, and, for certain systems, infeasible due to lack of knowledge. It is also challenging to adapt first-principles models for time-varying probabilistic process systems. Data-driven or black-box models are relatively easier to develop, simulate, and adapt online. However, many nonlinear dynamic stochastic systems are difficult to model accurately with a deterministic simple (i.e., containing just one or very few hidden layers) static or dynamic neural network, while using a large number of hidden layers can lead to overfitting during model training, especially in the absence of large data sets1. Typical data-driven system identification approaches for nonlinear dynamic process models with different uncertainty characteristics include conventional deep recurrent neural networks such as the Long Short-Term Memory (LSTM)2 and Gated Recurrent Unit (GRU)3 networks, the Gaussian radial basis function (RBF) kernel in the primal-dual formulation of least squares support vector machines4,5 (LS-SVMs), and the Hammerstein and Wiener models6 represented by a linear time-invariant (LTI) dynamic block combined with a nonlinear static network. These models can incur high computational expense or can fail to accurately capture the nonlinearities in the process dynamics as well as the uncertainties in parameter estimates. It is therefore desirable to consider hybrid static-dynamic models in which both the static and dynamic blocks are simple and nonlinear and, for stochastic systems, to develop probabilistic data-driven models. However, the presence of nonlinearities in both the static and dynamic networks makes it considerably more difficult to synthesize the optimal hybrid network and estimate its parameters, especially when probabilistic networks are considered. This work proposes the development of hybrid stochastic series and parallel all-nonlinear static-dynamic neural networks. Efficient sequential training algorithms7 have also been developed to provide flexibility in optimal network synthesis and parameter estimation.
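To make the series architecture concrete, the minimal sketch below (written in PyTorch) chains a small nonlinear static block into a small nonlinear dynamic (recurrent) block followed by an output layer. The class name, layer sizes, and tanh activations are illustrative assumptions, not the exact networks developed in this work.

# Illustrative sketch of a series all-nonlinear static-dynamic network:
# a nonlinear static block feeding a nonlinear dynamic (recurrent) block.
# Sizes and activations are assumptions for illustration only.
import torch
import torch.nn as nn

class SeriesStaticDynamicNet(nn.Module):
    def __init__(self, n_inputs: int, n_static: int, n_dynamic: int, n_outputs: int):
        super().__init__()
        # Nonlinear static block: one hidden layer with tanh activation.
        self.static_block = nn.Sequential(
            nn.Linear(n_inputs, n_static),
            nn.Tanh(),
        )
        # Nonlinear dynamic block: single-layer Elman-type RNN whose hidden
        # state carries the process dynamics.
        self.dynamic_block = nn.RNN(
            input_size=n_static, hidden_size=n_dynamic,
            nonlinearity="tanh", batch_first=True,
        )
        # Connection/output weights mapping dynamic states to outputs.
        self.output_layer = nn.Linear(n_dynamic, n_outputs)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, time, n_inputs) sequence of process inputs.
        z = self.static_block(u)       # static nonlinear transformation
        h, _ = self.dynamic_block(z)   # dynamic nonlinear state evolution
        return self.output_layer(h)    # predicted outputs at every time step

# Example: 2 manipulated inputs, 1 measured output, 50-step trajectories.
model = SeriesStaticDynamicNet(n_inputs=2, n_static=8, n_dynamic=6, n_outputs=1)
y_hat = model(torch.randn(16, 50, 2))  # -> shape (16, 50, 1)

A parallel variant would instead feed the process inputs to the static and dynamic blocks side by side and combine their outputs through the connection weights.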

Conventional backpropagation algorithms for training deterministic/stochastic static and dynamic neural networks use first-order methods, which may require significant tuning of hyperparameters or suffer from slow convergence. Second-order methods can address some of these issues but incur excessive computational expense due to the Hessian calculation, may be limited in terms of candidate architectures, and may be practical only for estimating parameters of small to medium-sized networks. Applying second-order methods to the hybrid fully nonlinear static-dynamic networks in a monolithic approach can therefore lead to high computational cost. Furthermore, a classical Gaussian RBF network with fixed centers and widths may suffer from the curse of dimensionality when modeling higher-order systems with a large input space and may be extremely sensitive to noisy data. The most efficient optimization algorithm and its parameters for converging the static network model may also differ from those for converging the dynamic network model. This work7 focuses on developing sequential parameter estimation algorithms for optimal synthesis of the hybrid stochastic neural network models, in which the static and dynamic networks can be trained independently by different optimization algorithms while an outer layer of optimization estimates the connection weights between the static and dynamic models; a sketch of this idea is given below. Gaussian RBF models with stochastic updates of centers and widths, as well as Bayesian neural network (BNN) algorithms, have been used to learn the optimal parameters of the probabilistic models. Both series and parallel architectures have been considered to develop flexible network models that offer a tradeoff between computational expense and prediction accuracy for highly nonlinear systems with uncertainties in the training data and that readily accommodate modifications to the network architecture.
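The sequential idea can be illustrated as an alternating loop in which the static and dynamic blocks are updated by separate optimizers and an outer step refines only the connection/output weights. The optimizer choices below (Adam for the static block, L-BFGS as a stand-in second-order method for the dynamic block), the iteration count, and the placeholder data are assumptions for illustration; the stochastic RBF and BNN parameter updates used in this work are not shown.

# Illustrative sketch of sequential training: separate optimizers for the
# static and dynamic blocks, plus an outer step for the connection weights.
# Assumes the SeriesStaticDynamicNet class from the previous sketch.
import torch

model = SeriesStaticDynamicNet(n_inputs=2, n_static=8, n_dynamic=6, n_outputs=1)
criterion = torch.nn.MSELoss()
opt_static = torch.optim.Adam(model.static_block.parameters(), lr=1e-3)
opt_dynamic = torch.optim.LBFGS(model.dynamic_block.parameters(), lr=0.1, max_iter=20)
opt_outer = torch.optim.Adam(model.output_layer.parameters(), lr=1e-3)

u_train = torch.randn(16, 50, 2)   # placeholder input trajectories
y_train = torch.randn(16, 50, 1)   # placeholder measured outputs

for outer_iter in range(10):
    # 1) Update the static block; dynamic and output parameters stay fixed.
    opt_static.zero_grad()
    loss = criterion(model(u_train), y_train)
    loss.backward()
    opt_static.step()

    # 2) Update the dynamic block with its own (here, quasi-second-order) optimizer.
    def closure():
        opt_dynamic.zero_grad()
        l = criterion(model(u_train), y_train)
        l.backward()
        return l
    opt_dynamic.step(closure)

    # 3) Outer layer: refine only the connection/output weights.
    opt_outer.zero_grad()
    loss = criterion(model(u_train), y_train)
    loss.backward()
    opt_outer.step()

Because each block sees only its own optimizer, the algorithm and its settings can be chosen separately for the static and dynamic models, which is the flexibility the sequential approach is intended to provide.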

The proposed algorithms are applied to train the hybrid networks for three nonlinear dynamic processes with different noise characteristics: a pH neutralization reactor, the Van de Vusse reactor, and a pilot plant for post-combustion CO2 capture using monoethanolamine solvent8. The hybrid series and parallel all-nonlinear stochastic static-dynamic models show superior performance compared to existing state-of-the-art network models (LSTM, GRU, etc.) as well as the LS-SVM approaches, especially for the CO2 capture system. In summary, the proposed network structures and training algorithms show promise for solving large-scale nonlinear dynamic stochastic modeling problems.

References

  1. Bejani, M. M. & Ghatee, M. A systematic review on overfitting control in shallow and deep neural networks. Artif. Intell. Rev. 54 (2021).
  2. Yu, Y., Si, X., Hu, C. & Zhang, J. A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Comput. 31, 1235–1270 (2019).
  3. Ren, Y. M. et al. A tutorial review of neural network modeling approaches for model predictive control. Comput. Chem. Eng. 165, 107956 (2022).
  4. Goethals, I., Pelckmans, K., Suykens, J. A. K. & De Moor, B. Identification of MIMO Hammerstein models using least squares support vector machines. Automatica 41, 1263–1272 (2005).
  5. Falck, T. et al. Least-Squares Support Vector Machines for the identification of Wiener-Hammerstein systems. Control Eng. Pract. 20, 1165–1174 (2012).
  6. Biagiola, S. I. & Figueroa, J. L. Identification of uncertain MIMO Wiener and Hammerstein models. Comput. Chem. Eng. 35, 2867–2875 (2011).
  7. Mukherjee, A. & Bhattacharyya, D. Hybrid Series/Parallel All-Nonlinear Dynamic-Static Neural Networks: Development, Training, and Application to Chemical Processes. Ind. Eng. Chem. Res. 62, 3221–3237 (2023).
  8. Chinen, A. S., Morgan, J. C., Omell, B., Bhattacharyya, D. & Miller, D. C. Dynamic Data Reconciliation and Validation of a Dynamic Model for Solvent-Based CO2 Capture Using Pilot-Plant Data. Ind. Eng. Chem. Res. 58, 1978–1993 (2019).