
(432c) LSTM Neural Networks and Nonlinear State Space Model Identification

Authors 

Qin, S. J. - Presenter, City University of Hong Kong
Li, J., City University of Hong Kong
In recent years, long short-term memory (LSTM) neural networks have achieved successful results in speech recognition [1] and handwriting recognition [2]. In industrial processes, LSTM-based models have proliferated in dynamic inferential sensor modeling for monitoring and controlling dynamic industrial processes. For example, [3] proposed a supervised LSTM network that uses the input and quality variables to learn hidden dynamics for inferential sensor modeling, and [4] employed an LSTM method to predict NOx emission at the furnace exit of a 660 MW pulverized coal-fired utility boiler. However, these studies bring LSTM into the inferential sensor field without fair comparisons against the traditional methods established in that field, such as methods based on statistical learning. For instance, [3] applied the proposed LSTM-based method to a penicillin fermentation process and an industrial debutanizer column but compared it only against a recurrent neural network (RNN) rather than against traditional statistical methods. [4] compared the performance of LSTM and a support vector machine (SVM) for NOx emission prediction and reported that LSTM outperforms SVM; dynamic modeling is necessary here because the NOx emission process operates under both steady-state and transient conditions.

In essence, to establish whether LSTM is necessary and suitable for dynamic modeling, we have to make reasonable comparisons with traditional methods. Moreover, in industrial process monitoring, industry tends to favor interpretable models for safety, reliability, and cost reasons. However, much of the research in this field focuses on the application and methodological improvement of LSTM-based models rather than on model interpretability. For example, [4] discussed only the construction of the LSTM-based model and its NOx emission prediction performance, without explaining what the model learned from the process or how it represents the system. Other articles that focus on applications and method improvements of LSTM models without properly explaining how LSTM realizes process monitoring and control can be found in [3], [5], [6], [7], [8]. What dynamics does LSTM, with its sophisticated structure, learn from a dynamic industrial process? Are the complex gates in LSTM necessary and appropriate for dynamic process modeling? These research questions motivate a dissection of LSTM for process monitoring and control.

In this paper, we study the LSTM gate structure in the context of nonlinear state space modeling. First, we formulate the LSTM-based network for prediction in state space form; the structure is visualized in Figure 1. Then we fairly compare LSTM with subspace identification methods (SIMs) and traditional methods, including partial least squares (PLS) [9], [10], support vector regression (SVR) [11], and the least absolute shrinkage and selection operator (Lasso) [12], [13], for dynamic inferential modeling, and test their effectiveness on the industrial 660 MW boiler dataset and an industrial debutanizer column. The SIM experiments will be conducted and analyzed later; the other results are summarized in Figure 2. Both case studies show that SVR, PLS, and Lasso with incorporated dynamics all achieve better prediction accuracy than LSTM. In addition, we find that the inferential models based on statistical methods in this study outperform the LSTM-based models reported on the same datasets in [3], [4]. By implementing an LSTM of the simplest structure, with a single hidden node and without the input and output gates, we discover that performance improves over the fully optimized LSTM with input and output gates and many hidden nodes. The performance is sustained when the forget gate is also turned off. These experiments show that the gates and activation functions do not improve performance as promised. The experimental results are summarized in Figure 3.
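For illustration, a minimal sketch of the gate ablation described above is given below; it is not the authors' implementation, and the parameter values and synthetic input are hypothetical. It shows a single-hidden-node cell with the input and output gates removed, so the cell state c_t acts as the state of a nonlinear state space model, with an option to turn the forget gate off as well.

import numpy as np

# Sketch of the ablated single-node LSTM cell (illustrative only).
# State space view: c_t is the state, h_t the nonlinear output of the state,
# and y_t a linear readout:
#   c_t = f_t * c_{t-1} + tanh(w_cu * u_t + w_ch * h_{t-1} + b_c)
#   h_t = tanh(c_t),   y_t = w_y * h_t + b_y
def ablated_lstm_sequence(u, params, use_forget_gate=True):
    """Run the simplified one-node cell over a 1-D input sequence u."""
    w_fu, w_fh, b_f, w_cu, w_ch, b_c, w_y, b_y = params
    c, h, y_pred = 0.0, 0.0, []
    for u_t in u:
        if use_forget_gate:
            # sigmoid forget gate (the only gate kept in this variant)
            f_t = 1.0 / (1.0 + np.exp(-(w_fu * u_t + w_fh * h + b_f)))
        else:
            f_t = 1.0  # forget gate turned off
        c = f_t * c + np.tanh(w_cu * u_t + w_ch * h + b_c)  # no input gate
        h = np.tanh(c)                                       # no output gate
        y_pred.append(w_y * h + b_y)                         # linear output map
    return np.array(y_pred)

# Illustrative usage with synthetic data (all values are arbitrary).
u = np.sin(0.1 * np.arange(200))
params = (0.5, 0.1, 0.0, 0.8, 0.2, 0.0, 1.0, 0.0)
y_hat = ablated_lstm_sequence(u, params, use_forget_gate=False)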

In addition to the LSTM sub-unit dissection, we further investigate what LSTM has learned by visualizing the evolution of the hidden state inside the LSTM model established for the NOx case. The internal representations are shown in Figure 4. The results show that the boiler process dynamics captured by LSTM match physical process knowledge, which indicates that LSTM is suitable for dynamic inferential modeling of complex industrial processes. However, the complexity of parameter tuning and the lack of transparent interpretation of LSTM remain concerns that could limit its implementation in practice. In contrast, statistical learning methods such as PLS and Lasso exhibit not only excellent prediction accuracy but also strong model interpretability. Furthermore, the study shows that sophisticated, deep LSTM networks trained with stochastic-gradient-based algorithms do not necessarily recover linear models from finite and noisy training data. Therefore, the necessity of dynamic inferential modeling with LSTM should be reconsidered.
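As a rough sketch of this kind of hidden-state visualization (assuming a Keras LSTM layer and a hypothetical input array X; this is not the authors' code), one can return the hidden state at every time step and plot its trajectories against known process events:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Hypothetical batch of input sequences: (samples, time steps, input variables).
X = np.random.randn(1, 500, 4).astype("float32")

# A small LSTM that returns the hidden state at every time step.
lstm = tf.keras.layers.LSTM(8, return_sequences=True)
hidden_states = lstm(X).numpy()[0]  # shape: (time steps, hidden units)

# Plot each hidden-state trajectory; for a trained model these traces can be
# compared against known process dynamics (e.g., load changes of the boiler).
for j in range(hidden_states.shape[1]):
    plt.plot(hidden_states[:, j], label=f"h[{j}]")
plt.xlabel("Time step")
plt.ylabel("Hidden state value")
plt.legend(ncol=4, fontsize="small")
plt.tight_layout()
plt.show()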

References

[1] Graves, A., Jaitly, N., & Mohamed, A. R. (2013, December). Hybrid speech recognition with deep bidirectional LSTM. In 2013 IEEE workshop on automatic speech recognition and understanding (pp. 273-278). IEEE.

[2] Carbune, V., Gonnet, P., Deselaers, T., Rowley, H. A., Daryin, A., Calvo, M., ... & Gervais, P. (2020). Fast multi-language LSTM-based online handwriting recognition. International Journal on Document Analysis and Recognition (IJDAR), 23(2), 89-102.

[3] Yuan, X., Li, L., & Wang, Y. (2019). Nonlinear dynamic soft sensor modeling with supervised long short-term memory network. IEEE Transactions on Industrial Informatics, 16(5), 3168-3176.

[4] Tan, P., He, B., Zhang, C., Rao, D., Li, S., Fang, Q., & Chen, G. (2019). Dynamic modeling of NOX emission in a 660 MW coal-fired boiler with long short-term memory. Energy, 176, 429-436.

[5] Ke, W., Huang, D., Yang, F., & Jiang, Y. (2017, November). Soft sensor development and applications based on LSTM in deep neural networks. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1-6). IEEE.

[6] Ren, J., & Ni, D. (2020). A batch-wise LSTM-encoder decoder network for batch process monitoring. Chemical Engineering Research and Design, 164, 102-112.

[7] Yuan, X., Li, L., Shardt, Y. A., Wang, Y., & Yang, C. (2020). Deep learning with spatiotemporal attention-based LSTM for industrial soft sensor model development. IEEE Transactions on Industrial Electronics, 68(5), 4404-4414.

[8] Lui, C. F., Liu, Y., & Xie, M. (2022). A Supervised Bidirectional Long Short-Term Memory Network for Data-Driven Dynamic Soft Sensor Modeling. IEEE Transactions on Instrumentation and Measurement, 71, 1-13.

[9] Geladi, P., & Kowalski, B. R. (1986). Partial least-squares regression: a tutorial. Analytica Chimica Acta, 185, 1-17.

[10] Sun, L., Ji, S., & Ye, J. (2013). Multi-label dimensionality reduction. CRC Press.

[11] Drucker, H., Burges, C. J., Kaufman, L., Smola, A., & Vapnik, V. (1996). Support vector regression machines. Advances in Neural Information Processing Systems, 9.

[12] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.

[13] Qin, S. J., Guo, S., Li, Z., Chiang, L. H., Castillo, I., Braun, B., & Wang, Z. (2021). Integration of process knowledge and statistical learning for the Dow data challenge problem. Computers & Chemical Engineering, 153, 107451.