(370o) New Evaluation Method of Soft Sensors Considering Characteristics of Time Series Data | AIChE

Authors 

Kojima, T. - Presenter, Meiji University
Kaneko, H., Meiji University
Introduction
In chemical plants, process variables such as temperature, pressure and concentration are monitored and controlled to maintain product quality, improve production efficiency and handle abnormal plant conditions. Process variables can be divided into variables X, which are easy to measure frequently and in real time, and variables Y, which are difficult to measure. Soft sensors are therefore widely used to estimate the value of Y online. A soft sensor is a numerical model constructed between X and Y from past measurement data; by inputting values of X measured in real time into the model, the value of Y can be estimated in real time. Because the estimated value of Y contains an error, the performance of a soft sensor must be evaluated. Cross-validation (CV) is one of the methods used to evaluate these errors, that is, the performance of soft sensors. However, the CV method has the following problems.
● Evaluating models takes a long time when there are many samples
● When the number of samples is small, even fewer samples are available for model construction within the CV method, and the model becomes unstable
● It cannot evaluate a model that has already been built
We aim to develop a new evaluation method that can solve these problems.
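The first two problems can be made concrete with a little arithmetic. The sketch below counts the model fits and the per-fit training size in k-fold CV; the fold count (5) and the number of hyperparameter candidates (100) are illustrative assumptions, not values reported in this work.

```python
def kfold_cost(n_samples, n_folds, n_candidates):
    """Count model fits and per-fit training size for k-fold CV."""
    fold_size = n_samples // n_folds      # samples held out in each fold
    train_size = n_samples - fold_size    # samples left for each fit
    n_fits = n_folds * n_candidates       # one fit per fold per candidate
    return train_size, n_fits

# Hypothetical setting: 10 samples, 5 folds, 100 hyperparameter candidates.
train_size, n_fits = kfold_cost(n_samples=10, n_folds=5, n_candidates=100)
print(train_size, n_fits)  # 8 500
```

With only 10 samples, each of the 500 fits sees 8 samples, which is why the models built inside CV can be less stable than a model built on all the data.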

Proposed method
When selecting hyperparameters for model construction, a dataset is usually divided into training data and validation data. The model is constructed using only the training data, and its accuracy is checked using the validation data. However, when the original dataset is small, the constructed model tends to be unstable. In the proposed method, the model is constructed using the entire dataset. We propose generating midpoints between training samples and using them as temporary validation data. The proposed method thus enables model construction and evaluation using all of the data at one time. Since model construction does not have to be repeated as in the CV method, the time required for evaluation is expected to decrease.
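The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: midpoints are taken between consecutive samples in time, the midpoint target is assumed to be the average of the two neighbouring Y values, and a simple k-nearest-neighbour regressor (hypothetical helper `knn_predict`) stands in for SVR or RNN so that the example needs only the standard library.

```python
import math

def midpoints(X, y):
    """Midpoints between consecutive samples; n samples give n - 1 midpoints.

    Averaging y as well as X is an assumption made for this sketch.
    """
    X_mid = [[(a + b) / 2 for a, b in zip(x0, x1)] for x0, x1 in zip(X, X[1:])]
    y_mid = [(y0 + y1) / 2 for y0, y1 in zip(y, y[1:])]
    return X_mid, y_mid

def knn_predict(X_train, y_train, x, k):
    """Stand-in regressor: mean target of the k nearest training samples."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return sum(y_train[i] for i in order[:k]) / k

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def select_k(X, y, candidates):
    """Choose k by the error on the midpoints; the model uses all samples."""
    X_mid, y_mid = midpoints(X, y)
    scores = {}
    for k in candidates:
        preds = [knn_predict(X, y, x, k) for x in X_mid]
        scores[k] = rmse(y_mid, preds)
    best = min(scores, key=scores.get)
    return best, scores

# Toy time series: one explanatory variable and a linearly drifting target.
X = [[float(t)] for t in range(10)]
y = [0.5 * t for t in range(10)]
best_k, scores = select_k(X, y, candidates=[1, 2, 3])
print(best_k)
```

Each candidate is scored against a model that uses all 10 samples, so no sample is held out and no per-fold refitting is needed.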

Results and Discussion
To verify the effectiveness of the proposed method, we analyzed simulation data from the Tennessee Eastman process (TEP) [1] and two actual industrial process data sets, from a Sulfur Recovery Unit (SRU) [2] and a debutanizer column [3]. The CV method and the proposed method were compared using support vector regression (SVR) [4] and a recurrent neural network (RNN) [5] as regression methods.

Tennessee Eastman Process (TEP) [1]
The effectiveness of the proposed method was verified using the simulation data of TEP. The regression method was SVR. The SVR hyperparameters were selected using the CV method and the proposed method respectively, and the estimation performance and evaluation time of the resulting soft sensors were compared. The objective variable was the concentration of by-products. The explanatory variables were 22 variables such as temperature and pressure.
To evaluate models built from a small number of samples, 10 samples from the dataset were used as training data. Because the training data are selected at random, the evaluation result may differ from one analysis to the next. The analysis was therefore performed 10 times with different training data, and the average values of the evaluation index were compared. The test data consisted of 980 samples from the TEP data.
R2 was 0.430 for the CV method and 0.508 for the proposed method. The evaluation took 44.4 seconds with the CV method and 17.3 seconds with the proposed method. Thus, the proposed method improved the accuracy and shortened the evaluation time.
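For reference, the sketch below shows how such figures are typically computed. It assumes R2 denotes the usual coefficient of determination, which the abstract does not state explicitly; the function name `r2_score` is chosen here for illustration. The timing ratio uses only the two times reported above.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Reported evaluation times in seconds.
cv_time, proposed_time = 44.4, 17.3
speedup = cv_time / proposed_time          # about 2.57 times faster
reduction = 1 - proposed_time / cv_time    # about 61 % less time
print(round(speedup, 2), round(reduction, 2))  # 2.57 0.61
```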

Sulfur Recovery Unit (SRU) [2]
The effectiveness of the proposed method was verified using operating data obtained from the SRU. The objective variable was the H2S concentration in the tail gas of line 4. The explanatory variables were: gas flow MEA GAS, air flow AIR MEA, secondary air flow AIR MEA 2, gas flow in the SWS zone, and air flow in the SWS zone.
An RNN, which is effective when the number of samples is large, was used as the regression method. The hyperparameters needed to construct the RNN were selected using the proposed method. For the number of training epochs, we verified whether “early stopping” could stop training at an appropriate point. The other hyperparameters were assessed by comparing the accuracy of the soft sensors constructed with and without the proposed method.
The first two thirds of the data set were used as training data, with one fifth used to create the validation data. The remaining one third of the data set was used as test data. This gave 6734 training samples, 1374 validation samples, and 3327 test samples. In addition, the 6733 midpoints between the training samples were used as validation data for the proposed method. As a result, we were able to select hyperparameters in a shorter evaluation time while maintaining model performance.
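The early stopping mentioned above can be illustrated generically. The sketch below is a common patience-based rule, not necessarily the rule used in this study; the `patience` value and the loss sequence are hypothetical. In the proposed method the monitored loss would presumably be computed on the midpoint data rather than on a held-out validation set.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 0-based epoch with the lowest loss seen before stopping.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs after the best epoch.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch

# Hypothetical per-epoch validation losses.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.9, 0.6]
print(early_stopping_epoch(losses))  # 2
```

Training stops at epoch 5, so the later improvement at epoch 7 is never seen; this trade-off is why the stopping point needs to be checked.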

Debutanizer column [3]
The effectiveness of the proposed method was verified using operating data obtained from the debutanizer column. The objective variable was the butane content in the bottom stream. The explanatory variables were: top temperature, top pressure, reflux flow, flow to the next step, 6th tray temperature and bottom temperature.
The performance of the proposed method was assessed in the same way as for the SRU. There were 1266 training samples, 317 validation samples, and 791 test samples. In addition, the 1265 midpoints between the training samples were used as validation data for the proposed method. As a result, we were able to select hyperparameters in a shorter evaluation time while maintaining model performance.
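The debutanizer counts can be reproduced with a simple chronological split. The sketch below is one reading of the split, with assumptions labeled: the total of 2374 samples is taken as the sum of the three reported counts, the last third is the test set, one fifth of the remainder is the validation set, and the rounding rules are chosen because they match the reported numbers.

```python
def chronological_split(n_total):
    """Return (train, validation, test, midpoint) counts for one split rule."""
    n_test = n_total // 3          # last third held out as test data
    n_rest = n_total - n_test      # first two thirds
    n_val = round(n_rest / 5)      # one fifth of that portion for validation
    n_train = n_rest - n_val
    n_mid = n_train - 1            # midpoints between consecutive samples
    return n_train, n_val, n_test, n_mid

print(chronological_split(1266 + 317 + 791))  # (1266, 317, 791, 1265)
```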


Conclusions
We proposed an evaluation method in which midpoints between samples of time series data are constructed and used as validation data to evaluate a model. To verify its effectiveness, case studies were conducted using TEP simulation data for the case of few samples, and operating data measured in an SRU and a debutanizer column for the case of many samples. Regardless of the number of samples, it was possible to select appropriate hyperparameters and shorten the evaluation time. The proposed method is therefore expected to enable stable and quick evaluation of soft sensors.

References
[1] LH Chiang, EL Russell, RD Braatz. Fault Detection and Diagnosis in Industrial Systems. Springer. 2001;103-112.
[2] http://www.springer.com/us/book/9781846284793.
[3] http://www.springer.com/us/book/9781846284793.
[4] Saneej B Chitralekha, Sirish L Shah. Application of support vector regression for developing soft sensors for nonlinear processes. CSCHE. 2007;88:696-709.
[5] Seunghyun Han, Taekyoung Kim, Dooyoung Kim, Yong-Lae Park, Sungho Jo. Use of Deep Learning for Characterization of Microfluidic Soft Sensors. IEEE ROBO. 2018;3:873-880.