(688b) Development of a New Pattern Recognition Method and Its Application to Just-In-Time Modeling
AIChE Annual Meeting
2008 Annual Meeting
Computing and Systems Technology Division
Data Analysis: Design, Algorithms & Applications
Thursday, November 20, 2008 - 3:36pm to 3:57pm
Linear regression is a simple and useful method for building process models for process monitoring and state estimation. However, linear models do not always function well in practice, not only because of process nonlinearity but also because of changes in process characteristics. Such changes are caused, for example, by catalyst deactivation and scale adhesion in the chemical industry or by equipment maintenance in the semiconductor industry. To build good process models that can cope with changes in process characteristics, unsupervised pattern recognition is a key methodology. A conventional unsupervised pattern recognition method is the k-nearest neighbor (k-NN) method, which classifies samples into several classes on the basis of the distance between samples. However, to improve model performance, especially when process characteristics change, it is crucial for classifiers to take into account the correlation among variables, because such changes are captured more efficiently by the correlation than by the distance. The self-organizing map (SOM) is an unsupervised learning algorithm that can construct clusters based on sample similarity and visualize high-dimensional data, but it is difficult to determine an appropriate map size, and the method imposes a heavy computational load.
In the present work, a new pattern recognition method based on the geometry of samples in a linear space is proposed. The proposed method can select samples whose correlations are similar to that of the query point without supervised information or other prior information. The procedure is as follows:
1) Subtract the query point from all the other samples.
2) Calculate the correlation coefficient for every pair of subtracted samples, and select the pairs whose correlation coefficients are close to -1.
3) Create new vectors by subtracting one vector of each selected pair from the other.
4) Derive the subspace containing the query point from the created vectors by using principal component analysis (PCA).
5) Calculate the Q statistic of each sample with respect to the derived subspace, and select the samples with small Q statistics as the samples similar to the query point.
In step 4), robust PCA (RPCA) can be used instead of ordinary PCA to cope with outliers, since the principal components derived by PCA are known to be very sensitive to outliers.
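The five steps can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_similar_samples` and the parameters `corr_threshold`, `n_components`, and `n_select` are hypothetical, the correlation coefficient is taken to be the Pearson coefficient between two subtracted sample vectors, PCA is computed as an uncentered SVD so that the subspace passes through the query point, and the Q statistic is taken to be the squared residual distance to that subspace. Ordinary PCA is used here, not RPCA.

```python
import numpy as np


def select_similar_samples(X, query, corr_threshold=-0.99,
                           n_components=1, n_select=10):
    """Sketch of steps 1)-5). X: (n_samples, n_vars), query: (n_vars,).

    All parameter names and default values are illustrative assumptions.
    Returns (indices of selected samples, Q statistic of every sample).
    """
    X = np.asarray(X, dtype=float)
    query = np.asarray(query, dtype=float)

    # Step 1: subtract the query point from all samples.
    D = X - query

    # Step 2: Pearson correlation between every pair of subtracted samples
    # (rows of D); keep the pairs whose coefficient is close to -1.
    with np.errstate(invalid="ignore", divide="ignore"):
        C = np.corrcoef(D)
    i_idx, j_idx = np.triu_indices(len(D), k=1)
    keep = C[i_idx, j_idx] <= corr_threshold  # NaN compares as False
    if not keep.any():
        raise ValueError("no sample pair has a correlation close to -1")

    # Step 3: new vectors from the difference of each selected pair.
    V = D[i_idx[keep]] - D[j_idx[keep]]

    # Step 4: subspace through the query point (the origin after step 1).
    # Assumption: uncentered SVD, which equals PCA on the set {v, -v}.
    _, _, Vt = np.linalg.svd(V, full_matrices=False)
    P = Vt[:n_components].T  # loading matrix, shape (n_vars, n_components)

    # Step 5: Q statistic = squared residual of each sample to the subspace.
    residual = D - D @ P @ P.T
    Q = np.sum(residual ** 2, axis=1)
    selected = np.argsort(Q)[:n_select]
    return selected, Q
```

With only two variables the Pearson coefficient between any two sample vectors is always +1 or -1, so this sketch is meaningful only for three or more variables. A threshold on Q could replace the fixed count `n_select`; the abstract does not say which is used.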
In addition, a new soft-sensor design method integrating the proposed pattern recognition method and Just-In-Time (JIT) modeling is proposed. The proposed method is referred to as Correlation-based JIT (C-JIT) modeling. The estimation performance of conventional JIT modeling deteriorates when process characteristics change, because it selects the samples for local modeling based only on the distance between the samples and the query point. By integrating JIT modeling and the proposed pattern recognition method, C-JIT modeling can select samples whose correlations are similar to each other and thus cope with changes in process characteristics. The usefulness of the proposed pattern recognition method and C-JIT modeling is demonstrated through a case study of soft-sensor design for a CSTR process. The proposed method improves the estimation accuracy by 48% in comparison with conventional JIT modeling.
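The difference between the two JIT variants lies in how samples are chosen for the local model, which the following Python sketch illustrates. It rests on assumptions: the abstract does not state the type of local model, so ordinary least squares with an intercept is used here as a stand-in, and the function names `nearest_indices` and `local_linear_estimate` are hypothetical. Conventional JIT modeling corresponds to passing the distance-based indices. C-JIT modeling would instead pass indices chosen by the correlation-based procedure in steps 1) to 5).

```python
import numpy as np


def nearest_indices(X, query, n_select):
    """Conventional JIT selection: the n_select samples closest to the query
    in Euclidean distance (the distance measure is an assumption)."""
    dist = np.linalg.norm(np.asarray(X, dtype=float) - query, axis=1)
    return np.argsort(dist)[:n_select]


def local_linear_estimate(X, y, query, selected):
    """Fit a local linear model on the selected samples only and
    estimate the output at the query point."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(len(selected)), X[selected]])
    # lstsq returns the minimum-norm solution if A is rank deficient,
    # which can happen when the selected samples lie in a low-dimensional subspace.
    coef, *_ = np.linalg.lstsq(A, y[selected], rcond=None)
    return float(np.concatenate(([1.0], query)) @ coef)
```

The local model is rebuilt for each new query point and discarded afterwards, so only the sample database, not a fitted model, has to be maintained.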