(470b) Application of Feature Engineering and Selection for Spectrometry-Based Soft Sensing
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Computing and Systems Technology Division
Data-Driven Techniques for Dynamic Modeling, Estimation and Control II
Tuesday, November 17, 2020 - 8:15am to 8:30am
In recent years, with increasing demand for tighter product quality monitoring and control, soft sensors have been widely applied in a variety of fields to predict critical variables from easy-to-measure secondary variables. In particular, near-infrared (NIR) spectroscopy-based soft sensors have found wide application in many industries, since the rapid and usually non-invasive spectroscopic reading of a sample is well suited to quantitative determination of its properties of interest. However, the predictive accuracy of a soft sensor may be limited by the following characteristics of NIR spectra: (1) multicollinearity (i.e., high correlation among readings at neighboring wavelengths); (2) spectral noise; (3) the curse of dimensionality (i.e., usually far fewer samples than variables). Variable selection plays an important role in spectroscopy-based soft sensors because identifying informative variables helps establish a simpler model with better prediction performance and easier interpretation. Although many successful methods have been reported, two notable limitations remain. One is that the selected variables can be strongly influenced by spectral noise and disturbances (e.g., outliers or extreme points). In other words, the selected variables depend strongly on the choice of training samples. The other is that, for many existing methods, the selected wavelengths often show little connection to the chemical bonds or functional groups present in the sample. To address these limitations, a feature-based soft sensing approach was developed by employing the statistics pattern analysis (SPA) framework. In SPA feature-based soft sensing, the whole spectrum is split into several segments, and features that describe spectral characteristics are extracted from each segment. In other words, the soft sensor is built using features instead of the original spectral readings.
The hypothesis is that the features not only better capture spectral characteristics such as nonlinearity and peak shift, but also reduce the influence of spectral noise and disturbances, which would lead to improved predictive power of the soft sensor. On the other hand, not all features contribute equally to the predictive power of a soft sensor. For this reason, we recently developed a novel variable selection method, namely consistency enhanced evolution for variable selection (CEEVS). CEEVS increases soft sensor predictive accuracy by improving the consistency of variable selection associated with chemical functional groups. In this work, we propose a novel soft sensor that integrates SPA (i.e., feature engineering) with CEEVS (i.e., feature selection), referred to as the SPA-CEEVS soft sensor. We demonstrate that the SPA-CEEVS soft sensor can achieve high predictive accuracy and is robust to the choice of training samples. In addition, SPA-CEEVS can identify the segments associated with key chemical functional groups in a sample, as well as the key features of those spectral segments that contribute the most to the predictive power of the soft sensor. The effectiveness of the SPA-CEEVS soft sensor is demonstrated through its application to four NIR spectroscopic data sets and through comparison with several other soft sensing approaches.
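The segment-wise feature extraction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `segment_features`, the use of equal-width segments, and the choice of four statistics per segment (mean, standard deviation, skewness, kurtosis) are assumptions made for illustration, since the abstract does not specify which features are extracted or how segments are defined. CEEVS itself is not sketched here because its algorithm is not described in the abstract.

```python
import numpy as np

def segment_features(spectra, n_segments):
    """Hypothetical SPA-style feature extraction (illustrative only).

    spectra: array of shape (n_samples, n_wavelengths)
    Returns an array of shape (n_samples, 4 * n_segments) holding the
    mean, standard deviation, skewness, and kurtosis of each segment.
    """
    spectra = np.asarray(spectra, dtype=float)
    # Assumption: split the wavelength axis into near-equal contiguous segments.
    segments = np.array_split(spectra, n_segments, axis=1)
    feats = []
    for seg in segments:
        mean = seg.mean(axis=1)
        std = seg.std(axis=1)
        centered = seg - mean[:, None]
        # Avoid division by zero for perfectly flat segments.
        safe_std = np.where(std > 0, std, 1.0)
        skew = (centered ** 3).mean(axis=1) / safe_std ** 3
        kurt = (centered ** 4).mean(axis=1) / safe_std ** 4
        feats.append(np.column_stack([mean, std, skew, kurt]))
    return np.hstack(feats)

# Synthetic data standing in for NIR spectra (5 samples, 100 wavelengths).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))
F = segment_features(X, n_segments=10)
print(F.shape)  # (5, 40)
```

The resulting feature matrix, rather than the raw spectral readings, would then be passed to a regression model, with a feature selection step choosing which segment statistics to retain.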