(287f) Dynamic Data Feature Engineering for Process Operation Troubleshooting
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Topical Conference: Next-Gen Manufacturing
Big Data and Data Analytics
Wednesday, November 18, 2020 - 9:15am to 9:30am
Most process data are collected as time series that are highly cross-correlated and auto-correlated. In other words, the high-dimensional data are not dynamically excited in all dimensions of the measurement space. This situation is more pronounced with the Industrial Internet of Things (IIoT), where sensors are installed with a degree of redundancy that leads to collinearity. Due to collinearity, the dynamic variations in plant data are often concentrated in a low-dimensional subspace, while the complementary subspace contains only random variations that are independent over time. Because of these characteristics, traditional modeling tools such as vector autoregressive moving average (VARMA) analysis are not suitable, since they assume full-dimensional dynamics (Tsay, 2013).
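The collinearity argument above can be illustrated with a small synthetic example. The sketch below is not taken from the paper: the AR(1) coefficient of 0.95, the sensor noise level, and all variable names are illustrative assumptions. Two hypothetical redundant sensors measure the same slowly varying signal, so their sum carries the dynamics while their difference is essentially noise that is independent over time.

```python
import random


def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a series."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    num = sum(centered[k] * centered[k - 1] for k in range(1, n))
    den = sum(c * c for c in centered)
    return num / den


rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 2000

# Assumed latent process: a slowly varying AR(1) signal (coefficient is illustrative).
latent = [0.0]
for _ in range(n_samples - 1):
    latent.append(0.95 * latent[-1] + rng.gauss(0.0, 1.0))

# Two hypothetical redundant sensors: same signal, independent measurement noise.
sensor_a = [t + rng.gauss(0.0, 0.1) for t in latent]
sensor_b = [t + rng.gauss(0.0, 0.1) for t in latent]

# Direction shared by both sensors (dynamic subspace) and its complement.
common = [a + b for a, b in zip(sensor_a, sensor_b)]
residual = [a - b for a, b in zip(sensor_a, sensor_b)]

common_acf = lag1_autocorr(common)
residual_acf = lag1_autocorr(residual)
print("lag-1 autocorrelation, common direction:  ", round(common_acf, 3))
print("lag-1 autocorrelation, residual direction:", round(residual_acf, 3))
```

In this toy setting the two-dimensional measurement space has only one dynamic direction, which is the situation a full-dimensional model such as VARMA is not designed for.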
In this paper we apply dynamic inner canonical correlation analysis (DiCCA), developed in Dong and Qin (2018) and Dong et al. (2020), to extract low-dimensional latent variables. Each latent variable is described by a self-dependent univariate autoregressive (AR) model. The latent variables are orthogonal, or contemporaneously independent of each other, which is convenient for visualizing latent features and troubleshooting abnormal variations in high-dimensional data. In addition, the latent variables are rank-ordered by their predictability from their own history. This objective promotes the extraction of self-dependent AR relations; for example, integrating components and oscillatory components are favored.
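To make the notion of "predictability from its own history" concrete, the following sketch scores three synthetic series by the R-squared of a least-squares AR(2) fit. This is not the DiCCA algorithm itself: the AR order, the R-squared score, the signal parameters, and the function name `ar2_r_squared` are all assumptions chosen for illustration. It only shows why an objective of this kind would tend to rank integrating and oscillatory components ahead of white noise.

```python
import math
import random


def ar2_r_squared(series):
    """R^2 of a least-squares AR(2) fit of a centered series on its own two lags."""
    mean = sum(series) / len(series)
    x = [v - mean for v in series]
    y = x[2:]       # targets x[k]
    a = x[1:-1]     # lag 1: x[k-1]
    b = x[:-2]      # lag 2: x[k-2]
    saa = sum(u * u for u in a)
    sbb = sum(v * v for v in b)
    sab = sum(u * v for u, v in zip(a, b))
    say = sum(u * t for u, t in zip(a, y))
    sby = sum(v * t for v, t in zip(b, y))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = saa * sbb - sab * sab
    phi1 = (say * sbb - sab * sby) / det
    phi2 = (saa * sby - sab * say) / det
    sse = sum((t - phi1 * u - phi2 * v) ** 2 for t, u, v in zip(y, a, b))
    sst = sum(t * t for t in y)
    return 1.0 - sse / sst


rng = random.Random(1)  # fixed seed so the run is repeatable
n_samples = 2000

# White noise: no dependence on its own past.
white_noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# Integrating component: a random walk (cumulative sum of noise).
random_walk = []
level = 0.0
for _ in range(n_samples):
    level += rng.gauss(0.0, 1.0)
    random_walk.append(level)

# Oscillatory component: a sinusoid with period 50 samples plus small noise.
oscillation = [math.sin(2.0 * math.pi * k / 50.0) + rng.gauss(0.0, 0.1)
               for k in range(n_samples)]

candidates = {
    "white_noise": white_noise,
    "random_walk": random_walk,
    "oscillation": oscillation,
}
scores = {name: ar2_r_squared(s) for name, s in candidates.items()}

# Rank from most to least predictable from own history.
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 3))
```

In DiCCA the series being scored are latent variables obtained as weighted combinations of the measured variables, and the weights and AR coefficients are found jointly; here the series are simply given.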
In this paper, we propose a dynamically engineered latent feature analysis (DELFA) procedure for plant-wide troubleshooting, which applies the DiCCA algorithm to decompose high-dimensional process data into dynamic latent features. DELFA does not make use of the prediction model of DiCCA. Instead, it finds the dynamic features of a segment of time series data that contains interesting features, which could be associated with anomalies. We also extend the DiCCA algorithm to deal with exogenous variables; the extension is referred to as the DiCCAX algorithm.
DELFA further identifies measured variables that are best interpreted by the latent features. The degree of interpretation is represented by the latent variable loadings. Composite loadings and weights are derived to analyze features that appear in multiple latent variables.
The features of interest can be intermittent in time; when they occur, their loadings on the measured variables are the focus of analysis. We demonstrate the effectiveness of the DELFA troubleshooting procedure on two high-dimensional datasets from an industrial plant. The first dataset is analyzed with the troubleshooting procedure to find several anomalous features. The second dataset, collected after a major anomaly was fixed, is analyzed to confirm the fix and to find other anomalies.
References
Ruey S. Tsay. Multivariate Time Series Analysis: With R and Financial Applications. John Wiley & Sons, 2013.
Yining Dong and S. Joe Qin. Dynamic latent variable analytics for process operations and control. Computers & Chemical Engineering, 114:69-80, 2018.
Yining Dong, Yingxiang Liu, and S. Joe Qin. Efficient dynamic latent variable analysis for high dimensional time series data. IEEE Transactions on Industrial Informatics, 16(6):4068-4076, 2020.