(333f) Process Knowledge Discovery and Selecting Number of Non-Zero Loadings in Sparse Principal Component Analysis
AIChE Annual Meeting
2016
Computing and Systems Technology Division
Big Data Analytics in Chemical Engineering
Tuesday, November 15, 2016 - 1:45pm to 2:00pm
Process knowledge discovery and selecting number of non-zero loadings in sparse principal component analysis
Shriram Gajjar*, Murat Kulahci**, Ahmet Palazoglu*
*University of California, Davis, CA 95616, USA (Tel: 530-752-8774; e-mail: anpalazoglu@ucdavis.edu).
**Technical University of Denmark, Lyngby, Denmark and Luleå University of Technology, Luleå, Sweden (e-mail: muku@dtu.dk)
Background: Smart production technologies implemented today have dramatically intensified data generation and collection through networked, information-based technologies throughout the chemical industry and other manufacturing enterprises. Data are generated and collected so quickly that humans must rely on computers to consume and process them. This, in turn, drives an ever-increasing pace of development of algorithms and methods to improve process performance and facilitate process monitoring. First, these algorithms and methods should be able to extract significant information from large datasets. Second, they should provide accurate means to reduce process variability and boost performance. Third, they should allow discovery of the underlying process dynamics, which can substantially improve decision-making. Finally, steps can then be taken towards recommending preemptive actions (preventive decisions made before a failure occurs or is even observed).
Prior Work: Researchers have used principal component analysis (PCA) to capture meaningful information in a reduced-dimensional space. PCA-based monitoring methods are among the most widely used multivariate statistical methods (Cinar et al., 2007). Using PCA for dimension reduction has one specific drawback: each principal component (PC) is a linear combination of all m variables, and the loadings are typically all nonzero. So many nonzero loadings (NZL) make it difficult to interpret the derived PCs and may confound subsequent analyses. To address this challenge, Zou et al. (2006) proposed sparse principal component analysis (SPCA), in which sparse loadings are obtained by imposing lasso (elastic net) constraints on the coefficients (i.e., loadings) of the PCA model. SPCA is essentially the result of optimizing the trade-off between the variance captured by the PCs and the sparsity imposed on them. It allows the user to control the sparsity of the loadings and improves the ability to identify the important variables.
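The trade-off between captured variance and sparsity can be illustrated with a small sketch. It does not implement the elastic-net SPCA of Zou et al. (2006). As a simpler stand-in, it uses a truncated power iteration that keeps only the largest-magnitude loadings at each step. The 4-variable covariance matrix `S` and the helper names `leading_loading` and `explained_variance` are hypothetical, chosen only for illustration.

```python
import math

def matvec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def leading_loading(S, nnzl=None, iters=500):
    """Leading loading vector of covariance matrix S by power iteration.

    If nnzl is given, all but the nnzl largest-magnitude entries are set
    to zero at every step (truncated power iteration, used here as a
    stand-in for SPCA, not the elastic-net formulation).
    """
    m = len(S)
    v = normalize([1.0] * m)
    for _ in range(iters):
        w = matvec(S, v)
        if nnzl is not None:
            keep = set(sorted(range(m), key=lambda i: abs(w[i]), reverse=True)[:nnzl])
            w = [w[i] if i in keep else 0.0 for i in range(m)]
        v = normalize(w)
    return v

def explained_variance(S, v):
    """Variance captured along the unit-norm loading vector v."""
    return sum(x * y for x, y in zip(v, matvec(S, v)))

# Hypothetical covariance: variables 0 and 1 are strongly correlated,
# variables 2 and 3 are only weakly coupled to them.
S = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]

dense = leading_loading(S)            # ordinary PCA: every loading is nonzero
sparse = leading_loading(S, nnzl=2)   # only two nonzero loadings are allowed

print("dense loadings: ", [round(x, 3) for x in dense])
print("sparse loadings:", [round(x, 3) for x in sparse])
print("variance ratio: ", round(explained_variance(S, sparse) / explained_variance(S, dense), 3))
```

In this toy case the sparse component loads only on the two strongly correlated variables and gives up a small share of the variance captured by the dense component. This is the kind of interpretability gain the paragraph above describes.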
Preliminary results: One of the challenges in using SPCA is deciding the penalty parameters or choosing the number of non-zero variables/loadings (NNZL). We propose three approaches, viz. exhaustive search, forward selection and sensitivity analysis, that simplify the process of selecting penalty parameters and provide a more intuitive solution for understanding the physical meaning of variables monitored in chemical processes. In the exhaustive search approach, one goes through all possible combinations of NZL in each PC and then chooses a solution that meets the required criteria. The downside of this approach is that it is computationally intensive and, in scenarios with a large number of NZL, it is impractical or even infeasible to go through all combinations. In the forward selection approach, we impose constraints on each sparse principal component (SPC) together with a lower limit on the total variance captured by the SPCs. By doing so, the search space for the optimum NNZL of each SPC is drastically reduced. Sensitivity analysis is a systematic review of the NNZL on the SPCs. In this approach, the NNZL on one PC is varied while all other aspects are kept constant. The goal is to determine whether a PC can be made sparser without losing information. Thus, traditional PCA can be altered so that the obtained loadings have a clear interpretation without significant loss of the information extracted in each PC, in terms of explained variance. Such an approach would also assist the application of PCA in process surveillance, since a better understanding of the impact of PC loadings can clearly facilitate process monitoring, i.e., fault detection and diagnosis. Furthermore, we discuss the advantages of SPCA for process knowledge discovery using a synthetic example and the Tennessee Eastman benchmark process. The paper will highlight the substantially improved performance of process fault detection and diagnosis strategies using SPCA when compared with traditional approaches.
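One possible reading of the exhaustive search and the sensitivity analysis, for the first PC only, can be sketched as follows. This is not the authors' procedure. As an assumption, the sparse PC for a given NNZL is found by brute-force enumeration of variable subsets instead of by solving SPCA, and the criterion is a hypothetical lower limit `min_fraction` on the variance retained relative to the dense PC. The covariance matrix `S` and all function names are illustrative.

```python
import math
from itertools import combinations

def top_eigenvalue(A, iters=500):
    """Largest eigenvalue of a symmetric positive semidefinite matrix.

    Plain power iteration started from a vector of ones; this assumes the
    leading eigenvector is not orthogonal to that start vector, which holds
    for the all-positive example matrix below.
    """
    v = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam

def best_support(S, k):
    """Exhaustive search: try every set of k variables and return
    (variance, support) for the best one. The best PC restricted to a
    support is the leading eigenvector of the matching submatrix."""
    best = (0.0, ())
    for support in combinations(range(len(S)), k):
        sub = [[S[i][j] for j in support] for i in support]
        best = max(best, (top_eigenvalue(sub), support))
    return best

def smallest_nnzl(S, min_fraction):
    """Sensitivity sweep: increase the NNZL from 1 until the sparse PC
    keeps at least min_fraction of the variance of the dense PC."""
    dense_var = top_eigenvalue(S)
    for k in range(1, len(S) + 1):
        var, support = best_support(S, k)
        if var >= min_fraction * dense_var:
            return k, support, var / dense_var

# Hypothetical covariance of four monitored variables.
S = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]

k, support, fraction = smallest_nnzl(S, min_fraction=0.95)
print("NNZL:", k, "variables:", support, "variance retained:", round(fraction, 3))
```

The enumeration in `best_support` visits every subset of size k, so its cost grows as the binomial coefficient of m and k. This growth is why exhaustive search becomes impractical for processes with many variables and why a variance limit that prunes the search is attractive.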
References
Cinar, A., Palazoglu, A. and Kayihan, F. (2007) 'Multivariate Statistical Monitoring Techniques', Chemical Process Performance Evaluation, Chemical Industries: CRC Press, pp. 37-71.
Zou, H., Hastie, T. and Tibshirani, R. (2006) 'Sparse Principal Component Analysis', Journal of Computational and Graphical Statistics, 15(2), pp. 265-286.