(583c) Semisupervised Methodology for Fault Diagnosis in Chemical Plants
AIChE Annual Meeting
2009
2009 Annual Meeting
Computing and Systems Technology Division
Process Monitoring, Fault Detection and Diagnosis
Thursday, November 12, 2009 - 1:20pm to 1:45pm
Plant supervisory control systems require reliable management of multiple independent faults, which is crucial for supporting plant operators' decision-making. Since fault prediction has been identified as one of the best ways to prevent industrial accidents, many kinds of data-based fault detection and diagnosis methods have been studied [1]. Fault diagnosis is a multi-class problem that can be addressed either by assigning each sample to exactly one class (the mono-label problem, ml) or by allowing it to belong to more than one class (the multi-label problem, ML). The ML approach allows an independent model to be built for each fault, using independent classifiers [2]. This property allows the training information to be represented differently for each fault, and hence in the most suitable way for each binary classifier, thus improving the overall performance of the whole learning system. The Fault Diagnosis System (FDS) is implemented using Support Vector Machines (SVM) because of their proven efficiency in dealing with ML problems in other areas. SVM is a kernel-based algorithm aimed at margin maximization. Its learning bias has been shown to give the induced classifiers good generalization bounds and good tolerance to noise and outliers. Because of these properties, SVM is used in this work as the fault classification algorithm within a dedicated fault diagnosis methodology. The objective of the FDS is to maximize the mean F1 value, a diagnosis performance index that combines precision and recall and is widely accepted in the machine learning community [3]. Precision is defined as the conditional probability that fault f has occurred, given that fault f has been diagnosed; recall is defined as the conditional probability that the FDS predicts fault f, given that the sample belongs to fault f [2].
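The precision and recall definitions above can be estimated by counting over labelled samples. The following is a minimal sketch, not the authors' code. It assumes that the diagnoses and the true faults are stored as binary sample-by-fault arrays, which play the roles of the diagnosis matrix D and the happening matrix H used in the methodology. It also computes F1 as the harmonic mean of precision and recall. The function name `per_fault_scores`, the convention of scoring 0 when a ratio is undefined, and the use of an unweighted average for the mean F1 are all assumptions.

```python
import numpy as np


def per_fault_scores(D, H):
    """Estimate Prec(f), Rec(f) and F1(f) for every fault f (one column each).

    D[i, f] = 1 if sample i was diagnosed as fault f (diagnosis matrix).
    H[i, f] = 1 if fault f actually happened in sample i (happening matrix).
    Multi-label: a row of D or H may contain several ones.
    """
    D = np.asarray(D, dtype=bool)
    H = np.asarray(H, dtype=bool)
    hits = (D & H).sum(axis=0)      # diagnosed as f and f happened
    diagnosed = D.sum(axis=0)       # diagnosed as f
    happened = H.sum(axis=0)        # f happened
    zeros = np.zeros(D.shape[1])

    # Prec(f) ~ P(f happened | f diagnosed); 0 where f was never diagnosed (assumption)
    prec = np.divide(hits, diagnosed, out=zeros.copy(), where=diagnosed > 0)
    # Rec(f) ~ P(f diagnosed | f happened); 0 where f never happened (assumption)
    rec = np.divide(hits, happened, out=zeros.copy(), where=happened > 0)
    # F1(f) = 2 Prec Rec / (Prec + Rec), the harmonic mean of the two
    denom = prec + rec
    f1 = np.divide(2 * prec * rec, denom, out=zeros.copy(), where=denom > 0)
    return prec, rec, f1


# Toy example: 4 samples, 2 faults; sample 1 (fault 0) is misdiagnosed as fault 1.
H = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
D = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])
prec, rec, f1 = per_fault_scores(D, H)
mean_f1 = f1.mean()   # unweighted mean over faults (assumption)
```

In the toy example, the first fault gets Prec = 1 and Rec = 0.5 (F1 = 2/3). The second fault gets Prec = 2/3 and Rec = 1 (F1 = 0.8), so the mean F1 is about 0.73.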
F1 is the harmonic mean of precision and recall:
F1(f) = [2*Prec(f)*Rec(f)] / [Prec(f)+Rec(f)]
A new methodology for abnormal event management (AEM), or fault diagnosis, in chemical plants is proposed. It consists of six steps: (1) data set selection and representation, which includes data acquisition and data set creation; (2) information representation improvement; (3) a first classification step applying the ML&SVM approach to the training and validation sets; (4) visualization and feature extraction; (5) clustering into new classes; and (6) a second classification step applying the ML&SVM approach to the training and test sets with the redefined classes. The methodology lends itself to automated implementation, except for one step that may require human supervision, and it incorporates powerful clustering algorithms. Figure 1 shows a flowchart of all the stages of the methodology, which is applied to the Tennessee Eastman Process (TEP) benchmark [4]. All 20 TEP faults are addressed and a diagnosis is obtained for each of them, including the difficult Faults 3, 9 and 15, which had not been diagnosed before.
Some data-based fault diagnosis approaches applied to the TEP in the literature consider only a subset of the faults. They evaluate these faults with monitoring or detection metrics such as missing alarm rate, detection delay [5], diagnosis success rate, fault recognition accuracy, misclassification percentage [6,7], detection rates, and monitoring indexes such as T² and Q [7,8,9]. They do not use a strict fault diagnosis performance index that is comparable across techniques. Other works consider all 20 faults reported for the TEP but can neither monitor nor diagnose Faults 3, 9 and 15, reporting them as unobservable [10,11]. This work presents a data-based AEM methodology aimed at diagnosing all kinds of faults that occur in chemical processes. The methodology includes two data arrangement, or information improvement, substeps.
First, a feature extension is performed by adding the standard deviations of the process variables as features. The second treatment consists of the squared standardization of the classes with respect to class 0, which represents the process under normal (standard) operating conditions. Independent Component Analysis (ICA), a powerful multivariate statistical technique, is applied to the classes (b) whose diagnosis performance falls below an assigned threshold TH. This makes it possible to visualize the faults as clusters and to extract features, represented by the independent components. Gaussian Mixture Models (GMM) with the Bayesian Information Criterion (BIC) use these features to cluster the samples of the treated, poorly diagnosed faults into new classes. Finally, SVM is applied to the modified and extended training set (R*) to obtain the classifier models. This set contains the new classes produced by the methodology (δ) and the remaining faults that were already well diagnosed after the first classification step (λ). The models are validated on the modified and extended test set (T*) to obtain the diagnosis matrix (D), which has as many columns as the number of new classes. A contingency matrix is then built by comparing D with the happening matrix (H) to obtain the diagnosis performance of each fault.
Faults 3, 9 and 15 have proved very hard to diagnose in previous works. With the proposed methodology, first results show that Faults 3 and 15 are diagnosed with a performance of 70%, while Fault 9 is diagnosed with a performance of 43%. The remaining faults that were well diagnosed after the information representation improvement were also well diagnosed at the end of the methodology, except for Fault 16. These results demonstrate the success of the methodology, which appears very promising for practical applications.
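The ICA and GMM/BIC step described above can be sketched with scikit-learn's `FastICA` and `GaussianMixture`. This is an illustration under stated assumptions rather than the authors' implementation. The function `recluster_fault` and its parameters are hypothetical: `n_ic` is the number of independent components (2 here, so the components can be plotted) and `max_k` is the largest number of mixture components tried. The input is assumed to hold only the already extended and standardized samples of one fault whose first-step performance fell below TH. Choosing the number of components by the lowest BIC is the usual way to combine GMM with BIC. The abstract does not give the authors' search range.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.mixture import GaussianMixture


def recluster_fault(X, n_ic=2, max_k=5, seed=0):
    """Split the samples of one poorly diagnosed fault into new classes.

    Hypothetical helper. X holds only the (already preprocessed) samples of
    that fault, shape (n_samples, n_features). Returns the independent
    components, the number of clusters chosen by BIC, and each sample's label.
    """
    # ICA: the independent components are the extracted features; plotting
    # S[:, 0] against S[:, 1] is one way to see the fault as clusters.
    ica = FastICA(n_components=n_ic, whiten="unit-variance", random_state=seed)
    S = ica.fit_transform(X)

    # GMM + BIC: fit mixtures with 1..max_k components and keep the one
    # with the lowest Bayesian Information Criterion (lower is better).
    best_gmm, best_bic = None, np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(S)
        bic = gmm.bic(S)
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic

    labels = best_gmm.predict(S)
    return S, best_gmm.n_components, labels
```

Each resulting cluster label would then define one of the new classes added to the training set R* before the second SVM classification step. How clusters are mapped to classes, and where human supervision enters, is not shown here.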
Acknowledgements
Financial support from the Generalitat de Catalunya through the FI fellowship program is gratefully acknowledged. Support from the Spanish Ministry of Education through project no. DPI 2006-05673 is also acknowledged.
References
1. Venkatasubramanian V, Rengaswamy R, Yin K, Kavuri SN. A review of process fault detection and diagnosis: Part I. Quantitative model-based methods. Computers and Chemical Engineering. 2003; 27: 293-312.
2. Yélamos I, Graells M, Puigjaner L, Escudero G. Simultaneous fault diagnosis in chemical plants using a MultiLabel approach. AIChE Journal. 2007; 53(11): 2871-2884.
3. Manning C, Schütze H. Foundations of Statistical Natural Language Processing. The MIT Press; 1999.
4. Downs JJ, Vogel EF. A plant wide industrial process control problem. Computers and Chemical Engineering. 1992; 17(3): 245-255.
5. Detroja KP, Gudi RD, Patwardhan SC. Plant-wide detection and diagnosis using correspondence analysis. Control Engineering Practice. 2007; 15(12): 1468-1483.
6. Kulkarni A, Jayaraman VK, Kulkarni BD. Knowledge incorporated support vector machines to detect faults in Tennessee Eastman Process. 2005; 29: 2128-2133.
7. Chiang L, Kotanchek M, Kordon A. Fault diagnosis based on Fisher discriminant analysis and support vector machines. Computers and Chemical Engineering. 2004; 28: 1389-1401.
8. Kano M, Nagao M, Hasebe S, Hashimoto I, Ohno H, Strauss R, Bakshi B. Comparison of multivariate statistical process monitoring methods with applications to the Eastman challenge problem. Computers and Chemical Engineering. 2002; 26: 161-174.
9. Ge Z, Yang C, Song Z. Improved kernel PCA-based monitoring approach for nonlinear processes. Chemical Engineering Science. 2009; 64: 2245-2255.
10. Zhang Y. Enhanced statistical analysis of nonlinear processes using KPCA, KICA and SVM. Chemical Engineering Science. 2009; 64: 801-811.
11. Lee J, Yoo C, Lee I. Statistical monitoring of dynamic processes based on dynamic independent component analysis. Chemical Engineering Science. 2004; 59: 2995-3006.