(202k) Large-Scale Statistical Process Control: Should We Use Partial Or Marginal Correlations? | AIChE


Authors 

Reis, M. - Presenter, University of Coimbra
Rato, T., University of Coimbra



A variety of multivariate statistical process control (MSPC) methods, namely control charts, have been developed and applied to determine whether a process is subject only to common causes of variability or whether a special (assignable) cause, related to some abnormality inside or outside the process, has occurred. To accomplish such detection, monitoring procedures have been proposed ranging from the univariate Shewhart chart and its multivariate extension, Hotelling's T2 chart, to PCA- and PLS-based monitoring statistics, which are able to handle collinearity among the variables [1]. More recently, a procedure based on dynamic PCA and missing-data imputation methods was also proposed, which handles both cross- and autocorrelation of the variables [2]. Multivariate CUSUM and multivariate EWMA control charts have also been proposed to detect finer drifts from normal operation conditions. Even though these control charts may detect some changes in the correlation structure, the majority are designed to monitor the process mean [3, 4].
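As a minimal sketch of the kind of mean-monitoring statistic referred to above, the following computes Hotelling's T2 for new observations against a reference (in-control) data set; the data, dimensions and threshold-free comparison are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Phase I: estimate mean and covariance under normal operating conditions
X_ref = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
mu = X_ref.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X_ref, rowvar=False))

def hotelling_t2(x):
    """Hotelling's T2 statistic for a single observation x."""
    d = x - mu
    return float(d @ S_inv @ d)

# An in-control point scores low; a point off the mean (and off the
# correlation pattern) scores high
t2_ok = hotelling_t2(np.array([0.1, 0.1]))
t2_fault = hotelling_t2(np.array([3.0, -3.0]))
print(t2_ok, t2_fault)
```

In practice the statistic is compared against a control limit derived from an F or chi-squared distribution; the point is simply that T2 targets deviations of the mean vector, not of the covariance structure.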

Among the procedures developed explicitly for monitoring the process covariance, the most widely adopted are based on the generalised variance, for which several approaches have been proposed, namely by Alt [5], Aparisi [6] and Djauhari [7]. However, the generalised variance is a rather ambiguous measure of multivariate variability, as quite different covariance matrices can lead to similar values of the determinant. As alternatives to the generalised variance, Guerrero-Cusumano [8] proposed the conditional entropy and Djauhari et al. [9] the vector variance. Other approaches are based on the likelihood ratio test, such as those found in the works of Alt and Smith [10] and Levinson et al. [11].
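The ambiguity of the generalised variance is easy to demonstrate with a small numerical example (the matrices below are illustrative): two clearly different covariance structures can share exactly the same determinant, so a chart based on the determinant alone cannot distinguish them.

```python
import numpy as np

# Two quite different covariance matrices with identical determinants:
# S1: equal variances, uncorrelated variables
S1 = np.array([[2.0, 0.0],
               [0.0, 2.0]])
# S2: unequal variances with strong correlation
S2 = np.array([[4.0, 2.0],
               [2.0, 2.0]])

print(np.linalg.det(S1))  # 4.0
print(np.linalg.det(S2))  # 4.0 as well: 4*2 - 2*2
```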

Analysing these contributions, one can verify that all of them are strictly based on the marginal covariance matrix of the data and do not consider the structural relationships between the variables. Therefore, these monitoring procedures are unable to effectively detect and discern changes in the local causal correlation structure, since a change in the marginal covariance between two variables may be due to changes directly related to them, or to changes in any other variables whose variation may, directly or indirectly, affect them. To access and use local information on the correlation structure of the variables, alternative measures of variation must be adopted in the process monitoring procedures. Partial correlation is one such quantity, as it evaluates the covariance between pairs of variables after controlling for the effect of the others. Consequently, partial correlation coefficients can provide a finer map of the causal correlation structure of the variables and, therefore, statistical process monitoring based on them should be able to effectively detect changes in the local structure of the variables (fault detection) and to better identify the root causes of specific process upsets (fault diagnosis).
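The distinction between marginal and partial correlation can be illustrated with a small synthetic chain of variables (the structure x → y → z below is an assumption made for the example): x and z are strongly correlated marginally, but their partial correlation given y is close to zero, because their only link runs through y. One standard way to obtain partial correlations is from the precision (inverse covariance) matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Chain structure: x -> y -> z; x and z are only linked through y
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
X = np.column_stack([x, y, z])

P = np.linalg.inv(np.cov(X, rowvar=False))  # precision matrix

def partial_corr(i, j):
    """Partial correlation of variables i and j given all the others."""
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])

marg_xz = np.corrcoef(x, z)[0, 1]  # large: x and z look related marginally
part_xz = partial_corr(0, 2)       # near zero: the link vanishes given y
print(marg_xz, part_xz)
```

A fault that directly alters the x–y edge would show up in the corresponding partial correlation, whereas the marginal covariances of all three variables would change, obscuring the root cause.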

To explore these characteristics, in this work we propose several monitoring statistics for detecting changes in the process structure based on partial correlation information. Furthermore, several sensitivity enhancing transformations were considered in order to improve the methods' performance. These sensitivity enhancing transformations are a major factor in the monitoring statistics' performance, since the detection of changes in correlation coefficients is highly dependent on their values during normal operation. For instance, when the intrinsic relationship between two highly correlated variables suffers a small deviation, their correlation coefficient remains almost unchanged. On the other hand, whenever two unrelated variables become associated, their correlation changes abruptly. This feature suggests that, in order to detect small changes in the structure, it is preferable to use uncorrelated variables. A similar principle was already applied by Hawkins [12] with the so-called regression-adjusted variables. Yet, in that work, the variable transformation was intended to eliminate the contribution of preceding variables in order to isolate drifts of the mean. The use of uncorrelated variables has also been applied to monitor the marginal covariance. However, these procedures only make use of such transformations for algorithmic simplification purposes. Moreover, they tend to use the inverse of the covariance matrix, which may be ill-conditioned, or a triangularization method such as the Cholesky decomposition. This latter approach may in some situations lead to reasonable results. However, as it is based on a series of successive linear regressions in which the i-th variable is regressed onto the remaining (i – 1) preceding variables, the sequence by which variables are included in the model strongly affects the monitoring statistics' performance, since not all variable orderings will provide an adequate description of the system structure.
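The Cholesky-based transformation mentioned above can be sketched as follows (synthetic two-variable data, purely illustrative): decomposing the normal-operation covariance as S = LLᵀ and applying L⁻¹ yields variables that are, column by column, the scaled residuals of regressing each variable on its predecessors, which is exactly why the result depends on the chosen variable ordering.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.4 * rng.normal(size=n)   # x2 strongly tied to x1
X = np.column_stack([x1, x2])

# Cholesky factor of the (sample) NOC covariance: S = L @ L.T
S = np.cov(X, rowvar=False)
L = np.linalg.cholesky(S)

# Transformed variables: column i is the scaled residual of regressing
# variable i onto variables 1..i-1, so the variable order matters
U = np.linalg.solve(L, (X - X.mean(axis=0)).T).T

# The transformed variables are uncorrelated with unit variance
C = np.cov(U, rowvar=False)
print(C)
```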
Even when a meaningful order is considered, since this transformation regresses each variable onto all its predecessors, it may end up relating variables that were not originally associated. To deal with these issues, an alternative transformation is proposed. The goal of the new transformation is to break the relevant relationships between variables by applying linear regression only to the variables that are indeed related. To this end, the relevant edges between variables must first be identified, either by using a priori knowledge of the process or through a network reconstruction technique, for instance one based on partial correlations. Each variable is then regressed onto its network parents, resulting in a final regression model in which only the directly connected variables are considered to obtain the new set of residual variables. For dynamic systems, time-shifted variables should also be included in the regression model, and for non-linear systems, polynomial terms must be added. As an additional step, a Cholesky decomposition can be applied to the residuals, in order to ensure that uncorrelated variables are obtained and also to accommodate any missed relationships.
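The parent-based regression step can be sketched as follows. The network x → y → z is assumed known here (in practice it would come from process knowledge or network reconstruction), and each variable is regressed only on its direct parents, not on all predecessors; the residuals then carry the local, unexplained variation of each node.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
# Assumed known network: x -> y -> z (illustrative synthetic system)
x = rng.normal(size=n)
y = 0.8 * x + 0.3 * rng.normal(size=n)
z = 0.8 * y + 0.3 * rng.normal(size=n)

def residual(child, parents):
    """Residual of regressing `child` on its network parents (OLS)."""
    A = np.column_stack([np.ones_like(child)] + parents)
    beta, *_ = np.linalg.lstsq(A, child, rcond=None)
    return child - A @ beta

# Each variable keeps only the variation not explained by its parents
r_x = x - x.mean()        # x has no parents
r_y = residual(y, [x])    # regress y on x only, not on all predecessors
r_z = residual(z, [y])    # regress z on y only

# By construction, each residual is uncorrelated with its parents
print(np.corrcoef(r_y, x)[0, 1], np.corrcoef(r_z, y)[0, 1])
```

Under normal operation these residuals behave as noise; a structural change in an edge shows up as a model mismatch in the corresponding residual, which is the mechanism exploited by the monitoring statistics described below.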

In order to assess the performance of the various proposed statistics, they were applied to systems with different degrees of complexity, including linear, dynamic and non-linear systems, and compared with the current statistics for monitoring process dispersion. In general, the proposed statistics based on partial correlations present higher detection sensitivities for the same false alarm rate, especially when only a few variables are affected by the fault, which makes detection more difficult. We have also demonstrated that the use of variable transformations that take into account the process structure improves the fault detection capability and yields a consistent performance throughout all the studied systems.

From this study, a family of monitoring statistics based on partial correlation coefficients turned out to be suitable for detecting changes in the process structure, even when they are estimated with relatively few observations. Moreover, it was observed that monitoring the variance of the transformed variables can also detect structural changes. This ability is a result of the model mismatch that takes place when a structural change occurs in the process, meaning that the NOC (normal operating conditions) model used in the transformation is no longer capable of correctly predicting the variables' values. Therefore, these two monitoring statistics present a complementary behaviour and, when used together, they are able to detect a full range of faults specifically related to structural changes.

References

1.         Jackson, J.E., Quality Control Methods for Several Related Variables. Technometrics, 1959. 1(4): p. 359-377.

2.         Rato, T.J. and M.S. Reis, Fault detection in the Tennessee Eastman benchmark process using dynamic principal components analysis based on decorrelated residuals (DPCA-DR). Chemometrics and Intelligent Laboratory Systems, 2013. 125(0): p. 101-108.

3.         Abbasi, B., et al., A transformation-based multivariate chart to monitor process dispersion. The International Journal of Advanced Manufacturing Technology, 2009. 44(7): p. 748-756.

4.         Yen, C.-L., J.-J.H. Shiau, and A.B. Yeh, Effective Control Charts for Monitoring Multivariate Process Dispersion. Quality and Reliability Engineering International, 2012. 28(4): p. 409-426.

5.         Alt, F.B., Multivariate Quality Control, in Encyclopedia of Statistical Sciences, S. Kotz, Editor. 2005, John Wiley & Sons, Inc. p. 5312-5323.

6.         Aparisi, F., J. Jabaloyes, and A. Carrión, Generalized Variance Chart Design With Adaptive Sample Sizes. The Bivariate Case. Communications in Statistics - Simulation and Computation, 2001. 30(4): p. 931–948.

7.         Djauhari, M.A., Improved Monitoring of Multivariate Process Variability. Journal of Quality Technology, 2005. 37(1): p. 32-39.

8.         Guerrero-Cusumano, J.-L., Testing variability in multivariate quality control: A conditional entropy measure approach. Information Sciences, 1995. 86(1–3): p. 179-202.

9.         Djauhari, M.A., M. Mashuri, and D.E. Herwindiati, Multivariate Process Variability Monitoring. Communications in Statistics - Theory and Methods, 2008. 37(11): p. 1742-1754.

10.       Alt, F.B. and N.D. Smith, Multivariate process control, in Handbook of Statistics, P.R. Krishnaiah and C.R. Rao, Editors. 1988, Elsevier. p. 333-351.

11.       Levinson, W.A., D.S. Holmes, and A.E. Mergen, Variation Charts for Multivariate Processes. Quality Engineering, 2002. 14(4): p. 539–545.

12.       Hawkins, D.M., Regression Adjustment for Variables in Multivariate Quality Control. Journal of Quality Technology, 1993. 25(3): p. 170-182.