(127c) Knowledge Discovery and Explanation from Industrial Process Data Using Clustering and Subspace Search | AIChE

Authors 

Zhu, W. - Presenter, Chemical Engineering Department, Louisiana State U
Romagnoli, J., Louisiana State University
Data-driven methods for process data analysis have received considerable attention in recent years, owing to the accessibility of real-time process data. Compared with traditional model-based methods, data-driven methods are more robust across different chemical processes. Consequently, many process monitoring studies have been built on supervised learning and multivariate statistics. Both of these approaches, however, require well-classified historical data to describe process behaviors. A technique that can correctly extract process behaviors is therefore needed before any data-driven method is implemented. To bridge this gap, we discuss a clustering-based approach for extracting process behaviors from historical data. In addition, we propose a novel approach that uses subspace search to explain the disparity between pairs of process clusters, revealing more of the knowledge hidden in historical data.

In this study, density-based spatial clustering of applications with noise (DBSCAN) and k-means clustering are used to extract process behaviors from a historical database. Both clustering techniques are studied on industrial data from a pyrolysis reactor and on simulation data. Their performance is evaluated with three cluster evaluation metrics: homogeneity, completeness, and the Davies–Bouldin index.
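The three evaluation metrics named above can be illustrated with a minimal sketch. It follows the standard textbook definitions (entropy-based homogeneity and completeness, centroid-based Davies–Bouldin index) and is not the authors' implementation. The function names are illustrative, and the special handling that DBSCAN noise labels may need is not covered. scikit-learn offers equivalent functions (`homogeneity_score`, `completeness_score`, `davies_bouldin_score`) for practical use.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(a, b):
    """Conditional entropy H(a | b) for two equally long label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((c / n) * math.log(c / b_counts[bv]) for (_, bv), c in joint.items())


def homogeneity(true_labels, cluster_labels):
    """1.0 when every cluster contains samples of a single true class."""
    h_true = _entropy(true_labels)
    if h_true == 0:
        return 1.0
    return 1.0 - _conditional_entropy(true_labels, cluster_labels) / h_true


def completeness(true_labels, cluster_labels):
    """1.0 when all samples of a true class fall in the same cluster."""
    return homogeneity(cluster_labels, true_labels)


def davies_bouldin(points, cluster_labels):
    """Davies-Bouldin index (lower is better); needs no true labels.

    Assumes at least two clusters with distinct centroids.
    """
    groups = {}
    for p, k in zip(points, cluster_labels):
        groups.setdefault(k, []).append(p)
    centroids = {k: [sum(col) / len(g) for col in zip(*g)] for k, g in groups.items()}
    # Mean distance of each cluster's points to its own centroid.
    scatter = {
        k: sum(math.dist(p, centroids[k]) for p in g) / len(g)
        for k, g in groups.items()
    }
    keys = list(groups)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
        for i in keys
    ]
    return sum(worst) / len(worst)
```

Homogeneity and completeness require reference labels, so they apply where the true process modes are known (for example, simulation data), whereas the Davies–Bouldin index uses only the data and the cluster assignment.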

Beyond process behavior extraction by data clustering, we propose a subspace-search-based approach that explains the disparity between pairs of process clusters in terms of the most contributing attributes. In other words, the most contributing attributes are used to explain how a given process cluster (the comparative group) differs from its reference cluster (the reference group). Each data sample in the comparative group is compared with the reference group by its dimensionally normalized k-distance in each subspace (also referred to as ‘dimensions’). The subspace with the highest dimensionally normalized k-distance is taken as the explanation of the disparity. Brute-force search, however, is computationally infeasible because of its complexity. Sample condensation and greedy search are therefore used to keep the computation tractable.
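The greedy subspace search described above might look roughly like the following sketch. The abstract does not give the exact definitions, so several choices here are assumptions: the k-distance is taken as the Euclidean distance to the k-th nearest reference sample within the chosen attributes, the dimensional normalization is taken as division by the square root of the subspace size, and the search is a forward selection that stops when the score no longer improves. Sample condensation is omitted, and attributes are assumed to be scaled comparably beforehand. All function names are hypothetical.

```python
import math


def k_distance(sample, reference, dims, k):
    """Distance from `sample` to its k-th nearest reference point, using only
    the attributes in `dims`, divided by sqrt(len(dims)) (assumed normalization)."""
    dists = sorted(
        math.sqrt(sum((sample[d] - r[d]) ** 2 for d in dims)) for r in reference
    )
    return dists[min(k, len(dists)) - 1] / math.sqrt(len(dims))


def subspace_score(comparative, reference, dims, k):
    """Mean normalized k-distance of the comparative group in subspace `dims`."""
    return sum(k_distance(s, reference, dims, k) for s in comparative) / len(comparative)


def greedy_subspace(comparative, reference, k=2, max_dims=3):
    """Forward selection: repeatedly add the attribute that raises the score most.

    Evaluates O(max_dims * n_attrs) subspaces instead of all 2**n_attrs subsets.
    """
    n_attrs = len(reference[0])
    chosen, best = [], 0.0
    while len(chosen) < max_dims:
        candidates = [d for d in range(n_attrs) if d not in chosen]
        if not candidates:
            break
        score, d = max(
            (subspace_score(comparative, reference, chosen + [d], k), d)
            for d in candidates
        )
        if score <= best:  # no attribute improves the explanation; stop
            break
        chosen.append(d)
        best = score
    return chosen, best


# Toy data (made up for illustration): the comparative group differs from the
# reference group mainly in attribute 1.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
comparative = [(0.5, 5.0, 0.5), (0.4, 6.0, 0.6)]
explanatory_dims, score = greedy_subspace(comparative, reference, k=2)
```

On this toy data the search selects attribute 1 alone, since adding either remaining attribute lowers the normalized score. Under the assumed normalization, a larger subspace is kept only when the added attribute separates the groups about as strongly as those already chosen.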

The results show that both DBSCAN and k-means clustering perform well in classifying process behaviors. Various process modes and process faults are recognized by these clustering techniques. Furthermore, the pair-wise explanations of disparate process clusters appear reasonable when the variation of the attributes in the explanatory subspace is reviewed. Sample condensation and greedy search reduce the computational cost, which makes the approach suitable for both online fault identification and offline data analysis.

This paper has an Extended Abstract file available; you must purchase the conference proceedings to access it.

Checkout

Do you already own this?

Pricing

Individuals

AIChE Pro Members $150.00
AIChE Graduate Student Members Free
AIChE Undergraduate Student Members Free
AIChE Explorer Members $225.00
Non-Members $225.00