(127c) Knowledge Discovery and Explanation from Industrial Process Data Using Clustering and Subspace Search
AIChE Spring Meeting and Global Congress on Process Safety
2017
2017 Spring Meeting and 13th Global Congress on Process Safety
3rd Big Data Analytics
Big Data Analytics and Smart Manufacturing II
Tuesday, March 28, 2017 - 4:30pm to 5:00pm
In this study, Density-based spatial clustering of applications with noise (DBSCAN) and k-means clustering are introduced for extracting process behaviors from a historical database. Both clustering techniques are studied on industrial data from a pyrolysis reactor and on simulation data. Their performance is evaluated with three cluster evaluation metrics: homogeneity, completeness, and the Davies–Bouldin index.
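To make the three evaluation metrics concrete, here is a minimal pure-Python sketch of how they are commonly defined. Homogeneity is 1 - H(C|K)/H(C) and completeness is 1 - H(K|C)/H(K), where C are reference (true) labels and K are cluster labels. The Davies–Bouldin index averages, over clusters, the worst ratio (s_i + s_j) / d(c_i, c_j) of within-cluster scatter to centroid separation, so lower values are better. This is not the authors' code. The function names and toy data are illustrative assumptions. In practice, scikit-learn provides `homogeneity_score`, `completeness_score` and `davies_bouldin_score` in `sklearn.metrics`, and DBSCAN's noise label (-1) is treated here as an ordinary cluster label.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(a, b):
    """H(A | B): entropy of labels `a` given labels `b`."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((n_ab / n) * math.log(n_ab / b_counts[lab_b])
                for (_, lab_b), n_ab in joint.items())


def homogeneity(true_labels, cluster_labels):
    """1.0 when every cluster contains samples of only one true class."""
    h_c = _entropy(true_labels)
    if h_c == 0.0:
        return 1.0
    return 1.0 - _conditional_entropy(true_labels, cluster_labels) / h_c


def completeness(true_labels, cluster_labels):
    """1.0 when all samples of each true class fall into the same cluster."""
    h_k = _entropy(cluster_labels)
    if h_k == 0.0:
        return 1.0
    return 1.0 - _conditional_entropy(cluster_labels, true_labels) / h_k


def davies_bouldin(points, cluster_labels):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio; lower is better.
    s_i is the mean Euclidean distance of cluster i's points to its centroid c_i."""
    clusters = {}
    for p, lab in zip(points, cluster_labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin index needs at least two clusters")
    centroids, scatter = {}, {}
    for lab, members in clusters.items():
        c = tuple(sum(col) / len(members) for col in zip(*members))
        centroids[lab] = c
        scatter[lab] = sum(math.dist(m, c) for m in members) / len(members)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)


if __name__ == "__main__":
    # Toy data: two well-separated operating modes in two process variables.
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    truth = ["normal"] * 3 + ["fault"] * 3
    pred = [0, 0, 0, 1, 1, 1]
    print(homogeneity(truth, pred), completeness(truth, pred))  # 1.0 1.0
    print(davies_bouldin(X, pred))
```

The two entropy-based scores require reference labels, such as known process modes or fault periods. The Davies–Bouldin index uses only the data and the cluster assignment. That difference is why the index can also be applied to historical data that has not been labeled.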
Beyond extracting process behaviors with data clustering, we propose a subspace-search-based approach that explains the disparity between a pair of process clusters in terms of the most contributing attributes. In other words, the most contributing attributes are used to explain the disparity between certain process clusters (the comparative group) and their reference clusters (the reference group). Each data sample in the comparative group is compared with the reference group by its dimensional normalized k-distance in each subspace (also called a "dimension"). The subspace with the highest dimensional normalized k-distance is taken as the explanation of the disparity. However, brute-force search is computationally infeasible because the number of candidate subspaces grows exponentially with the number of attributes. Sample condensation and greedy search are therefore used to reduce the computational cost.
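The abstract does not define the dimensional normalized k-distance or the condensation step. The Python sketch below therefore fills those gaps with explicitly assumed choices and should not be read as the authors' method. The assumptions are:

- Attributes are z-scored against the reference group.
- The k-distance in a subspace S (the Euclidean distance to the k-th nearest reference sample, using only the attributes in S) is divided by sqrt(|S|), so that subspaces of different sizes are comparable.
- A subspace is scored by the mean of this value over the comparative group.
- "Sample condensation" is a hypothetical block averaging of consecutive samples.

Every function name here (`standardize`, `condense`, `normalized_k_distance`, `greedy_subspace`) is hypothetical. The greedy loop adds, one attribute at a time, whichever attribute most increases the score, and it stops when no attribute helps.

```python
import math
import statistics


def standardize(reference, samples):
    """Z-score each attribute with the reference group's mean and std (assumption)."""
    dims = range(len(reference[0]))
    mu = [statistics.fmean([r[d] for r in reference]) for d in dims]
    sd = [statistics.pstdev([r[d] for r in reference]) or 1.0 for d in dims]

    def scale(rows):
        return [[(row[d] - mu[d]) / sd[d] for d in dims] for row in rows]

    return scale(reference), scale(samples)


def condense(rows, block):
    """Hypothetical sample condensation: replace each run of `block` rows by its mean."""
    return [[statistics.fmean(col) for col in zip(*rows[i:i + block])]
            for i in range(0, len(rows), block)]


def normalized_k_distance(x, reference, subspace, k):
    """Euclidean distance from x to its k-th nearest reference row, using only the
    attributes in `subspace`, divided by sqrt(len(subspace)) (assumed normalization)."""
    dists = sorted(math.sqrt(sum((x[d] - r[d]) ** 2 for d in subspace))
                   for r in reference)
    return dists[min(k, len(dists)) - 1] / math.sqrt(len(subspace))


def greedy_subspace(comparative, reference, k=3, max_dims=3, block=1):
    """Greedy forward search for the attribute subset with the highest mean
    normalized k-distance between the comparative and reference groups."""
    ref, comp = standardize(reference, comparative)
    ref, comp = condense(ref, block), condense(comp, block)

    def score(subspace):
        return statistics.fmean(
            [normalized_k_distance(x, ref, subspace, k) for x in comp])

    chosen, best = [], -math.inf
    remaining = set(range(len(ref[0])))
    while remaining and len(chosen) < max_dims:
        # Try adding each unused attribute and keep the one with the highest score.
        dim, dim_score = max(((d, score(chosen + [d])) for d in remaining),
                             key=lambda t: t[1])
        if dim_score <= best:
            break  # no remaining attribute increases the score
        chosen.append(dim)
        best = dim_score
        remaining.remove(dim)
    return chosen, best


if __name__ == "__main__":
    # Synthetic example: the comparative group is shifted only in attribute 1.
    reference = [[math.sin(i), math.cos(i), (i % 5) / 5] for i in range(50)]
    comparative = [[math.sin(i), math.cos(i) + 10, (i % 5) / 5] for i in range(10)]
    dims, value = greedy_subspace(comparative, reference, k=3)
    print(dims)  # [1]
```

The greedy loop shows why this kind of search is tractable. With d = 20 attributes, exhaustive search would have to score 2^20 - 1 = 1,048,575 subspaces. This loop with `max_dims=3` scores at most 20 + 19 + 18 = 57. The trade-off is that a forward search can miss attribute combinations that are informative only jointly. Condensation (`block > 1`) further shrinks the number of distance evaluations per score.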
The results show that both DBSCAN and k-means clustering perform well at classifying process behaviors: various process modes and process faults are recognized by these clustering techniques. Furthermore, the pair-wise explanations of disparate process clusters appear reasonable when the variation of the attributes in the explanatory subspace is reviewed. Sample condensation and greedy search reduce the computational cost, making the approach suitable for both online fault identification and offline data analysis.