(305c) Extracting Meaningful Features from Industrial Text Data | AIChE

Authors 

Castillo, I. - Presenter, The Dow Chemical Company
Strelet, E., University of Coimbra
Peng, Y., The Dow Chemical Co
Rendall, R., University of Coimbra
Chin, S. T., The Dow Chemical Company
Reis, M., University of Coimbra
In the Chemical Processing Industry (CPI), the available instrumentation may not capture all the necessary information about a process, particularly process-health information such as leaks, corrosion, insulation degradation, and unplanned events. However, text data from reports, alarms, process tags, and similar sources can serve as diverse and informative inputs for process analysis and monitoring. Handled appropriately, such data can provide supplementary insights for process diagnosis, monitoring, and control.

Recent advancements in Natural Language Processing (NLP) [1] have enabled the extraction of features from text data that go beyond the frequency counting of Bag of Words (BoW) [2] approaches. NLP models can encode the meaning of text as numerical features, which can then be used for further analysis. However, NLP models remain difficult to interpret and are still used primarily as black boxes. Moreover, the power and robustness of text feature extraction methods have not yet been explored in the CPI context. Therefore, we evaluated several text feature extraction methods, including BoW and NLP models, using both unsupervised and supervised approaches [3] to assess their power and robustness.
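To make the frequency-counting baseline concrete, the following is a minimal sketch of a Bag of Words encoding using only the Python standard library. It is not the implementation used in the study; the function names `tokenize` and `bag_of_words`, the simple alphanumeric tokenizer, and the example maintenance notes are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, count_matrix) for a list of text documents.

    Each row of count_matrix corresponds to one document and each column
    to one vocabulary term; entries are raw term frequencies.
    """
    token_lists = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    matrix = []
    for toks in token_lists:
        counts = Counter(toks)  # missing terms count as 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical maintenance notes, invented for illustration only.
notes = [
    "Leak found at pump seal",
    "Pump seal replaced after leak",
    "Insulation degraded near valve",
]
vocabulary, matrix = bag_of_words(notes)
```

Because BoW discards word order, the first two notes share the terms "leak", "pump", and "seal" even though one reports a fault and the other a repair. This is the kind of limitation that meaning-based NLP features are intended to address.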

We applied exploratory text data analysis to a real case study from a Dow Chemical Company site to assess what information can be extracted from industrial text data to predict the probability of an event occurring. Our findings show that the context described in the text data is relatively sparse, which may be related to the functional aggregation level reported in the texts. Overall, our study demonstrates the potential of text data for process analysis and monitoring in the CPI.

References

[1] D. Antons, E. Grünwald, P. Cichy, and T. O. Salge, "The application of text mining methods in innovation research: current state, evolution patterns, and development priorities," R&D Management, vol. 50, no. 3, pp. 329–351, Jun. 2020, doi: 10.1111/radm.12408.

[2] A. Zheng and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, 1st ed. Beijing; Boston: O’Reilly Media, 2018.

[3] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York, NY: Springer, 2009.