2023 Spring Meeting and 19th Global Congress on Process Safety

(87c) Text Data Feature Extraction Via NLP Embeddings Methods: Robustness and Power Assessment

Authors

Castillo, I. - Presenter, Dow Inc.

Strelet, E., University of Coimbra

Wang, Z., Dow Inc.

Peng, Y., The Dow Chemical Co

Rendall, R., University of Coimbra

Chin, S. T., The Dow Chemical Company

Reis, M., University of Coimbra

A large variety of sensors and measurement instruments is now available in the Chemical Processing Industries (CPIs). This wide spectrum of sensor technology makes it possible to measure or infer crucial process parameters for monitoring and control purposes [1]–[3]. However, coverage of the relevant process information is still limited. Even with the variety of instrumentation available, each sensing instrument is physically constrained to a sample, a given section or area of the process, or a reduced set of physical quantities. Moreover, the existing instrumentation is sometimes insufficient to measure or estimate new parameters of interest or to detect certain abnormal phenomena. For example, leaks, corrosion, insulation degradation, and unplanned events usually cannot be measured with existing sensor technology.

Although the diversity of measurement instrumentation is increasing, sensors are not the only data sources in CPI databases. Text data from reports, alarms, process tags, etc. are potentially interesting and diverse sources of information. These data can capture relevant aspects that sensors cannot. Proper handling of process text data can therefore provide more information for process diagnosis, monitoring, and control.

With the recent advances in Natural Language Processing (NLP) [4], new methods are available for extracting features from text data that go beyond simple frequency counting. The semantics, i.e., the meaning of the text, can also be encoded as structured numerical features, which can be used for process analysis. However, NLP models remain difficult to interpret and are essentially used as black boxes. Additionally, the power and robustness of this kind of model have not yet been explored in the CPI context. We therefore explore several NLP models for the text embedding task, in the scope of a real process, and perform an exploratory analysis of their information content and potential value for process tuning [5]. Dimension reduction [6] and clustering [7] methods were used to assess the models and to derive several robustness and power metrics.
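The abstract does not state which robustness metrics were derived. As a minimal sketch of one possible choice, assumed here purely for illustration, the agreement between the cluster assignments obtained from two embedding models (or from original and perturbed text) can be scored with the adjusted Rand index. The function name `adjusted_rand_index` and the toy label lists are hypothetical and not taken from the work itself.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two cluster assignments.

    Illustrative robustness score (an assumption, not the authors' metric):
    1.0 means the two clusterings group the documents identically,
    values near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # zero or one document: nothing to disagree about

    # Count documents per (cluster in A, cluster in B) cell and per cluster.
    pair_counts = Counter(zip(labels_a, labels_b))
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)

    # Number of document pairs placed together in both, in A, and in B.
    same_in_both = sum(comb(c, 2) for c in pair_counts.values())
    same_in_a = sum(comb(c, 2) for c in counts_a.values())
    same_in_b = sum(comb(c, 2) for c in counts_b.values())

    expected = same_in_a * same_in_b / total_pairs
    maximum = (same_in_a + same_in_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial (all together or all apart)
    return (same_in_both - expected) / (maximum - expected)


# Hypothetical cluster labels for six text records under two embedding models.
labels_model_1 = [0, 0, 1, 1, 2, 2]
labels_model_2 = [1, 1, 0, 0, 2, 2]  # same grouping, different label names
print(adjusted_rand_index(labels_model_1, labels_model_2))  # prints 1.0
```

Because the index depends only on which records are grouped together, it is insensitive to how each clustering run happens to number its clusters, which makes it convenient for comparing runs across different embedding models.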

References

[1] C. H. Goh, "Representing and reasoning about semantic conflicts in heterogeneous information systems", Thesis, Massachusetts Institute of Technology, 1997. Accessed: October 23, 2019. [Online]. Available: https://dspace.mit.edu/handle/1721.1/10713

[2] V. Sheokand and V. Singh, "Modeling Data Heterogeneity Using Big DataSpace Architecture", in Advanced Computing and Communication Technologies, vol. 452, R. K. Choudhary, J. K. Mandal, N. Auluck, and H. A. Nagarajaram, Eds. Singapore: Springer Singapore, 2016, pp. 259–268.

[3] M. S. Reis, R. D. Braatz, and L. H. Chiang, "Big Data - Challenges and Future Research Directions", Chemical Engineering Progress, no. Special Issue on Big Data (March), pp. 46–50, 2016.

[4] D. Antons, E. Grünwald, P. Cichy, and T. O. Salge, "The application of text mining methods in innovation research: current state, evolution patterns, and development priorities", R & D Management, vol. 50, no. 3, pp. 329–351, Jun. 2020, doi: 10.1111/radm.12408.

[5] K. Lu, A. Grover, P. Abbeel, and I. Mordatch, "Pretrained Transformers as Universal Computation Engines". arXiv, June 30, 2021. Accessed: September 1, 2022. [Online]. Available: http://arxiv.org/abs/2103.05247

[6] L. McInnes, J. Healy, and J. Melville, "UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction", arXiv:1802.03426 [cs, stat], 2018. Accessed: October 12, 2020. [Online]. Available: http://arxiv.org/abs/1802.03426

[7] L. McInnes and J. Healy, "Accelerated Hierarchical Density Clustering", in 2017 IEEE International Conference on Data Mining Workshops (ICDMW), Nov. 2017, pp. 33–42. doi: 10.1109/ICDMW.2017.12.
