(126c) Extracting Causal Relations from Incident Reports: A Natural Language Processing and Topic Modeling Approach
AIChE Spring Meeting and Global Congress on Process Safety
2020
2020 Virtual Spring Meeting and 16th GCPS
Industry 4.0 Topical Conference
Big Data Analytics and Smart Manufacturing I
Thursday, August 20, 2020 - 2:10pm to 2:30pm
Guanyang Liu, Mason Boyd, and Noor Quddus
Mary Kay O'Connor Process Safety Center, Texas A&M University System, College Station, TX 77840, USA
Abstract
Lessons learned from past incidents are essential to enhancing process safety in the chemical industry and should be treated as a knowledge legacy that evolves over time for both companies and government. Although a wealth of empirical knowledge has accumulated in public incident databases and incident investigation reports, the lessons drawn from it remain limited because manual content analysis is expensive and there is no established methodology for gaining insights from past incidents.
Recently, a few attempts have been made to develop methods that automate content analysis of incident reports using natural language processing (NLP) techniques. Because these methods still require a manually compiled list of keywords, they are not intelligent or automated enough to extract information that lies outside the pre-defined vocabulary. In this work, topic modeling, an advanced NLP technique for text mining, is employed to identify causal relations from incident reports based on unsupervised learning algorithms. The topic model is capable of generating an exhaustive list of the incident causes described in the reports, and it shows potential for identifying root causes once more comprehensive training text data are applied in future work.
Keywords: incident data, natural language processing, topic modeling
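As a rough illustration of how unsupervised topic modeling can group cause-related vocabulary without a pre-defined keyword list, the following is a minimal sketch only. The abstract does not say which topic model the authors used; latent Dirichlet allocation (LDA) with collapsed Gibbs sampling is assumed here, and the toy reports, stopword list, function names, and hyperparameters are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy "incident reports", invented for illustration only.
REPORTS = [
    "valve leak caused vapor release near pump",
    "pump seal failure caused leak and vapor release",
    "operator skipped procedure during startup training gap",
    "inadequate training and procedure led to operator error",
    "corroded valve and pump seal leak",
    "startup procedure not followed by operator",
]
# Assumed minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"and", "to", "by", "not", "near", "during", "led", "caused"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (topic_word, doc_topic): per-topic word counts and
    per-document topic counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=4):
    """Most frequent words per topic, skipping words with zero count."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


docs = [tokenize(r) for r in REPORTS]
topic_word, doc_topic = fit_lda(docs)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

On a corpus this small the topics are not meaningful; the point is only that the word groupings come from co-occurrence in the text rather than from a hand-built vocabulary. Interpreting a topic as an incident cause would still need expert review.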