(130c) Opportunities for Text Mining in Service of Chemical Engineering | AIChE

Authors 

Sankaranarayanan, S. - Presenter, Carnegie Mellon University
Rose, C., Carnegie Mellon University
Shuang, B., Dow
Bury, S., Dow Inc.
As the world has become increasingly digital, vast storehouses of digital data have become available, both in the public sphere and within companies as proprietary resources. Over the past decades, there has been a growing awareness of the strategic value of such data to companies and to society more broadly in virtually every field, and Chemical Engineering is no exception. The social web can be mined to monitor public opinion related to products and policies and how it changes over time in response to news events. Documentation about work practices and events within a company can also be mined to identify ways in which the company's practices affect its outcomes, employee retention, and even safety. Scientific literature can be mined to identify trends in research topics, common or competing findings, and emerging research interests. The news can be monitored to identify connections between environmental policies, company practices, and environmental impact.

Text mining is a field that brings together an understanding of natural language as an unstructured form of data and machine learning as a suite of modeling tools to layer structure over it. Using machine learning paradigms like probabilistic graphical models and deep learning, it is possible to architect modeling tools that extract the latent structure found in text. Extracting this latent structure transforms unstructured data into structured data, to which a plethora of modeling tools can then be applied to identify trends over time and even plausibly causal relations between events and states. A major goal of research in the area of text mining is to achieve robustness in the face of noisy data. With the big data revolution, new storehouses of vast amounts of textual data have become available. While this data is far less well-structured than the forms of language that were the target of work in natural language processing decades ago, research on text mining applied to even the noisiest of this data (for example, from sources such as Twitter or Reddit) is now commonplace.
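As a rough illustration of the step from unstructured text to structured data, the sketch below turns a few free-text notes into a per-year table of term counts and reads a simple trend from it. It is a deliberately simplified stand-in that uses plain word counting, not the probabilistic graphical models or deep learning mentioned above. The corpus, the stopword list, and the function names (`tokenize`, `term_counts_by_year`, `trend`) are all hypothetical and invented for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus: (year, text) pairs invented for illustration.
DOCS = [
    (2019, "Reactor safety review flagged a valve leak."),
    (2019, "Catalyst yield improved after the reactor upgrade."),
    (2020, "Safety audit: valve leak repaired, no further leak found."),
    (2020, "New catalyst trial postponed."),
]

# Hypothetical, minimal stopword list; a real one would be much longer.
STOPWORDS = {"a", "the", "after", "no", "further", "new"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_counts_by_year(docs):
    """Turn free text into a structured table: year -> Counter of terms."""
    table = defaultdict(Counter)
    for year, text in docs:
        table[year].update(tokenize(text))
    return dict(table)


def trend(table, term):
    """Return the count of one term for each year, sorted by year."""
    return [(year, table[year][term]) for year in sorted(table)]


table = term_counts_by_year(DOCS)
print(trend(table, "leak"))     # [(2019, 1), (2020, 2)]
print(trend(table, "reactor"))  # [(2019, 2), (2020, 0)]
```

Once the text is in this tabular form, standard tools for time series or regression can be applied to it. Methods that model latent structure would replace the raw term counts with learned quantities such as topic proportions.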

This talk will offer a brief overview of state-of-the-art methodologies for applied machine learning, with pointers to resources and further instruction.