(53a) Theory-guided Data Science: A New Paradigm for Scientific Discovery in the Era of Big Data
2017 AIChE Annual Meeting
Sustainable Engineering Forum
Big Data and Sustainability
Monday, October 30, 2017 - 8:00am to 8:26am
Data science methods, which have found tremendous success in the commercial arena, are increasingly recognized for their potential to advance scientific discovery, especially in applications critically linked to sustainability. However, applying data science methods as "black boxes", without regard to domain context and knowledge, has met with limited success in scientific domains, where complex physical phenomena are poorly represented by the scarce data samples available. This talk will introduce a novel paradigm for scientific discovery, termed theory-guided data science, which exploits the ability of data science methods to automatically learn patterns and models from big data without discarding the wealth of accumulated scientific knowledge. An overarching goal of theory-guided data science is to learn models that not only make accurate predictions on the available labeled data but are also interpretable and consistent with scientific theories and principles, thereby contributing to scientific advancement. This talk will describe several strategies for integrating scientific knowledge into conventional data science frameworks, illustrated with applications from a diverse range of scientific domains, including materials science, hydrology, turbulence modeling, biomedical engineering, and neuroscience. The talk will conclude with a detailed case study on mapping the dynamics of freshwater bodies at a global scale using data from Earth-observing satellites.
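To make the stated goal (accuracy on labeled data plus consistency with scientific theory) concrete, the following is a minimal sketch of one possible formulation: a training objective that adds a theory-consistency penalty to the usual data-fit error. Everything here is an illustrative assumption, not the method presented in the talk: the function names are hypothetical, the "theory" is a made-up rule that predictions must not decrease along an ordered input (such as depth), and the weight `lam` is an arbitrary trade-off parameter. Because the penalty depends only on predictions, it can be evaluated on unlabeled samples, which is one way scientific knowledge can compensate for scarce labels.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error on labeled samples (the conventional data-fit term)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def monotonicity_penalty(y_pred_ordered: Sequence[float]) -> float:
    """Hypothetical physics term: assumes theory says the target should not
    decrease along an ordered input. Each consecutive drop is penalized by its
    size; the result is averaged over consecutive pairs (0.0 if consistent)."""
    if len(y_pred_ordered) < 2:
        return 0.0
    drops = [max(0.0, prev - nxt) for prev, nxt in zip(y_pred_ordered, y_pred_ordered[1:])]
    return sum(drops) / len(drops)


def theory_guided_loss(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    y_pred_unlabeled_ordered: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Data-fit error on labeled data plus a weighted theory-consistency penalty
    computed on (possibly unlabeled) predictions. `lam` is an assumed weight."""
    return mse(y_true, y_pred) + lam * monotonicity_penalty(y_pred_unlabeled_ordered)


if __name__ == "__main__":
    # Two labeled points, plus three unlabeled predictions ordered by the input.
    loss = theory_guided_loss([1.0, 2.0], [1.5, 2.0], [1.0, 0.5, 2.0], lam=2.0)
    print(loss)  # 0.125 (data fit) + 2.0 * 0.25 (one violation of size 0.5) = 0.625
```

Raising `lam` pushes a model toward theory-consistent outputs at some cost in fit to the labeled data; in practice the penalty would be written in a differentiable framework so it can be minimized jointly with the data-fit term during training.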