(224b) Recognizing and Avoiding Big Data Analytics Traps in Applications | AIChE

Authors 

Braatz, R. - Presenter, Massachusetts Institute of Technology
Sun, W., Massachusetts Institute of Technology

Advances in sensors and low-cost storage are leading to large increases in the collection of manufacturing data for use in process modeling, monitoring, and control. At the same time, the number of readily available software tools has grown rapidly, which has led to a large increase in applications of big data analytics and machine learning. While big data analytics can be effective when applied to the right problem in the right way, there are many traps to avoid, and many of these traps are embedded in widely used software produced by major big data companies.

This talk will discuss some common mistakes made when applying big data analytics, ranging from the simple to the more subtle. The traps include (1) using leave-one-out cross-validation, (2) drawing inferences based on matching statistics, (3) selecting the best model from many models/methods, (4) finding false correlations by analyzing many two-variable combinations, and (5) not paying close enough attention to the intended model use when selecting methods. Specific examples of the traps are drawn from a broad spectrum of applications, ranging from polymers to health care.
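
As an illustration of trap (4), the following minimal sketch (not taken from the talk) generates variables that are pure independent noise and then screens every two-variable combination for a large correlation. The function name `count_spurious_correlations`, the sample and variable counts, the random seed, and the 0.5 threshold are all assumptions chosen only for illustration.

```python
import numpy as np


def count_spurious_correlations(n_samples=20, n_variables=50,
                                threshold=0.5, seed=0):
    """Screen all variable pairs in pure-noise data for large correlations.

    Every column is drawn independently, so any correlation that exceeds
    the threshold is a false finding by construction.
    """
    rng = np.random.default_rng(seed)
    # Independent standard normal data: no variable is truly related to another.
    data = rng.standard_normal((n_samples, n_variables))

    # Pairwise Pearson correlations between columns (variables).
    corr = np.corrcoef(data, rowvar=False)

    # Keep each unordered pair once by taking the strict upper triangle.
    upper = np.triu_indices(n_variables, k=1)
    abs_corr = np.abs(corr[upper])

    n_pairs = abs_corr.size
    n_flagged = int(np.sum(abs_corr > threshold))
    max_abs_r = float(abs_corr.max())
    return n_pairs, n_flagged, max_abs_r


n_pairs, n_flagged, max_abs_r = count_spurious_correlations()
print(f"pairs screened: {n_pairs}")
print(f"pairs with |r| > 0.5: {n_flagged}")
print(f"largest |r| found in pure noise: {max_abs_r:.2f}")
```

With 50 variables there are 1225 pairs to screen, so some pairs are expected to look strongly correlated even though none are related. Correcting for the number of comparisons, or confirming any flagged pair on held-out data, is one way to guard against this.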