(224b) Recognizing and Avoiding Big Data Analytics Traps in Applications
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Topical Conference: Next-Gen Manufacturing
Big Data and Applications in Advanced Modeling and Manufacturing
Tuesday, November 17, 2020 - 8:15am to 8:30am
Advances in sensors and low-cost storage are leading to large increases in the collection of manufacturing data for use in process modeling, monitoring, and control. At the same time, the number of readily available software tools has grown exponentially, which has led to a large increase in applications of big data analytics and machine learning. While big data analytics can be effective when applied to the right problem in the right way, it is also prone to traps, many of which are embedded in widely used software produced by major big data companies.
This talk will discuss some common mistakes made when applying big data analytics, ranging from the simple to the more subtle. The traps include (1) using leave-one-out cross-validation, (2) drawing inferences based on matching statistics, (3) selecting the best model from many models/methods, (4) finding false correlations by analyzing many two-variable combinations, and (5) not paying close enough attention to the intended model use when selecting methods. Specific examples of the traps are drawn from a broad range of applications, from polymers to health care.