(635f) Statistical Machine Learning for the DOW Data Challenge Problem
2021 AIChE Annual Meeting
Computing and Systems Technology Division
Data Science/Analytics for Process Applications
Thursday, November 11, 2021 - 4:45pm to 5:00pm
Our proposed solution is a statistical machine learning approach that consists of i) exploratory analysis of the process data, ii) a method for variable selection, iii) a method for modeling non-negative physical properties using a soft-plus function, and iv) a method for real-time bias updating based on known data. We benchmark the main algorithms: partial least squares (PLS), lasso, and least angle regression (LARS). We demonstrate on the validation dataset that our method gives superior prediction results. We discuss the pros and cons of these statistical machine learning methods and their practical implications for industrial users. We use, and emphasize the importance of, domain knowledge in exploratory analysis and feature selection. We report the identification of mode-switching operation in the data, which leads to proper data pre-processing, and of interpolated values in the impurity data. We provide a solution for modeling irregularly sampled quality data, which shows that it is unnecessary to interpolate the lab-test impurity data.
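Three of the ideas above (a soft-plus output that keeps predictions non-negative, a real-time bias update, and fitting only on the rows where lab values exist instead of interpolating them) can be illustrated together. The following is a minimal NumPy sketch under our own assumptions, not the authors' implementation: the linear model in the latent (inverse soft-plus) domain, the exponential-forgetting bias update with factor `alpha`, the clipping floor `eps`, the helper names `fit_softplus_linear` and `BiasUpdater`, and the synthetic data are all hypothetical. Fitting ordinary least squares on the transformed target is a simplification that weights errors in the latent domain rather than in the original units; a PLS, lasso, or LARS regressor could take the place of `np.linalg.lstsq` there.

```python
import numpy as np


def softplus(z):
    """Numerically stable soft-plus, log(1 + exp(z)); output is never negative."""
    return np.logaddexp(0.0, z)


def inv_softplus(y, eps=1e-8):
    """Inverse soft-plus, log(exp(y) - 1), for y > 0.

    Written as y + log(1 - exp(-y)) for stability; values below the
    hypothetical floor `eps` are clipped so that log() stays finite.
    """
    y = np.maximum(y, eps)
    return y + np.log(-np.expm1(-y))


def fit_softplus_linear(X, y):
    """Fit y ~ softplus(X @ w + c) by least squares in the latent domain.

    Rows whose lab value is NaN are skipped, so the irregularly sampled
    quality data are used as-is rather than interpolated.
    """
    mask = ~np.isnan(y)
    X_obs, z_obs = X[mask], inv_softplus(y[mask])
    A = np.column_stack([X_obs, np.ones(len(X_obs))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, z_obs, rcond=None)
    return coef[:-1], coef[-1]


class BiasUpdater:
    """Hypothetical exponentially weighted bias correction inside the soft-plus."""

    def __init__(self, w, c, alpha=0.3):
        self.w, self.c, self.alpha = w, c, alpha  # alpha: assumed forgetting factor
        self.bias = 0.0

    def latent(self, x):
        return float(x @ self.w + self.c + self.bias)

    def predict(self, x):
        # The bias sits inside the soft-plus, so the corrected prediction stays >= 0.
        return float(softplus(self.latent(x)))

    def update(self, x, y_lab):
        # Move the bias a fraction alpha toward the latent-domain residual
        # observed when a new lab measurement arrives.
        resid = float(inv_softplus(y_lab)) - self.latent(x)
        self.bias += self.alpha * resid


# Synthetic demo (hypothetical data, not the challenge dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true, c_true = np.array([0.8, -0.5, 0.0]), 0.2
y = softplus(X @ w_true + c_true)
y[rng.random(200) < 0.7] = np.nan  # about 70% of rows have no lab value

w, c = fit_softplus_linear(X, y)

updater = BiasUpdater(w, c, alpha=0.3)
drift = 0.5  # hypothetical latent-domain process shift after the model was fit
for x_t in X[:20]:
    y_lab = float(softplus(x_t @ w_true + c_true + drift))
    updater.update(x_t, y_lab)

print(np.round(w, 3), round(c, 3), round(updater.bias, 3))
```

On this noiseless synthetic data the fitted coefficients match `w_true` and `c_true`, and after 20 lab updates the bias approaches the injected drift of 0.5. Applying the bias inside the soft-plus, rather than adding it to the prediction in original units, keeps every corrected prediction non-negative; a bias added after the soft-plus could push a low-impurity prediction below zero.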
Figure 1. DOW Challenge process flowchart from which the datasets were collected.