(635f) Statistical Machine Learning for the DOW Data Challenge Problem

Conference

AIChE Annual Meeting

Year

2021

Proceeding

2021 Annual Meeting

Group

Computing and Systems Technology Division

Session

Data Science/Analytics for Process Applications

Time

Thursday, November 11, 2021 - 4:45pm to 5:00pm

Authors

Qin, S. J. - Presenter, City University of Hong Kong

Guo, S., University of Southern California

Li, Z.

Chiang, L., Dow Inc.

Castillo, I., The Dow Chemical Company

In this paper, we present a statistical machine learning approach to the DOW challenge dataset, which was collected from a multi-column integrated process, as shown in Figure 1 (Braun et al., 2020). The process comprises three key distillation columns: the Primary Column, the Feed Column, and the Secondary Column. The challenge has two objectives: to identify, from more than 40 process variables, the key variables that affect the impurity levels measured at the Primary Column outlet, and to build a high-precision inferential sensor model to predict the impurity. To benchmark competing solutions, a validation dataset is provided in addition to the training dataset, which contains data collected over one year. The validation dataset must not be used in any way for modeling or for determining hyperparameters; it may be used only to demonstrate the accuracy of the inferential sensor model.

Our proposed solution is a statistical machine learning approach that consists of i) exploratory analysis of the process data, ii) a method for variable selection, iii) a method for modeling non-negative physical properties using a softplus function, and iv) a method for real-time bias updating based on known data. We benchmark three main algorithms: partial least squares (PLS), lasso, and least angle regression (LARS). Using the validation dataset, we demonstrate that our method gives superior prediction results. We discuss the pros and cons of the statistical machine learning methods, with practical implications for industrial users. We use, and emphasize the importance of, domain knowledge in exploratory analysis and feature selection. We report two findings from the data: mode-switching operation, the identification of which leads to proper data pre-processing, and interpolations in the impurity data. We provide a solution for modeling irregularly sampled quality data, which shows that it is unnecessary to interpolate the lab-test impurity data.
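The abstract does not give the exact form of the softplus model or of the bias update, so the following is only a minimal sketch under stated assumptions. It assumes a linear score passed through a softplus function to keep the predicted impurity non-negative, and an exponentially weighted bias correction that is refreshed only when a lab result arrives. The names `predict_impurity` and `BiasUpdater` and the smoothing factor `alpha` are hypothetical and are not taken from the paper.

```python
import math


def softplus(z):
    """Numerically stable softplus, log(1 + exp(z)); never negative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def predict_impurity(x, weights, intercept):
    """Hypothetical inferential sensor: linear score passed through softplus.

    The linear form is an assumption; the weights could come from any of
    the regression methods named in the abstract (PLS, lasso, LARS).
    """
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return softplus(z)


class BiasUpdater:
    """Assumed bias-update rule: exponentially weighted prediction error.

    `alpha` is a hypothetical smoothing factor in (0, 1].
    """

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.bias = 0.0

    def correct(self, raw_prediction):
        # Apply the current bias and keep the output non-negative.
        return max(raw_prediction + self.bias, 0.0)

    def update(self, raw_prediction, lab_value):
        # Call only when a lab result is available; with irregular
        # sampling the bias simply holds its last value in between.
        error = lab_value - raw_prediction
        self.bias = (1.0 - self.alpha) * self.bias + self.alpha * error
```

Because `update` is called only at the times when lab values exist, this sketch works directly on the irregularly sampled measurements and needs no interpolated impurity series.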

Figure 1. DOW Challenge process flowchart from which the datasets were collected.

References

Braun, I. Castillo, M. Joswiak, Y. Peng, R. Rendell, A. Schmidt, Z. Wang, L. Chiang, and B. Colegrove, Data Science Challenges in Chemical Manufacturing, IFAC World Congress, July 2020, Berlin, Germany.

Topics

Computing and Systems Engineering

Process Automation & Control
