(635c) Regularized Bayesian Fusion for Toxin Concentration Estimation in an Industrial Wastewater Treatment Plant | AIChE

Authors 

Wang, Z. - Presenter, The Dow Chemical Company
Strelet, E., University of Coimbra
Peng, Y., The Dow Chemical Co
Castillo, I., The Dow Chemical Company
Rendall, R., University of Coimbra
Braun, B., The Dow Chemical Company
Joswiak, M., University of California-Santa Barbara
Chiang, L., Dow Inc.
Reis, M., University of Coimbra
The complex nature of the biological and chemical processes taking place in a Wastewater Treatment Plant (WWTP) raises significant challenges for its management and optimal operation. In this work, we address the problem of estimating the concentration of a toxin present in an industrial effluent, so that the necessary actions can be taken in time to ensure it never exceeds the regulatory limit. However, the toxin concentration is measured at a low sampling rate (2-3 times per week), which makes its monitoring and control harder. To circumvent this limitation, models can be developed to estimate the concentration level more frequently. The biological diversity typically found in WWTPs and the associated complex metabolic phenomena often make first-principles modeling approaches either infeasible or unreliable. Data-driven approaches provide an alternative, but laboratory measurements and other offline instrumentation (such as imaging devices) are sampled infrequently; only online sensor data, collected at different points in the process, are available at higher rates. To make maximum use of the available data, a fusion methodology was developed that simultaneously handles the multirate, asynchronous and heterogeneous (in data structure and in quality/uncertainty) nature of the data collected from the WWTP. Data/information fusion methods offer flexible solutions for handling complex data [1], [2], but their application in the process industries is still underexplored. The proposed fusion solution was developed under the InfoQ concept, whose purpose is to maximize the quality of the information generated in an empirical study [3].

The proposed fusion scheme considers several single-source models (a source being a particular origin of information in the process, usually a unit or an analytical device) that are flexibly combined and used depending on the availability of information. The quality of each source is also taken into account, and the smoothness of the successive estimates of the toxin level is controlled through a Bayesian regularization approach. Several regression methods from different corners of the data analytics landscape were considered for building the single-source models, namely penalized, latent variable and tree-based ensemble regression methods. The models were developed and tuned using a nested double cross-validation strategy [5], [6] and the repeated prequential method [4], in order to account for the time-series nature of the data.
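The validation procedure is only cited here, not detailed. As an illustration, a minimal sketch of prequential evaluation in blocks, the basic growing-window scheme reviewed in [4], is given below: each block of the time-ordered data is predicted by a model trained only on the blocks before it. The repetition scheme of the repeated prequential method and the inner tuning loop of the nested double cross-validation [5], [6] are not reproduced, and `prequential_block_splits` with its parameters are hypothetical names.

```python
from typing import Iterator, List, Tuple


def prequential_block_splits(
    n_samples: int, n_blocks: int, min_train_blocks: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for prequential evaluation in blocks.

    The time-ordered samples are cut into n_blocks contiguous blocks. Block k
    (k >= min_train_blocks) is used for testing a model trained on all the
    blocks before it (growing window), so no future data leaks into training.
    """
    if not 1 <= min_train_blocks < n_blocks <= n_samples:
        raise ValueError("need 1 <= min_train_blocks < n_blocks <= n_samples")
    # Integer block edges covering 0..n_samples without gaps or overlaps.
    edges = [i * n_samples // n_blocks for i in range(n_blocks + 1)]
    for k in range(min_train_blocks, n_blocks):
        yield list(range(edges[k])), list(range(edges[k], edges[k + 1]))


for train, test in prequential_block_splits(n_samples=10, n_blocks=5):
    print(f"train {train[0]}-{train[-1]}  test {test[0]}-{test[-1]}")
```

Because every training set is a contiguous prefix of the series, the same function could be applied again to each training set to generate inner splits for hyperparameter tuning, which is one way to keep a nested scheme free of look-ahead.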

Our Regularized Bayesian Fusion strategy provides more frequent estimates of the toxin concentration, based on the most up-to-date information available and combined through Bayesian fusion, while penalizing excessive variation in the estimates caused by unreliable or noisy estimates when less information is available. This methodology should facilitate the management and operation of WWTPs and help keep the toxin concentration below the compliance limit.
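The abstract does not state the fusion equations. Purely as an illustration, the sketch below assumes that each single-source model returns a Gaussian estimate (a mean, plus a variance reflecting the source's quality) and that the Bayesian regularization is a Gaussian prior centred on the previous fused estimate, with a hypothetical `prior_variance` controlling how strongly successive estimates are smoothed. This is standard precision-weighted Gaussian fusion, not necessarily the authors' formulation; `SourceEstimate` and `regularized_fusion` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class SourceEstimate:
    """Output of one single-source model (hypothetical structure)."""
    mean: float      # estimated toxin concentration
    variance: float  # predictive variance, used as the source's quality


def regularized_fusion(
    previous: float,
    prior_variance: float,
    estimates: Sequence[SourceEstimate],
) -> Tuple[float, float]:
    """Fuse the currently available single-source estimates.

    The Gaussian prior N(previous, prior_variance) acts as the regularizer:
    a small prior_variance penalizes large jumps between successive fused
    estimates. Sources without new data are simply omitted from `estimates`.
    Returns the posterior mean and variance.
    """
    if prior_variance <= 0 or any(e.variance <= 0 for e in estimates):
        raise ValueError("variances must be positive")
    # Start from the prior, expressed in precision (inverse-variance) form.
    precision = 1.0 / prior_variance
    weighted_sum = previous / prior_variance
    # Each available source adds its precision; noisier sources weigh less.
    for est in estimates:
        precision += 1.0 / est.variance
        weighted_sum += est.mean / est.variance
    return weighted_sum / precision, 1.0 / precision


# Only a noisy online-sensor model is available: the prior dominates.
print(regularized_fusion(10.0, 1.0, [SourceEstimate(14.0, 4.0)]))
# A precise lab-based model is also available: it pulls the estimate towards 12.
print(regularized_fusion(10.0, 1.0, [SourceEstimate(14.0, 4.0),
                                      SourceEstimate(12.0, 0.25)]))
```

With a previous estimate of 10.0 and only the noisy source, the fused value moves to 10.8 rather than jumping to 14.0; adding the more precise source raises it to about 11.71. When no source has new data, the previous estimate is carried forward unchanged.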

References

[1] F. Castanedo, “A Review of Data Fusion Techniques,” The Scientific World Journal, vol. 2013, pp. 1–19, 2013, doi: 10/gb7x39.

[2] A. Diez-Olivan, J. Del Ser, D. Galar, and B. Sierra, “Data fusion and machine learning for industrial prognosis: Trends and perspectives towards Industry 4.0,” Information Fusion, vol. 50, pp. 92–111, 2019, doi: 10/gf6wkf.

[3] M. S. Reis and R. Kenett, “Assessing the value of information of data-centric activities in the chemical processing industry 4.0,” AIChE J., vol. 64, no. 11, pp. 3868–3881, 2018, doi: 10/gff327.

[4] V. Cerqueira, L. Torgo, and I. Mozetic, “Evaluating time series forecasting models: An empirical study on performance estimation methods,” arXiv:1905.11744 [cs, stat], 2019. Accessed: May 06, 2020. [Online]. Available: http://arxiv.org/abs/1905.11744

[5] R. Rendall and M. S. Reis, “Which regression method to use? Making informed decisions in ‘data-rich/knowledge poor’ scenarios – The Predictive Analytics Comparison framework (PAC),” Chemometrics and Intelligent Laboratory Systems, vol. 181, pp. 52–63, 2018, doi: 10/gfc6c8.

[6] T. J. Rato and M. S. Reis, “SS-DAC: A systematic framework for selecting the best modeling approach and pre-processing for spectroscopic data,” Computers & Chemical Engineering, vol. 128, pp. 437–449, 2019, doi: 10/ghj9ks.