(515c) Optimization-Based Approaches for Explainable, Automated Chemometric Models | AIChE

Chemometric models are widely used in process analytical technology (PAT) applications in the pharmaceutical industry. Most of these models are built solely on an accuracy objective [1-3] and therefore do not explicitly account for model robustness. Here, robustness refers to known variability in the training data, e.g., [4-6], rather than robustness with respect to outlier detection [7-8] or downtime in a continuous manufacturing setting [9]. This work proposes an optimization-based approach that incorporates both accuracy and robustness metrics into a single optimization problem. We define accuracy as the goodness of the prediction and propose a moment-matching method, a technique commonly used in statistics, to account for model robustness. Our method includes decision variables for both the pre-processing and the regression model-building steps. We examine single- and multi-objective formulations, as well as decomposition strategies for the optimization problem. We apply and evaluate our method on industrially relevant case studies from spectroscopy using ENTMOOT [10]. The results illustrate the potential of our approach to automate the decision-making process for building and updating chemometric models, and hence to significantly reduce the cost of goods for PAT applications.
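To make the idea of a combined accuracy and robustness objective concrete, the following is a minimal sketch and not the authors' formulation. Everything in it is an assumption for illustration: the three pre-processing options, the use of ridge regression in place of a chemometric model such as PLS, the hypothetical `moment_mismatch` metric (the spread of the residual mean and standard deviation across groups of known variability, such as raw material lots), the weighted-sum scalarization, and the exhaustive search, which stands in for a solver such as ENTMOOT.

```python
import itertools
import numpy as np

def snv(X):
    """Standard normal variate: center and scale each spectrum (row)."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def first_difference(X):
    """Crude first derivative along the wavelength axis."""
    return np.diff(X, axis=1)

# Assumed, illustrative set of pre-processing decisions.
PREPROCESSORS = {"none": lambda X: X, "snv": snv, "diff": first_difference}

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef

def moment_mismatch(residuals, groups):
    """Hypothetical robustness metric: how much the first two moments of the
    residuals (mean and standard deviation) differ between groups."""
    labels = np.unique(groups)
    means = np.array([residuals[groups == g].mean() for g in labels])
    stds = np.array([residuals[groups == g].std() for g in labels])
    return means.var() + stds.var()

def evaluate(X_tr, y_tr, X_val, y_val, g_val, prep, alpha):
    """Return (RMSE, moment mismatch) on validation data for one candidate."""
    transform = PREPROCESSORS[prep]
    coef, intercept = fit_ridge(transform(X_tr), y_tr, alpha)
    residuals = transform(X_val) @ coef + intercept - y_val
    return np.sqrt(np.mean(residuals**2)), moment_mismatch(residuals, g_val)

def search(X_tr, y_tr, X_val, y_val, g_val, weight=1.0,
           alphas=(1e-3, 1e-1, 10.0)):
    """Enumerate pre-processing and regression decisions and keep the
    candidate with the lowest weighted sum of the two objectives."""
    best = None
    for prep, alpha in itertools.product(PREPROCESSORS, alphas):
        rmse, mismatch = evaluate(X_tr, y_tr, X_val, y_val, g_val, prep, alpha)
        score = rmse + weight * mismatch
        if best is None or score < best["score"]:
            best = {"prep": prep, "alpha": alpha, "rmse": rmse,
                    "mismatch": mismatch, "score": score}
    return best
```

The weighted sum is one simple way to collapse two objectives into one; re-running `search` over a range of `weight` values gives a rough picture of the trade-off between accuracy and robustness. A multi-objective formulation would instead retain the set of non-dominated candidates.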

References:

[1] Dyrby, M., Engelsen, S. B., Nørgaard, L., Bruhn, M., & Lundsberg-Nielsen, L. (2002). Chemometric quantitation of the active substance (containing C≡N) in a pharmaceutical tablet using near-infrared (NIR) transmittance and NIR FT-Raman spectra. Applied Spectroscopy, 56(5), 579-585.

[2] Zhang, L., & García-Muñoz, S. (2009). A comparison of different methods to estimate prediction uncertainty using Partial Least Squares (PLS): a practitioner's perspective. Chemometrics and Intelligent Laboratory Systems, 97(2), 152-158.

[3] Bocklitz, T., Walter, A., Hartmann, K., Rösch, P., & Popp, J. (2011). How to pre-process Raman spectra for reliable and stable models? Analytica Chimica Acta, 704(1-2), 47-56.

[4] Igne, B., Shi, Z., Drennen III, J. K., & Anderson, C. A. (2014). Effects and detection of raw material variability on the performance of near-infrared calibration models for pharmaceutical products. Journal of Pharmaceutical Sciences, 103(2), 545-556.

[5] Hetrick, E. M., Shi, Z., Barnes, L. E., Garrett, A. W., Rupard, R. G., Kramer, T. T., Cooper, T. M., Myers, D. P., & Castle, B. C. (2017). Development of near infrared spectroscopy-based process monitoring methodology for pharmaceutical continuous manufacturing using an offline calibration approach. Analytical Chemistry, 89(17), 9175-9183.

[6] García-Muñoz, S., & Torres, E. H. (2020). Supervised extended iterative optimization technology for estimation of powder compositions in pharmaceutical applications: method and lifecycle management. Industrial & Engineering Chemistry Research, 59(21), 10072-10081.

[7] Hubert, M., & Vanden Branden, K. (2003). Robust methods for partial least squares regression. Journal of Chemometrics, 17(10), 537-549.

[8] Hubert, M., Rousseeuw, P. J., & Vanden Branden, K. (2005). ROBPCA: a new approach to robust principal component analysis. Technometrics, 47(1), 64-79.

[9] Colón, Y. M., Vargas, J., Sánchez, E., Navarro, G., & Romañach, R. J. (2017). Assessment of robustness for a near-infrared concentration model for real-time release testing in a continuous manufacturing process. Journal of Pharmaceutical Innovation, 12(1), 14-25.

[10] Thebelt, A., Kronqvist, J., Mistry, M., Lee, R. M., Sudermann-Merx, N., & Misener, R. (2021). ENTMOOT: a framework for optimization over ensemble tree models. Computers & Chemical Engineering, 151, 107343.