(420f) Benchmarking Model Selection Criteria within an Automated Kinetic Rate Equation Discovery Framework
AIChE Annual Meeting
Topical Conference: Next-Gen Manufacturing
Modeling, Optimization, and Control in Next-Gen Manufacturing II
Tuesday, November 15, 2022 - 5:35pm to 6:00pm
Several modelling techniques have been proposed and explored in the literature: white-box modelling (i.e., first-principles models), grey-box modelling (i.e., hybrid models), and black-box modelling (i.e., data-driven models). Grey-box modelling aims to combine the main advantage of white-box modelling, namely its predictive ability, with the main advantage of black-box modelling, namely its ease of construction. However, most hybrid models presented in the literature make subjective and unverified assumptions about the chemical system investigated (e.g., assuming a kinetic formalism) and do not include a rigorous model selection method. These assumptions hinder the accuracy and predictive ability of the proposed model, and the absence of a rigorous model selection method limits the ability to find the underlying ground truth of the chemical system.
Furthermore, automated kinetic rate equation discovery involves complex nonlinear optimization problems arising from model parameter estimation. Two approaches can be used to solve the parameter estimation problem when building models of dynamic systems: sequential and simultaneous. The sequential approach integrates the differential model at each iteration to evaluate its performance. The simultaneous approach reformulates the parameter estimation problem, usually via approximations introduced by orthogonal collocation methods. Both approaches have noteworthy shortcomings. The sequential approach suffers from long computation times caused by the integration step, whilst the simultaneous approach can return sub-optimal solutions because of its approximations and complex reformulations.
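The sequential approach described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the work itself: a toy first-order reaction A -> B with rate law dC/dt = -k C, noise-free synthetic data, a fixed-step Runge-Kutta (RK4) integrator, and a golden-section search over the single parameter k. The point is only that every evaluation of the objective requires a full integration of the differential model, which is where the computational cost comes from.

```python
import math

def first_order(c, k):
    """Hypothetical rate law for A -> B: dC/dt = -k * C."""
    return -k * c

def rk4_integrate(rate, c0, t_end, k, h=0.01):
    """Integrate dC/dt = rate(C, k) from t = 0 to t_end with classical RK4."""
    n_steps = max(1, int(round(t_end / h)))
    step = t_end / n_steps
    c = c0
    for _ in range(n_steps):
        k1 = rate(c, k)
        k2 = rate(c + 0.5 * step * k1, k)
        k3 = rate(c + 0.5 * step * k2, k)
        k4 = rate(c + step * k3, k)
        c += step * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return c

def sse(k, times, observed, c0):
    """Sum of squared errors; note one integration per measurement time."""
    return sum((rk4_integrate(first_order, c0, t, k) - y) ** 2
               for t, y in zip(times, observed))

def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Minimize a unimodal scalar function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Synthetic, noise-free data from the analytical solution (illustrative only).
k_true, c0 = 0.5, 1.0
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
observed = [c0 * math.exp(-k_true * t) for t in times]

# Each objective call inside the search triggers fresh ODE integrations.
k_hat = golden_section_minimize(lambda k: sse(k, times, observed, c0), 0.01, 2.0)
print(f"estimated k = {k_hat:.4f}")
```

In a realistic setting with several parameters, stiff dynamics, and many candidate rate laws, the repeated integrations in `sse` would dominate the run time.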
To address these limitations in both model building and parameter estimation, we have developed a method that tackles both issues. Our method uses minimal (but important and physically driven) prior knowledge to guide a symbolic regression algorithm that proposes kinetic rate equations for a given chemical system. Using carefully analyzed model selection criteria and model-based design of experiments, we can robustly identify and select the model that accurately describes the system's kinetics while requiring only limited but highly informative data. Additionally, because of the method's setup, integration of the differential model is avoided altogether, without the need to reformulate the parameter estimation problem.
Particular attention has been given to the analysis of the model selection method, as this is the critical step in retrieving the true model structure of a chemical system. Over the years, many model selection criteria have been proposed. These criteria have been motivated by a wide range of philosophical viewpoints, including information criteria and Bayesian, information-theoretic, and decision-theoretic perspectives. Some works in the literature use a model selection criterion to discriminate between competing models. However, the choice of a particular criterion is seemingly arbitrary.
To strengthen our automated kinetic rate model approach, we benchmarked a number of model selection criteria on different case studies, whilst varying the quantity and information content of the data provided. We also assessed the level of noise that each criterion could withstand before it started selecting wrong models. Our objective was to discover which criterion, if any, was best suited to the kinetic rate discovery task, and which criterion was the most robust. Our study demonstrated that, of the criteria examined, the Hannan-Quinn criterion is the most robust and best-suited model selection criterion for the problem class at hand. All in all, our careful choice of model selection criterion, integrated within our proposed methodological framework, maximizes the probability of retrieving the true kinetic rate model from the data used, which demonstrates the essential role of a rigorous model selection method.
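To make the comparison of criteria concrete, the sketch below computes three common information criteria, including the Hannan-Quinn criterion, from a residual sum of squares. It assumes independent Gaussian errors with unknown variance, so that the maximized log-likelihood reduces to n ln(RSS/n) up to a constant. The list of criteria actually benchmarked is not given here, so the choice of AIC and BIC as comparators is an assumption, and the candidate models and their numbers are hypothetical values chosen purely for illustration.

```python
import math

def aic(rss, n, k):
    """Akaike information criterion under Gaussian errors."""
    return n * math.log(rss / n) + 2.0 * k

def bic(rss, n, k):
    """Bayesian (Schwarz) information criterion under Gaussian errors."""
    return n * math.log(rss / n) + k * math.log(n)

def hqc(rss, n, k):
    """Hannan-Quinn criterion under Gaussian errors (requires n > e)."""
    return n * math.log(rss / n) + 2.0 * k * math.log(math.log(n))

def select_model(candidates, n, criterion):
    """Return the name of the candidate with the lowest criterion value.

    candidates maps a model name to (residual sum of squares, parameter count).
    """
    return min(candidates,
               key=lambda name: criterion(candidates[name][0], n,
                                          candidates[name][1]))

# Hypothetical candidates: (RSS, number of fitted parameters). Illustrative only.
n_obs = 30
candidates = {
    "A": (2.00, 1),  # too simple: poor fit
    "B": (0.50, 2),  # good fit with few parameters
    "C": (0.49, 4),  # marginally better fit, twice the parameters
}

for crit in (aic, bic, hqc):
    print(crit.__name__, select_model(candidates, n_obs, crit))
```

The three criteria differ only in how strongly they penalize the parameter count k: the penalty per parameter is 2 for AIC, ln n for BIC, and 2 ln(ln n) for the Hannan-Quinn criterion, which places it between the other two for moderate and large n.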