(533b) Discovering Large Scale Metabolic Kinetic Models through the Use of Explainable AI Incorporating Different Conditions and Multiple Mutant Strains | AIChE

Authors 

Xenios, S. - Presenter, National Technical University of Athens
Kokosis, A., National Technical University of Athens
Mexis, K., National Technical University of Athens

The scope of this work is the implementation of explainable machine learning and rule-extraction techniques to extract knowledge that helps develop meaningful large-scale metabolic models faster and more systematically. Among the different methods discussed, Monte Carlo kinetic models of metabolism stand out as potentially tractable methods for modelling genome-scale networks while also addressing in vivo parameter uncertainty. By leveraging ML techniques we are able to reduce the uncertainty of the kinetic parameters and define the sampling space for the Monte Carlo methods, which will result in feasible and robust large-scale kinetic models.

Parametrizing a large biochemical network consisting of a couple of hundred metabolites and reactions can prove quite challenging because of the large uncertainty in the kinetic parameters and the lack of experimental data (kinetic parameters are available for only a very limited number of enzymatic reactions). To systematically develop such large-scale kinetic models we make use of the well-established ORACLE framework (EPFL group). At its core, the ORACLE framework is a sampling-based method that requires a lot of manual work before it starts generating relevant kinetic models, and if the conditions change (bioreactor type, different initial conditions) the models may deviate greatly from the real system. Given its sampling-based nature, ORACLE generates a population of kinetic models rather than a single solution. To this day, large-scale kinetic models remain a very useful tool for suggesting potential genetic modifications for higher product titres or yields, but because of their complexity and lack of interpretability they are not as attractive as other frameworks (dynamic FBA).
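
The idea of generating a population of kinetic models by Monte Carlo sampling can be illustrated with a toy example. The sketch below is not the ORACLE framework and does not use its API. It assumes a hypothetical one-metabolite system, dx/dt = v_in - vmax*x/(km + x), with invented parameter bounds and an invented concentration range, and samples the two kinetic parameters log-uniformly. Each sampled model is kept in the population and flagged as feasible when it has a positive steady state inside the assumed range.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw one value log-uniformly from [low, high]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def steady_state(v_in, vmax, km):
    """Steady state of dx/dt = v_in - vmax*x/(km + x); None if none exists."""
    if vmax <= v_in:
        return None  # consumption can never match the influx
    return km * v_in / (vmax - v_in)


def sample_population(n_models, bounds, v_in=1.0, x_range=(0.01, 10.0), seed=0):
    """Sample a population of toy kinetic models and flag the feasible ones.

    All bounds and ranges are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n_models):
        params = {
            name: sample_log_uniform(rng, low, high)
            for name, (low, high) in bounds.items()
        }
        x_ss = steady_state(v_in, params["vmax"], params["km"])
        feasible = x_ss is not None and x_range[0] <= x_ss <= x_range[1]
        population.append({**params, "x_ss": x_ss, "feasible": feasible})
    return population


# Hypothetical sampling space for the two kinetic parameters.
bounds = {"vmax": (0.1, 100.0), "km": (0.01, 10.0)}
population = sample_population(1000, bounds)
feasible_fraction = sum(m["feasible"] for m in population) / len(population)
print(f"feasible models: {feasible_fraction:.1%}")
```

With wide bounds, a large share of the sampled models is infeasible, which is the motivation for narrowing the sampling space before drawing the next population.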

Populations of kinetic models were generated with the ORACLE framework, and through advanced analytics, machine learning and explainable machine learning techniques we reduced the uncertainty in parameter estimation and accelerated the generation of feasible kinetic models, while making use of a plethora of available data, such as fluxomics, metabolomics and chemostat fermentation data. An essential step was defining Key Performance Indicators (KPIs) that serve as benchmarks for evaluating the effectiveness and accuracy of the developed kinetic models, such as model stability, agreement with experimental data, prediction accuracy and consistency with thermodynamic laws. For the machine learning algorithms, we divided the generated kinetic models into classes based on the KPI of agreement with experimental data. A machine learning model selection and evaluation pipeline, leveraging different classification algorithms and nested cross-validation, was implemented on the dataset of generated kinetic models, and the best classifier was used for rule extraction. Different rule extraction algorithms were examined and used to generate new rules, which will be imposed on the next generation of kinetic models.
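
The loop described above (label models by a KPI, fit a classifier, extract a rule, impose it on the next generation) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' pipeline: it swaps the classification and rule-extraction algorithms for a single-threshold rule (a decision stump), uses a plain hold-out split instead of nested cross-validation, and relies on a hypothetical toy KPI and hypothetical parameter ranges.

```python
import random


def sample_models(n, log_vmax_range=(-1.0, 2.0), log_km_range=(-2.0, 1.0), seed=0):
    """Hypothetical population: rows of (log10 vmax, log10 km) plus a class label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        log_vmax = rng.uniform(*log_vmax_range)
        log_km = rng.uniform(*log_km_range)
        vmax, km = 10.0 ** log_vmax, 10.0 ** log_km
        # Toy KPI (assumption): the steady state of dx/dt = 1 - vmax*x/(km + x)
        # must exist and lie in [0.01, 10].
        feasible = vmax > 1.0 and 0.01 <= km / (vmax - 1.0) <= 10.0
        rows.append((log_vmax, log_km))
        labels.append(feasible)
    return rows, labels


def predict(rule, row):
    """A rule is (feature index, threshold, greater): feasible if x > thr (or <=)."""
    feature, threshold, greater = rule
    return (row[feature] > threshold) == greater


def accuracy(rule, rows, labels):
    return sum(predict(rule, r) == y for r, y in zip(rows, labels)) / len(rows)


def fit_stump(rows, labels):
    """Exhaustively search for the single-threshold rule with the best training accuracy."""
    best_rule, best_acc = None, -1.0
    for feature in range(len(rows[0])):
        values = sorted({r[feature] for r in rows})
        for low, high in zip(values, values[1:]):
            for greater in (True, False):
                rule = (feature, (low + high) / 2.0, greater)
                acc = accuracy(rule, rows, labels)
                if acc > best_acc:
                    best_rule, best_acc = rule, acc
    return best_rule


FEATURES = ("log10_vmax", "log10_km")
rows, labels = sample_models(500, seed=1)
train_rows, train_labels = rows[:300], labels[:300]
test_rows, test_labels = rows[300:], labels[300:]

rule = fit_stump(train_rows, train_labels)
feature, threshold, greater = rule
test_acc = accuracy(rule, test_rows, test_labels)
sign = ">" if greater else "<="
print(f"rule: feasible if {FEATURES[feature]} {sign} {threshold:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")

# Impose the rule on the next generation by narrowing that feature's sampling range.
ranges = {"log_vmax_range": (-1.0, 2.0), "log_km_range": (-2.0, 1.0)}
key = "log_vmax_range" if feature == 0 else "log_km_range"
low, high = ranges[key]
ranges[key] = (threshold, high) if greater else (low, threshold)
_, next_labels = sample_models(500, seed=2, **ranges)

before = sum(labels) / len(labels)
after = sum(next_labels) / len(next_labels)
print(f"feasible fraction: {before:.2f} before the rule, {after:.2f} after")
```

A single threshold is the simplest rule that can be read directly as a new sampling bound. Richer rule sets would need a different translation step into constraints on the sampler.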

This framework increased the throughput of potential large-scale kinetic models that describe reality while also encompassing knowledge from legacy experimental data. We implemented the pipeline on two different organisms and types of data: Saccharomyces cerevisiae cultivations at different oxygen levels and E. coli cultivations with different genetic modifications.