(224f) Towards the Use of Multi-Relational Data Mining (MRDM) in Bioprocess Models
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Topical Conference: Next-Gen Manufacturing
Big Data and Applications in Advanced Modeling and Manufacturing
Tuesday, November 17, 2020 - 9:15am to 9:30am
Both academia and industry have invested significantly in developing mechanistic models for bioprocesses. However, little thought has been given to how these models are stored, further developed, and exploited. As a result, although they contain vast and comprehensive knowledge about the process, this information may be misplaced or even lost.
A potential solution is the use of relational databases. They store information in tables linked by logical relationships and thereby allow the creation of logically interconnected repositories. Furthermore, this structure enables the use of multi-relational data mining (MRDM) as a tool to find inter-table patterns (2). MRDM has been successfully applied in other disciplines, such as bioinformatics (3), since it provides a basis for finding new premises and relationships within information distributed across multiple tables. Therefore, in order to move a step closer to smart biomanufacturing, this study has a two-fold objective: (i) to design and implement a relational database that collects mechanistic models for bioprocesses and related process information; and (ii) to explore the potential of MRDM for biomanufacturing.
The relational database developed in this work collects knowledge in several interconnected tables. These tables record the following data: (I) process conditions (e.g. pH, temperature, mode of operation); (II) model details (type and characteristics); (III) model parameters; and (IV) uncertainty ranges associated with the parameters, when available (4). Additionally, the database contains a table of functional problems, based on the operational conditions (e.g. the cooling system is not working), and a table with the corresponding feasible solutions (e.g. adjust the cooling water flow). Subsequently, an inductive logic programming and relational data mining algorithm (5) is implemented and applied to the database. This allows, for example, finding connections between new models and the library of operational problems based simply on the implicit characteristics of the model. To highlight the applicability and usefulness of the proposed strategy, it is demonstrated on an industrial anaerobic microbial conversion case study.
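The table layout described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. This is not the schema used in this work: all table names, column names, model names, and numerical values are hypothetical, and linking problems to models through a shared mode of operation is an assumption made only for illustration. The query is a plain multi-table join; it does not reproduce the inductive logic programming step, but it shows the kind of inter-table connection that such an algorithm would search for.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical bioprocess schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE process_condition (
        id INTEGER PRIMARY KEY, ph REAL, temperature_c REAL,
        operation_mode TEXT);
    CREATE TABLE model (
        id INTEGER PRIMARY KEY, name TEXT, model_type TEXT,
        condition_id INTEGER REFERENCES process_condition(id));
    CREATE TABLE parameter (
        id INTEGER PRIMARY KEY, model_id INTEGER REFERENCES model(id),
        name TEXT, value REAL, lower_bound REAL, upper_bound REAL);
    -- Assumption: a problem is tied to a mode of operation.
    CREATE TABLE problem (
        id INTEGER PRIMARY KEY, description TEXT, operation_mode TEXT);
    CREATE TABLE solution (
        id INTEGER PRIMARY KEY, problem_id INTEGER REFERENCES problem(id),
        description TEXT);
    """)
    # Placeholder rows; every value below is invented for the example.
    conn.executemany(
        "INSERT INTO process_condition VALUES (?, ?, ?, ?)",
        [(1, 7.0, 37.0, "batch"), (2, 6.5, 30.0, "continuous")])
    conn.executemany(
        "INSERT INTO model VALUES (?, ?, ?, ?)",
        [(1, "anaerobic_demo", "mechanistic", 1),
         (2, "continuous_demo", "mechanistic", 2)])
    conn.execute(
        "INSERT INTO parameter VALUES (1, 1, 'mu_max', 0.4, 0.3, 0.5)")
    conn.execute(
        "INSERT INTO problem VALUES "
        "(1, 'cooling system is not working', 'batch')")
    conn.execute(
        "INSERT INTO solution VALUES (1, 1, 'adjust the cooling water flow')")
    conn.commit()
    return conn


def problems_for_model(conn, model_name):
    """Return (problem, solution) pairs reachable from a model via its conditions."""
    rows = conn.execute("""
        SELECT p.description, s.description
        FROM model AS m
        JOIN process_condition AS c ON c.id = m.condition_id
        JOIN problem AS p ON p.operation_mode = c.operation_mode
        JOIN solution AS s ON s.problem_id = p.id
        WHERE m.name = ?
        ORDER BY p.id, s.id
    """, (model_name,)).fetchall()
    return [tuple(row) for row in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    for problem, solution in problems_for_model(demo, "anaerobic_demo"):
        print(f"{problem} -> {solution}")
```

In this sketch a newly added model is connected to known problems and solutions only through the process conditions it references, which mirrors the idea of relating new models to the library of operational problems without annotating each model by hand.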
(1) Narayanan, H.; Luna, M. F.; von Stosch, M.; Cruz Bournazou, M. N.; Polotti, G.; Morbidelli, M.; Butté, A.; Sokolov, M. Bioprocessing in the Digital Age: The Role of Process Models. Biotechnol. J. 2019, 1900172, 1–10. https://doi.org/10.1002/biot.201900172.
(2) Ferreira, C. A.; João, G.; Victor, S. C. Sequential Pattern Mining in Multi-Relational Datasets. In Current Topics in Artificial Intelligence; Meseguer, P., Mandow, L., M. Gasca, R., Eds.; 2009; pp 121–130.
(3) Page, D.; Craven, M. Biological Applications of Multi-Relational Data Mining. ACM SIGKDD Explor. Newsl. 2003, 5 (1), 69. https://doi.org/10.1145/959242.959250.
(4) Caño de las Heras, S.; Krühne, U.; Mansouri, S. S. Relational Database for the Description of Fermentation inside a Simulation Software. In ECAB 5: The 5th European Congress of Applied Biotechnology; Florence, 2019; pp 1328–1329.
(5) Srinivasan, A. Aleph Manual http://www.cs.ox.ac.uk/activities/programinduction/Aleph/aleph.html#SEC4 (accessed Mar 6, 2020).