(28x) Miner: An Ontology-Based Approach for Advancing Toxicological and Public Health Sciences
2023 AIChE Annual Meeting
Food, Pharmaceutical & Bioengineering Division
Poster session: Engineering Fundamentals in Life Science
Monday, November 6, 2023 - 3:30pm to 5:00pm
What makes MINER unique is its ability to link search databases such as Google Scholar, Elsevier, and Scopus directly to a local repository of PDF files. It downloads the documents automatically, converts their unstructured content into structured data, and then applies NLP and transformer models. Successful retrieval of relevant files, however, depends on the user supplying appropriate keywords. Case-specific dictionaries are then constructed from these keywords, and the resulting corpus serves as the foundation for training the model in each case. The model can be trained in unsupervised mode to detect patterns in the structured data, or in self-supervised mode using labels derived from the corpus. In addition, an unsupervised BERTopic model (Grootendorst, 2022) is trained to classify the collected documents, and irrelevant documents are removed from the local repository. Automated transformer-based routines, including filtration and clustering, then apply the core model to extract the data of interest from the relevant documents. MINER has been applied in four case studies, each with its own requirements.
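The keyword-driven filtration step can be illustrated with a minimal Python sketch. It assumes the downloaded PDFs have already been converted to plain text; the functions `build_dictionary`, `relevance_score` and `filter_corpus`, the toy corpus and the `min_hits` threshold are all hypothetical. Simple phrase counting stands in for MINER's transformer-based filtration and BERTopic classification, which are not reproduced here.

```python
import re

def build_dictionary(keywords):
    """Hypothetical case-specific dictionary: the user's keyword phrases, lowercased."""
    return {kw.strip().lower() for kw in keywords if kw.strip()}

def relevance_score(text, dictionary):
    """Count occurrences of every dictionary phrase in a document's text."""
    lowered = text.lower()
    return sum(len(re.findall(re.escape(term), lowered)) for term in dictionary)

def filter_corpus(corpus, dictionary, min_hits=2):
    """Split a {doc_id: text} corpus into kept documents and the IDs of removed ones."""
    kept, removed = {}, []
    for doc_id, text in corpus.items():
        if relevance_score(text, dictionary) >= min_hits:
            kept[doc_id] = text
        else:
            removed.append(doc_id)
    return kept, removed

# Toy corpus standing in for text extracted from downloaded PDFs.
corpus = {
    "doc1": "Anti-androgen exposure and male infertility: androgen receptor antagonism.",
    "doc2": "Traffic noise and sleep quality in urban areas.",
    "doc3": "A male infertility cohort without androgen measurements.",
}
dictionary = build_dictionary(["anti-androgen", "male infertility", "androgen receptor"])
kept, removed = filter_corpus(corpus, dictionary)
print(sorted(kept), removed)  # ['doc1'] ['doc2', 'doc3']
```

A fixed hit threshold is the simplest possible relevance rule; a topic model such as BERTopic instead groups documents by content, so relevant papers that use different wording can still be retained.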
To aid epidemiological surveillance during the COVID-19 pandemic in Greece, the national health organization issued a daily report disclosing the epidemiological data. In parallel, a pioneering epidemiological model was developed to predict the dynamics of the COVID-19 pandemic (Sarigiannis et al., 2021). Using MINER, the unstructured data published by the public health authorities were converted into structured form and fed into this mathematical model, allowing more accurate predictions. The process was automated and repeated daily. Today, the National Health Organization publishes a weekly surveillance report covering all emerging respiratory viruses; this report likewise undergoes a refinement process that extracts the pertinent information, which is then integrated into the available modelling tools.
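Turning a free-text surveillance report into a structured record can be sketched with regular expressions. The report wording, the field names and the `PATTERNS` dictionary below are hypothetical, since the abstract does not describe the format of the Greek reports; MINER itself applies NLP and transformer models rather than hand-written patterns.

```python
import re
from datetime import date

# Hypothetical phrasings; a real report template would need its own patterns.
PATTERNS = {
    "new_cases": r"(\d[\d,]*)\s+new cases",
    "deaths": r"(\d[\d,]*)\s+deaths",
    "icu_patients": r"(\d[\d,]*)\s+patients\s+(?:are\s+)?in\s+(?:the\s+)?ICU",
}

def parse_report(text, report_date):
    """Turn one free-text daily report into a structured record (None if a field is absent)."""
    record = {"date": report_date}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        # Strip thousands separators before converting to an integer.
        record[field] = int(match.group(1).replace(",", "")) if match else None
    return record

report = ("Today 2,345 new cases of SARS-CoV-2 infection were recorded. "
          "There were 18 deaths, while 512 patients are in the ICU.")
time_series = [parse_report(report, date(2020, 11, 1))]  # one row per daily report
latest = time_series[-1]
print(latest["new_cases"], latest["deaths"], latest["icu_patients"])  # 2345 18 512
```

Returning `None` for a missing field, rather than zero, keeps a gap in the report distinguishable from a genuine zero count when the series is passed to a model.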
SARS-CoV-2, like many other organisms, is not only highly transmissible but also able to mutate. In practical terms, existing vaccines may lose part of their efficacy in protecting against transmission and severe disease, as well as against reinfection of individuals who have already recovered. Integrating comprehensive, up-to-date information into the mathematical model would therefore yield more accurate and reliable predictions. To search the literature on these questions, a relevant corpus was constructed and MINER functions were applied to it. The core model of MINER was then used to detect information on the transmission rate of SARS-CoV-2 variants relative to other strains, and on the percentage loss of protection conferred by available vaccines. In a rapidly evolving situation such as the COVID-19 pandemic, access to the most current data and prompt model updates help decision-makers reach more informed and timely decisions in a constantly changing landscape. The ability to process large amounts of data quickly and accurately, as tools like MINER do, is therefore a valuable asset in the fight against the pandemic, as well as in other areas of research and decision-making.
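How literature-derived values might update model parameters can be sketched as follows. A generic SIR model stands in for the epidemiological model of Sarigiannis et al. (2021), whose structure is not described here. The extracted estimates, the median aggregation, the vaccine coverage and efficacy figures and the `susceptible_fraction` rule are all illustrative assumptions, not reported results.

```python
from statistics import median

def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Forward-Euler SIR on population fractions; returns the peak infected fraction."""
    s, i = s0, i0
    peak = i
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    return peak

def susceptible_fraction(coverage, efficacy, protection_loss):
    """Illustrative rule: vaccinated people are immune only to the extent protection is retained."""
    return 1.0 - coverage * efficacy * (1.0 - protection_loss)

# Hypothetical estimates, as if extracted from several papers on a new variant.
relative_transmissibility = median([1.4, 1.6, 1.5])  # vs. the reference strain
protection_loss = median([0.20, 0.30, 0.25])          # fraction of vaccine protection lost

beta_ref, gamma = 0.3, 0.1     # illustrative baseline rates per day
coverage, efficacy = 0.6, 0.9  # illustrative vaccine coverage and efficacy
i0 = 0.001                     # initially infected fraction

baseline = simulate_sir(beta_ref, gamma,
                        susceptible_fraction(coverage, efficacy, 0.0) - i0, i0, days=160)
updated = simulate_sir(beta_ref * relative_transmissibility, gamma,
                       susceptible_fraction(coverage, efficacy, protection_loss) - i0, i0, days=160)
print(f"peak infected fraction: baseline {baseline:.3f}, updated {updated:.3f}")
```

Taking the median of several extracted estimates is one simple way to keep a single outlying paper from dominating the parameter update.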
The second case study in which MINER routines were used concerns the development of adverse outcome pathways (AOPs), which often involve the up- or down-regulation of a gene function, or the identification of a metabolite or biomarker with adverse consequences for human health. Here, MINER is applied to identify and correlate the key events that disrupt normal androgenic activity; specifically, it associates anti-androgenic effects with abnormal gene function. As a supporting step, the number of publications linking an entity to an adverse event is used as evidence, and this evidence grows stronger as the number of publications increases. In this case study, MINER collected 453 publications, in which it identified 112 biological entities that may be associated with male infertility; its filtration algorithm then narrowed these down to 38 genes associated with the adverse effects. This application is promising because it can bridge the gap between the medical community and the literature in fields such as toxicology and public health, where the term adverse outcome pathway (AOP) may not be commonly used. It can also bring these events together under a common framework, supporting a more sustainable future and contributing to resources such as the OECD AOPwiki (https://aopwiki.org/).
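The publication-count evidence described above can be sketched as a co-occurrence count: an entity gains support from every publication that mentions it together with an outcome term, and only entities with at least a minimum number of supporting publications are kept. The toy abstracts, the placeholder entity names (`GENE1` to `GENE3`) and the threshold are hypothetical, and this simple rule is a stand-in for MINER's filtration algorithm, not a description of it.

```python
import re
from collections import defaultdict

def count_support(abstracts, entities, outcome_terms):
    """Map each entity to the IDs of publications that mention it alongside an outcome term."""
    outcome_patterns = [re.compile(re.escape(t), re.IGNORECASE) for t in outcome_terms]
    support = defaultdict(set)
    for pub_id, text in abstracts.items():
        if not any(p.search(text) for p in outcome_patterns):
            continue  # publication does not mention the adverse outcome
        for entity in entities:
            # Case-sensitive whole-word match, so a short symbol is not found inside longer words.
            if re.search(rf"\b{re.escape(entity)}\b", text):
                support[entity].add(pub_id)
    return support

def filter_entities(support, min_publications):
    """Keep entities whose evidence (number of supporting publications) reaches the threshold."""
    return sorted(e for e, pubs in support.items() if len(pubs) >= min_publications)

# Invented toy abstracts with placeholder gene names; these are not real findings.
abstracts = {
    "P1": "Exposure to an anti-androgenic compound altered GENE1 and GENE2 expression in a male infertility model.",
    "P2": "GENE1 inhibition was associated with male infertility.",
    "P3": "GENE2 expression in adrenal tissue.",
    "P4": "Male infertility cases showed GENE3 variants and altered GENE2 levels.",
}
support = count_support(abstracts, ["GENE1", "GENE2", "GENE3"], ["male infertility"])
print({e: len(p) for e, p in sorted(support.items())})  # {'GENE1': 2, 'GENE2': 2, 'GENE3': 1}
print(filter_entities(support, min_publications=2))     # ['GENE1', 'GENE2']
```

Counting distinct publication IDs, rather than raw mentions, means a single paper that repeats a gene name many times still contributes only one unit of evidence.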
Finally, MINER is used to develop systems biology models. Taking metabolic pathways in KGML format (Kanehisa et al., 2017) as input, it converts them into systems of differential equations. This technique enables the development of Next Generation Systems Biology (NGSB) models, which include all the nodes of the provided metabolic network and can therefore comprise hundreds of equations. NGSB models have a wide range of applications, including the construction of systems biology-based AOPs and quantitative AOPs (qAOPs), as well as the quantitative assessment of the impact a disruptor may have on human health, giving decision-makers a valuable tool for evaluating potential disruptors.
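The KGML-to-ODE conversion can be sketched with Python's standard XML parser. The sketch reads the `reaction`, `substrate` and `product` elements of a tiny hypothetical KGML fragment, assumes mass-action kinetics with user-supplied rate constants (`kf`, `kr`), and integrates with forward Euler. The kinetic form, constants and solver MINER uses for NGSB models are not stated in the abstract, so all of these are assumptions.

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical KGML fragment; real KEGG pathway files contain many more elements.
KGML = """<pathway name="path:toy00001" org="toy" number="00001" title="Toy pathway">
  <reaction id="1" name="rn:R_1" type="irreversible">
    <substrate id="10" name="cpd:A"/>
    <product id="11" name="cpd:B"/>
  </reaction>
  <reaction id="2" name="rn:R_2" type="reversible">
    <substrate id="11" name="cpd:B"/>
    <product id="12" name="cpd:C"/>
  </reaction>
</pathway>"""

def parse_reactions(kgml_text):
    """Read name, reversibility, substrates and products of every reaction element."""
    root = ET.fromstring(kgml_text)
    return [
        {
            "name": rxn.get("name"),
            "reversible": rxn.get("type") == "reversible",
            "substrates": [s.get("name") for s in rxn.findall("substrate")],
            "products": [p.get("name") for p in rxn.findall("product")],
        }
        for rxn in root.iter("reaction")
    ]

def make_rhs(reactions, kf, kr):
    """Build dC/dt = f(C) under an assumed mass-action rate law for every reaction."""
    def rhs(conc):
        deriv = {species: 0.0 for species in conc}
        for rxn in reactions:
            rate = kf[rxn["name"]]
            for s in rxn["substrates"]:
                rate *= conc[s]
            if rxn["reversible"]:
                reverse = kr[rxn["name"]]
                for p in rxn["products"]:
                    reverse *= conc[p]
                rate -= reverse  # net flux of a reversible reaction
            for s in rxn["substrates"]:
                deriv[s] -= rate
            for p in rxn["products"]:
                deriv[p] += rate
        return deriv
    return rhs

def integrate(rhs, conc, t_end, dt=0.01):
    """Forward-Euler integration; large networks would usually call for an adaptive or stiff solver."""
    conc = dict(conc)
    for _ in range(int(round(t_end / dt))):
        deriv = rhs(conc)
        for species in conc:
            conc[species] += dt * deriv[species]
    return conc

reactions = parse_reactions(KGML)
species = sorted({n for r in reactions for n in r["substrates"] + r["products"]})
rhs = make_rhs(reactions, kf={"rn:R_1": 1.0, "rn:R_2": 0.5}, kr={"rn:R_2": 0.25})
initial = {s: 0.0 for s in species}
initial["cpd:A"] = 1.0
final = integrate(rhs, initial, t_end=20.0)
print({s: round(c, 3) for s, c in final.items()})
```

Because every node of the network becomes one state variable, the same code scales to hundreds of equations without change; only the rate laws and the solver would need to be more realistic.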
MINER is a versatile computational tool with numerous applications. It supports the parameterization of mathematical models and the development of AOPs, and it allows users to design their desired information network through a variety of options, including a connection to the Cytoscape software (Kohl et al., 2011) through the RCy3 library (Gustavsen et al., 2019). The next routines to be added to MINER concern the identification of kinetic parameters for PBPK and systems biology models. In addition, a training network based on reinforcement learning is under development; it will integrate data from relevant databases, leading to richer, better-parameterized systems biology and biokinetic models. MINER will soon be available online.
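RCy3 drives Cytoscape from R; a language-neutral alternative is to write the network to a file that Cytoscape can import. The sketch below serializes hypothetical (source, interaction, target) triples, such as those an AOP analysis might produce, to Cytoscape's tab-delimited simple interaction format (SIF). The node and edge names are placeholders, and the function `to_sif` is illustrative, not MINER's own Cytoscape interface.

```python
def to_sif(edges):
    """Serialize (source, interaction, target) triples as tab-delimited SIF text.

    Targets sharing a source and interaction type are written on one line, which SIF
    allows; using tabs as delimiters lets node names contain spaces.
    """
    grouped = {}
    for source, interaction, target in edges:
        grouped.setdefault((source, interaction), []).append(target)
    return "".join(
        "\t".join([source, interaction, *targets]) + "\n"
        for (source, interaction), targets in grouped.items()
    )

# Placeholder network; names are illustrative only.
edges = [
    ("anti-androgen exposure", "affects", "GENE1"),
    ("anti-androgen exposure", "affects", "GENE2"),
    ("GENE1", "associated_with", "male infertility"),
]
sif = to_sif(edges)
print(sif, end="")
```

Saved with a `.sif` extension, the output can be imported into Cytoscape as a network file and styled there, without requiring an R session.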
Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794. https://doi.org/10.48550/arXiv.2203.05794
Gustavsen, J. A., Pai, S., Isserlin, R., Demchak, B., & Pico, A. R. (2019). RCy3: Network biology using Cytoscape from within R. bioRxiv, 793166. https://doi.org/10.1101/793166
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., & Morishima, K. (2017). KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Research, 45(D1), D353-D361. https://doi.org/10.1093/nar/gkw1092
Kohl, M., Wiese, S., & Warscheid, B. (2011). Cytoscape: software for visualization and analysis of biological networks. In Data mining in proteomics: From standards to applications (pp. 291-303). https://doi.org/10.1007/978-1-60761-987-1_18
Sarigiannis, D., Petridis, I., Karakoltzidis, A., & Karakitsios, S. (2021). Multimodal Integrated Modelling for COVID-19 Health Risk Management. 2021 AIChE Annual Meeting.