(172a) AI-Driven Drug Discovery and Manufacturing Using Automated Ontology-Based Information Extraction | AIChE


Authors 

Venkatasubramanian, V., Columbia University
Viswanath, S., Eli Lilly and Company
Vaidyaraman, S., Eli Lilly and Company
Balakrishnan, J., Eli Lilly and Company
Dieringer, J., Eli Lilly and Company

The recent COVID-19 pandemic has highlighted the need to discover and manufacture drugs efficiently, quickly, and cost-effectively. This is a major challenge given the complexity and sheer volume of information that must be managed in the discovery-to-delivery cycle. Computational tools and AI-based methodologies will be crucial in addressing this challenge. Here, we present a framework that can be used to automatically extract information from structured and unstructured documents with minimal supervision. These documents span several sources (briefing documents, research activities, electronic lab notebooks, process simulation documents) and different stages of drug development (research activities, experiments, manufacturing, sales, customer reports). The AI-driven information extraction framework comprises four components: ontologies, entity recognition, concept detection, and relation extraction. Ontologies offer an efficient means of organizing information hierarchically as class-subclass relationships connected through object and data properties. Ontology-based information extraction is important because it preserves semantics, incorporates domain knowledge into AI systems, and is flexible.
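
To make the ontology, entity recognition, and relation extraction components more concrete, here is a toy sketch in Python. It is not the authors' ontology or code: the class names (`Material`, `ActiveIngredient`, `Granulation`, and so on), the `processes` object property, and the functions `recognize` and `candidate_relations` are all hypothetical. Entity recognition here is plain dictionary matching against class labels, and a relation is proposed only when an object property links the two entities' classes (or their superclasses).

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyClass:
    """A node in a class-subclass hierarchy (hypothetical, simplified)."""
    name: str
    parent: Optional["OntologyClass"] = None
    labels: set = field(default_factory=set)  # surface forms used for matching

    def lineage(self):
        """Return this class name followed by all superclass names."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


# Hypothetical fragment of a drug-manufacturing ontology.
thing = OntologyClass("Thing")
material = OntologyClass("Material", thing)
api = OntologyClass("ActiveIngredient", material, {"api", "active pharmaceutical ingredient"})
excipient = OntologyClass("Excipient", material, {"lactose", "magnesium stearate"})
unit_op = OntologyClass("UnitOperation", thing)
granulation = OntologyClass("Granulation", unit_op, {"wet granulation", "granulation"})
CLASSES = [api, excipient, granulation]

# Hypothetical object properties as (domain, property, range); subclasses inherit them.
OBJECT_PROPERTIES = [("UnitOperation", "processes", "Material")]


def recognize(text):
    """Dictionary-based entity recognition with leftmost-longest matching."""
    lowered = text.lower()
    found = []
    for cls in CLASSES:
        for label in cls.labels:
            for m in re.finditer(r"\b" + re.escape(label) + r"\b", lowered):
                found.append((m.start(), m.end(), cls))
    # Sort by start offset, longer matches first, then drop overlapping spans.
    found.sort(key=lambda f: (f[0], -(f[1] - f[0])))
    entities, last_end = [], -1
    for start, end, cls in found:
        if start >= last_end:
            entities.append((text[start:end], cls))
            last_end = end
    return entities


def candidate_relations(text):
    """Propose (subject, property, object) triples licensed by the ontology."""
    entities = recognize(text)
    triples = []
    for i, (subj, s_cls) in enumerate(entities):
        for j, (obj, o_cls) in enumerate(entities):
            if i == j:
                continue
            for domain, prop, rng in OBJECT_PROPERTIES:
                if domain in s_cls.lineage() and rng in o_cls.lineage():
                    triples.append((subj, prop, obj))
    return triples


if __name__ == "__main__":
    sentence = "Wet granulation of the API with lactose was followed by drying."
    for surface, cls in recognize(sentence):
        print(surface, "->", " < ".join(cls.lineage()))
    print(candidate_relations(sentence))
```

Because `Granulation` inherits the `processes` property from `UnitOperation`, the hierarchy lets a single property declaration license relations for every subclass, which is one way an ontology injects domain knowledge into extraction.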

While several ontologies exist in the chemical, biochemical, and medical domains, we present a novel and comprehensive ontology that captures different aspects of drug development and manufacturing, including materials, material properties, unit operations, the final drug product, and the risks associated with different stages. This ontology extends the previously reported Purdue Ontology for Pharmaceutical Engineering [1,2], the first comprehensive ontology in this domain. We integrated this ontology with a weak supervision approach for entity classification [3] that uses a BioBERT model [4]. We demonstrate the performance of the developed framework on several information extraction tasks using actual documents for commercial drugs and highlight the potential of such frameworks to accelerate drug discovery and manufacturing.
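
The idea behind ontology-driven weak supervision can be sketched as follows: term lists drawn from an ontology become labeling functions that vote on each token, and the aggregated votes form a noisy training set for a token classifier. This is a simplified illustration, not the method of [3]: the approach in [3] learns a label model to weight sources, whereas this sketch uses a plain majority vote, and the BioBERT fine-tuning step is omitted. The term lists, the labels `MATERIAL`, `UNIT_OP`, and `O`, and the functions `make_dictionary_lf`, `majority_vote`, and `weak_labels` are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple

ABSTAIN = None  # a labeling function returns this when it has no opinion

LabelingFunction = Callable[[str], Optional[str]]


def make_dictionary_lf(terms: Iterable[str], label: str) -> LabelingFunction:
    """Turn a set of ontology terms into a token-level labeling function."""
    vocab = {t.lower() for t in terms}

    def lf(token: str) -> Optional[str]:
        return label if token.lower() in vocab else ABSTAIN

    return lf


def majority_vote(token: str, lfs: List[LabelingFunction]) -> Optional[str]:
    """Combine labeling-function votes; abstain when there are no votes or a tie."""
    votes = [v for v in (lf(token) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting sources with equal support
    return ranked[0][0]


# Hypothetical term lists, e.g. from different ontology branches or sources.
LFS = [
    make_dictionary_lf({"lactose", "mannitol", "api"}, "MATERIAL"),
    make_dictionary_lf({"lactose", "microcrystalline"}, "MATERIAL"),
    make_dictionary_lf({"milling", "drying", "granulation"}, "UNIT_OP"),
    make_dictionary_lf({"was", "and", "before", "the", "of"}, "O"),
]


def weak_labels(sentence: str) -> List[Tuple[str, Optional[str]]]:
    """Produce (token, weak label) pairs; tokens no source covers get None."""
    return [(tok, majority_vote(tok, LFS)) for tok in sentence.split()]


if __name__ == "__main__":
    for token, label in weak_labels("lactose was blended before milling and drying"):
        print(token, label)
```

In a full pipeline, the labeled tokens (those not `None`) would serve as noisy training data for a pretrained biomedical language model such as BioBERT, which can then generalize beyond the exact dictionary terms.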

References:

1. Leaelaf Hailemariam and Venkat Venkatasubramanian. Purdue ontology for pharmaceutical engineering: Part 1. Conceptual framework. Journal of Pharmaceutical Innovation, 5(3):88–99, 2010.

2. Leaelaf Hailemariam and Venkat Venkatasubramanian. Purdue ontology for pharmaceutical engineering: Part 2. Applications. Journal of Pharmaceutical Innovation, 5(4):139–146, 2010.

3. Jason A. Fries, Ethan Steinberg, Saelig Khattar, Scott L. Fleming, Jose Posada, Alison Callahan, and Nigam H. Shah. Ontology-driven weak supervision for clinical entity classification in electronic health records. Nature Communications, 12(1):1–11, 2021.

4. Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Dong Hyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240, 2020.