(592c) Network for Knowledge Organization (NEKO): An AI Knowledge Mining Workflow for Synthetic Biology Literature Studies | AIChE

Authors 

Tang, Y., Washington University in St. Louis
Chen, Y., Washington University in St. Louis
The rapid growth of scientific literature makes it difficult for synthetic biology researchers to stay informed about the latest developments. Large language models (LLMs), such as GPT-4, facilitate extensive text processing and knowledge extraction, yet they are constrained by their pretraining cut-off date and cannot provide specific, cited scientific knowledge. Here, we introduce Network for Knowledge Organization (NEKO), a universal framework that uses LLMs to extract knowledge from scientific literature, identify terminology entities, and establish causal relationships among them. When users input a keyword of interest, NEKO generates a knowledge graph and automatically exports a summary of relevant concepts. It has immediate applications in daily academic tasks such as educating young synthetic biology scientists, literature review, paper writing, and experiment planning and troubleshooting. We demonstrate the workflow's applicability through several case studies.
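
To make the keyword-to-knowledge-graph step concrete, the following is a minimal Python sketch, not NEKO's actual implementation. It assumes the LLM extraction stage has already produced (cause, effect, source) triples; the triples, the paper labels, and the function names `keyword_subgraph` and `cited_sources` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical (cause, effect, source) triples standing in for what an LLM
# might extract from article text. These are illustrative, not NEKO output.
TRIPLES = [
    ("GGPPS overexpression", "GGPP supply", "paper_A"),
    ("GGPP supply", "β-carotene titer", "paper_A"),
    ("lipid accumulation", "β-carotene storage", "paper_B"),
    ("nitrogen limitation", "lipid accumulation", "paper_C"),
    ("xylose utilization", "biomass yield", "paper_D"),
]


def keyword_subgraph(triples, keyword, max_hops=1):
    """Return the triples whose two entities both lie within max_hops
    of any entity whose name contains the keyword (case-insensitive)."""
    # Undirected adjacency, so the search follows causes and effects alike.
    neighbors = defaultdict(set)
    for cause, effect, _source in triples:
        neighbors[cause].add(effect)
        neighbors[effect].add(cause)

    seeds = {name for name in neighbors if keyword.lower() in name.lower()}
    visited = set(seeds)
    frontier = deque((name, 0) for name in seeds)

    # Breadth-first expansion, limited to max_hops from the seed entities.
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in neighbors[node]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, depth + 1))

    return [t for t in triples if t[0] in visited and t[1] in visited]


def cited_sources(triples):
    """Sorted list of the distinct sources that support the given triples."""
    return sorted({source for _cause, _effect, source in triples})


if __name__ == "__main__":
    sub = keyword_subgraph(TRIPLES, "carotene", max_hops=1)
    for cause, effect, source in sub:
        print(f"{cause} -> {effect}  [{source}]")
    print("sources:", cited_sources(sub))
```

Keeping the source label on every edge is what lets a summary built from the subgraph cite the articles behind each relationship; raising `max_hops` widens the neighborhood around the keyword at the cost of less focused results.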

Case study 1: Knowledge acquisition on β-carotene production in Y. lipolytica

NEKO can help researchers quickly acquire up-to-date knowledge about a research topic. To illustrate, NEKO rapidly read more than 200 research article PDF files and produced a knowledge base for the oleaginous yeast Y. lipolytica. When asked to focus on optimizing β-carotene production in Y. lipolytica, NEKO produced a detailed summary. Compared to ChatGPT-4 zero-shot Q&A, NEKO gave 3 times more gene targets, 200% more strain engineering strategies, and 57% more bioprocess cultivation conditions, with knowledge from 37 reliable peer-reviewed sources.

Case study 2: Literature review of a non-model species, Rhodosporidium toruloides

Literature review is one of the most time-consuming academic tasks, and in this case study we demonstrate how NEKO can be applied quickly to review paper writing. Rhodosporidium toruloides has recently been gaining research attention for its high lipid content and native carotenoid production. We applied NEKO to the abstracts of R. toruloides articles on PubMed (392 articles in total). Visualizing the knowledge graph and comparing it with that of Y. lipolytica shows that the R. toruloides literature is still at an early stage: the knowledge from individual studies remains disconnected and independent. NEKO identified the top ten trending research areas, including genetic engineering, metabolic pathways, adaptive evolution experiments, and biochemical production. Moreover, NEKO enables users to refine their searches by specifying target products. For instance, focusing on lipid and carotenoid production yields a detailed breakdown of the existing research landscape. Overall, this demonstrates NEKO's capacity to streamline the literature review process.

Case study 3: Experiment planning and troubleshooting

Another information-intensive academic task is experiment planning and troubleshooting. We illustrate this through NEKO's ability to synthesize experimental procedures into actionable insights. For example, when aiming to genetically engineer Y. lipolytica, users can obtain a comprehensive overview of the necessary procedures for strain transformation. NEKO enhances this process by providing detailed, step-by-step methodological guidance, including the specific DNA amount used, the procedure name, the name of the chemical kit used, and so on. As another example, suppose the user is having trouble expressing GGPPS, an essential enzyme in the β-carotene synthesis pathway. NEKO can search for information regarding GGPPS and compile a report with troubleshooting suggestions that is significantly more actionable and informative than the output of GPT-4.

In conclusion, NEKO distinguishes itself from other knowledge-based Q&A platforms by overcoming the context length limitations of LLMs. All NLP tools used in this study are off-the-shelf and require no pretraining. NEKO offers flexible, lightweight local deployment options and is compatible with open-source LLMs of varying parameter sizes, such as Qwen, ensuring adaptability to diverse computing hardware. As academic research grows in complexity and increasingly spans multiple disciplines, researchers are investing substantial time in literature review and knowledge acquisition. This trend towards continuous learning alongside research and development represents a new norm. NEKO not only empowers scientists with decision-making insights but also democratizes artificial intelligence tools, making AI for science more accessible to researchers.