(375m) Leveraging Knowledge Synthesis and Transfer Learning in Photo-Bioproduction for Enhanced CO2 Capture and Upcycling | AIChE

Authors 

Li, W., Washington University in St. Louis
Long, B., Texas A&M University
Chen, Y., Washington University in St. Louis
Tang, Y., Washington University in St. Louis
The burgeoning field of synthetic biology offers promising avenues for environmental sustainability, particularly through photo-bioproduction processes that enable efficient CO2 capture and conversion into valuable biochemicals. However, the design-build-test-learn (DBTL) cycle in biosynthetic strain construction and metabolic engineering is hampered by high costs and slow experimental progress. Herein, we explore the integration of Machine Learning (ML) and Transfer Learning (TL) as transformative approaches to accelerate and refine the predictive modeling of growth and product yield in blue-green algae (cyanobacteria). By using advanced computational methods to transfer published knowledge from model species to non-model species with limited bioproduction data, we aim to inform and guide photobiorefinery development and cost estimation with greater efficiency.

Focusing on the non-model cyanobacterium Synechococcus elongatus UTEX 2973 as the target strain, with the model cyanobacteria Synechocystis sp. PCC 6803 and S. elongatus PCC 7942 serving as information sources, we demonstrate the application of TL to bridge disparate data domains, facilitating robust predictions of growth and production outcomes without the need for extensive original data on a relatively newly discovered species. Our methodology encompasses the extraction and standardization of literature data and in-house experimental data. Using generative AI (specifically, GPT-4), we expedited data extraction from over 100 publications for feature selection and database construction.
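The abstract does not state which TL algorithm was used, so the following is only a minimal sketch of one simple form of transfer, under stated assumptions: a trend (slope) is learned from plentiful source-species data, and only an offset is re-estimated from a handful of target-species measurements. The function names, the single nitrate feature, and all numbers are hypothetical toy values, not data or methods from the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def transfer_offset(source_x, source_y, target_x, target_y):
    """Reuse the source slope; re-estimate only the offset on target data."""
    slope, _ = fit_line(source_x, source_y)
    # Mean residual of the target points under the transferred slope.
    offset = sum(y - slope * x for x, y in zip(target_x, target_y)) / len(target_x)
    return slope, offset


# Hypothetical toy data: many source-species points, two target-species points.
source_nitrate = [float(i) for i in range(10)]
source_titer = [2.0 * x + 1.0 for x in source_nitrate]
target_nitrate = [1.0, 4.0]
target_titer = [5.0, 11.0]

slope, offset = transfer_offset(source_nitrate, source_titer,
                                target_nitrate, target_titer)
prediction = slope * 10.0 + offset  # predicted target titer at nitrate = 10
```

With only two target points, fitting both slope and offset directly would be very sensitive to measurement noise; borrowing the slope from the source species is what makes the small target dataset sufficient in this toy setting.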

Our findings reveal that TL, complemented by knowledge mining and feature engineering, significantly enriches the informational landscape, enabling ML models trained on limited data to predict critical bioproduction metrics such as optical density (OD), growth rate, and product titer. Among the models tested so far, the highest r^2 score we have achieved is 0. Based on the results, AI identifies several features, such as phosphate concentration, NO3 concentration, product biosynthesis pathways, and initial OD, as most related to the production metrics. This approach not only showcases the potential of TL in synthetic biology but also highlights the limitations and future directions for this technology, including the need for more sophisticated information filtering, feature extraction methods, and the exploration of TL's applicability across different species and photosynthetic families.
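For readers less familiar with the r^2 metric mentioned above, the following short sketch shows how the coefficient of determination is conventionally computed from measured and predicted values. The function name and the sample numbers are illustrative assumptions and do not come from the study.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted values for a production metric.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
score = r2_score(measured, predicted)

# A baseline that always predicts the mean of the measurements scores 0.
baseline = [sum(measured) / len(measured)] * len(measured)
baseline_score = r2_score(measured, baseline)
```

A score of 1 indicates perfect prediction, while a score near 0 means the model explains little more than the mean of the data does.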

In conclusion, our study underscores the transformative potential of integrating TL with synthetic biology to streamline the DBTL cycle, thereby accelerating the development of sustainable bioproduction processes. Future work will focus on enhancing model accuracy, improving data curation, and exploring the broader applicability of these methodologies across various domains of synthetic biology for environmental and industrial biotechnology.