(375m) Leveraging Knowledge Synthesis and Transfer Learning in Photo-Bioproduction for Enhanced CO2 Capture and Upcycling
2024 AIChE Annual Meeting
Computing and Systems Technology Division
Interactive Session: Data and Information Systems
Tuesday, October 29, 2024 - 3:30pm to 5:00pm
Focusing on the non-model cyanobacterium Synechococcus elongatus UTEX 2973 as the target strain, with the model cyanobacteria Synechocystis sp. PCC 6803 and S. elongatus PCC 7942 serving as information sources, we demonstrate the application of transfer learning (TL) to bridge disparate data domains. This enables robust predictions of growth and production outcomes without requiring extensive original data for a relatively newly discovered strain. Our methodology encompasses the extraction and standardization of literature data and in-house experimental data. Using generative AI (specifically GPT-4), we expedited data extraction from more than 100 publications for feature selection and database construction.
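As a rough illustration of the source-to-target idea described above, the following minimal sketch fits a linear model on plentiful source-domain data and then refits on a handful of target-domain samples while penalizing departure from the source weights. Everything in it is an assumption made for illustration: the data are synthetic, the three features stand in for standardized culture conditions, and the names `make_data`, `fit_linear`, `W_SOURCE`, and `W_TARGET` are hypothetical. The abstract does not specify which TL technique was used, so this should not be read as the study's method.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def make_data(weights, n, noise_sd):
    """Synthetic (features, response) pairs from a linear rule plus noise."""
    X = [[rng.uniform(-1.0, 1.0) for _ in weights] for _ in range(n)]
    y = [sum(w * x for w, x in zip(weights, row)) + rng.gauss(0.0, noise_sd)
         for row in X]
    return X, y

def fit_linear(X, y, prior=None, lam=0.0, lr=0.1, epochs=500):
    """Least squares by gradient descent, with an optional penalty
    lam * ||w - prior||^2 that pulls the weights toward `prior`."""
    n, d = len(X), len(X[0])
    anchor = list(prior) if prior is not None else [0.0] * d
    w = list(anchor)  # start from the prior (or from zero)
    for _ in range(epochs):
        grad = [2.0 * lam * (w[j] - anchor[j]) for j in range(d)]
        for row, target in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, row)) - target
            for j in range(d):
                grad[j] += 2.0 * err * row[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def mse(w, X, y):
    """Mean squared prediction error of weights w on (X, y)."""
    return sum((sum(wj * xj for wj, xj in zip(w, row)) - t) ** 2
               for row, t in zip(X, y)) / len(y)

# Hypothetical "true" relationships: the target differs slightly from the source.
W_SOURCE = [1.0, -0.5, 0.3]
W_TARGET = [1.2, -0.4, 0.3]

X_src, y_src = make_data(W_SOURCE, 200, 0.1)   # plentiful source-domain data
X_tgt, y_tgt = make_data(W_TARGET, 5, 0.1)     # scarce target-domain data
X_test, y_test = make_data(W_TARGET, 200, 0.0) # noise-free target test set

w_source = fit_linear(X_src, y_src)
w_target_only = fit_linear(X_tgt, y_tgt)
w_transfer = fit_linear(X_tgt, y_tgt, prior=w_source, lam=1.0)

print("target-only test MSE:", mse(w_target_only, X_test, y_test))
print("transfer test MSE:   ", mse(w_transfer, X_test, y_test))
```

The penalty weight `lam` controls how strongly the target model is tied to the source model: a large value keeps the source weights almost unchanged, while zero discards them entirely.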
Our findings reveal that TL, complemented by knowledge mining and feature engineering, significantly enriches the informational landscape, enabling machine learning (ML) models trained on limited data to predict critical bioproduction metrics such as optical density (OD), growth rate, and product titer. Among the models tested so far, the highest r^2 score we have achieved is 0. Based on the results, the AI analysis identifies several features, such as phosphate concentration, NO3 concentration, product biosynthesis pathways, and initial OD, as most closely related to the production metrics. This approach not only showcases the potential of TL in synthetic biology but also highlights the limitations of and future directions for this technology, including the need for more sophisticated information filtering and feature extraction methods and for exploring TL's applicability across different species and photosynthetic families.
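For readers unfamiliar with how an r^2 score and a feature ranking of this kind can be computed, here is a minimal sketch using permutation importance: each feature column is shuffled in turn, and the resulting drop in r^2 is taken as that feature's importance. The abstract does not say which importance method was used, so permutation importance is a stand-in. The data are synthetic, the fixed linear `predict` function is a placeholder for a trained model, and the feature names and weights are hypothetical.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, X, y, col, rng):
    """Drop in r^2 after shuffling one feature column across samples."""
    base = r2_score(y, [predict(row) for row in X])
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    return base - r2_score(y, [predict(row) for row in X_perm])

# Hypothetical feature names and weights; "unused" has no effect by design.
FEATURES = ["phosphate", "nitrate", "initial_od", "unused"]
WEIGHTS = [0.8, 0.5, 0.3, 0.0]

def predict(row):
    """Placeholder for a trained model: a fixed linear rule."""
    return sum(w * x for w, x in zip(WEIGHTS, row))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
X = [[rng.uniform(0.0, 1.0) for _ in FEATURES] for _ in range(100)]
y = [predict(row) + rng.gauss(0.0, 0.05) for row in X]  # synthetic response

base_r2 = r2_score(y, [predict(row) for row in X])
importance = {name: permutation_importance(predict, X, y, j, rng)
              for j, name in enumerate(FEATURES)}

print("baseline r^2:", base_r2)
for name, drop in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(name, drop)
```

A feature the model ignores produces no drop at all, which is why the "unused" column serves as a sanity check in this sketch.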
In conclusion, our study underscores the transformative potential of integrating TL with synthetic biology to streamline the design-build-test-learn (DBTL) cycle, thereby accelerating the development of sustainable bioproduction processes. Future work will focus on improving model accuracy and data curation and on exploring the broader applicability of these methodologies across various domains of synthetic biology for environmental and industrial biotechnology.