Machine Learning for Predicting Cradle-to-Gate Life Cycle Inventory (LCI) Data | AIChE

Improving the sustainability of chemical processes is crucial for mitigating the effects of climate change. One way to accomplish this is to consider environmental impacts during the early stages of process design, while modifications can still be made easily. Unfortunately, no simple methods currently exist for determining environmental impacts this early; most available approaches predict impacts from molecular structure using large-scale Density Functional Theory or Molecular Dynamics simulations. These simulations are time-consuming and therefore impractical for design firms, which often work under tight time constraints. Machine Learning (ML) can predict the environmental impacts of the chemicals used in these processes much faster and at much lower computational expense, making this data available to design engineers in a far more practical manner.

In this work, an ML algorithm was developed to predict cradle-to-gate life cycle inventory (LCI) data for both pre-existing and novel chemicals. The LCI data predicted by the algorithm focused on four environmental metrics: human health (HH), ecosystem quality (EQ), global warming potential (GWP), and resource utilization (RU). Two variants of the algorithm were developed: one employed Artificial Neural Networks (ANN), and the other employed eXtreme Gradient Boosting (XGBoost). Both variants used over 350 data points, split among training, validation, and test sets. Data was sourced from EcoInvent, a chemical information database, and the feature set included 200 molecular descriptors and 16 thermodynamic properties for each data point. A stepwise feature selection process reduced the number of features from 216 to 10. Following hyperparameter tuning, the performance of both variants was assessed on the test set using the R-squared and Root-Mean-Squared-Error (RMSE) values. This analysis showed that the XGBoost variant was more effective for the HH, GWP, and EQ metrics, producing RMSE values of 2.73, 1.18, and 0.574, respectively. In addition to the algorithm itself, a case study representing the extraction of polyphenols from wine pomace using acetone as the solvent was implemented to demonstrate the utility of the LCI prediction algorithm in enabling a detailed cradle-to-grave Life Cycle Analysis of the entire process.
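The evaluation step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names are hypothetical, and the 70/15/15 split is an assumption, since the abstract does not report the actual proportions. The sketch shows one way to partition a dataset into training, validation, and test subsets and to compute the two reported metrics, RMSE and R-squared, from true and predicted values.

```python
import math
import random


def split_indices(n, pct_train=70, pct_val=15, seed=0):
    """Shuffle indices 0..n-1 and split them into train/validation/test lists.

    The 70/15/15 default is an illustrative assumption, not a value
    taken from the abstract. Integer percentages avoid float rounding.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * pct_train // 100
    n_val = n * pct_val // 100
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


def rmse(y_true, y_pred):
    """Root-mean-squared error: sqrt(mean((y - y_hat)^2))."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    train, val, test = split_indices(350)
    print(len(train), len(val), len(test))

    # Made-up numbers purely to exercise the metric functions.
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.0, 2.0, 3.0, 6.0]
    print(rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

Because RMSE is expressed in the units of the predicted metric, values for HH, GWP, and EQ are not directly comparable with one another; R-squared is unitless, which is one reason the two are commonly reported together.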