(519f) Light Olefin Ratio Prediction Using Data-Driven Model of Fischer-Tropsch Synthesis with Catalyst Descriptors | AIChE

Authors 

Jeong, W., Sungkyunkwan University
Im, H., Sungkyunkwan University
Lee, Y. S., Sungkyunkwan University
Lee, J., Sungkyunkwan University
Kim, J., Incheon National University
Light olefins such as ethylene and propylene are key feedstocks for petrochemical products such as plastics and rubber. One route to light olefins is Fischer-Tropsch (FT) synthesis, which is considered an environmentally friendly alternative because it can reduce the amount of CO2. Iron-based catalysts are the main choice for industrial-scale FT synthesis because of their strong catalytic activity for the water-gas shift (WGS) reaction. The WGS reaction converts CO and H2O into CO2 and H2, so syngas with a low H2/CO ratio can be used. Iron-based catalysts also offer potentially higher selectivity to light olefins because their hydrogenation activity is lower than that of other FT catalysts. Many studies in the literature report how differences in operating conditions or catalyst composition affect CO conversion and light olefin selectivity. However, studies of how experimental conditions affect the ratio of light olefins are still lacking. Existing work is also limited in its ability to reveal the crucial factors that influence the olefin ratio, because differences in experimental conditions between research groups leave significant gaps in the data. To address these gaps, we adopt the SWED (Sorted Weighted Elemental Descriptor) representation developed by Shimizu. In addition, we propose an artificial intelligence-based model that predicts the olefin-to-paraffin ratio in iron-catalyzed FT synthesis and reveals the influence of, and correlations among, the input factors. First, we collect catalyst and operating-condition data for FT synthesis from the literature. SWED decomposes each catalyst into the elemental properties of its components and expands the dimensionality of the data, so that catalysts can be compared in an identical system.
Next, seven machine learning (ML) models are compared for accuracy: lasso, ridge, extreme gradient boosting (XGBoost), artificial neural networks (ANNs), kernel ridge regression (KRR), random forest regressor (RFR), and extra trees regressor (ETR). Finally, the most accurate models are selected to predict the ratio of light olefins. This study can help in designing subsequent experiments and, with a revised prediction method, can be extended to the yield of light olefins.

First, a total of 474 data points are collected from 47 papers in the literature. Of these, 264 data points involve iron-containing catalysts, and 131 are finally selected after outliers are eliminated. To preprocess the collected data, we use a KNN imputer to fill in missing values. The input variables comprise the operating conditions of FT synthesis (e.g., pressure, temperature, gas hourly space velocity, syngas ratio), catalyst characteristics (e.g., catalyst mass, BET specific surface area), the composition of the catalyst and support, and the catalyst properties calculated by SWED. SWED is adopted so that catalysts with diverse compositions can be compared in the same system. Each catalyst is represented by the products of the weight ratio and the descriptors of its individual elements, with the elements arranged in descending order of weight ratio. The elemental descriptors used as input variables are the following nine physical properties: group, atomic weight (AW), melting point (MP), electronegativity (EN), fusion enthalpy, density, band gap (BG), oxidation number, and ionization energy (IE). The two target variables are the C2-C4 selectivity and the olefin-to-paraffin ratio.
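The descriptor construction described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `swed_vector`, the number of element slots, the zero padding for catalysts with fewer elements, and the example compositions are all assumptions, and only two of the nine descriptors (atomic weight and Pauling electronegativity) are included.

```python
from typing import Dict, List

# Two of the nine elemental descriptors named in the abstract:
# atomic weight (AW, g/mol) and Pauling electronegativity (EN).
ELEMENT_DESCRIPTORS: Dict[str, Dict[str, float]] = {
    "Fe": {"AW": 55.845, "EN": 1.83},
    "K": {"AW": 39.098, "EN": 0.82},
    "Cu": {"AW": 63.546, "EN": 1.90},
    "Mn": {"AW": 54.938, "EN": 1.55},
}
DESCRIPTOR_NAMES: List[str] = ["AW", "EN"]


def swed_vector(composition: Dict[str, float], n_slots: int = 3) -> List[float]:
    """Build a fixed-length, sorted, weighted descriptor vector.

    `composition` maps element symbol -> weight fraction. Elements are
    ranked by descending weight fraction; each slot holds
    weight * descriptor for every descriptor. Unused slots are padded
    with zeros (an assumption made for this sketch).
    """
    ranked = sorted(composition.items(), key=lambda kv: kv[1], reverse=True)
    vector: List[float] = []
    for slot in range(n_slots):
        if slot < len(ranked):
            element, weight = ranked[slot]
            props = ELEMENT_DESCRIPTORS[element]
            vector.extend(weight * props[name] for name in DESCRIPTOR_NAMES)
        else:
            vector.extend(0.0 for _ in DESCRIPTOR_NAMES)
    return vector


# Hypothetical compositions, chosen only to show the layout of the vector.
catalyst_a = {"Fe": 0.90, "K": 0.04, "Cu": 0.06}
catalyst_b = {"Mn": 0.15, "Fe": 0.85}
vec_a = swed_vector(catalyst_a)
vec_b = swed_vector(catalyst_b)
print(vec_a)
print(vec_b)
```

Because every catalyst maps to a vector of the same length, with the dominant element always in the first slot, catalysts with different numbers and kinds of elements share one set of input columns.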

To analyze which factors increase the target variables, which is the key objective of this study, seven ML models are adopted: lasso, ridge, XGBoost, ANNs, KRR, RFR, and ETR. The data set is shuffled randomly and divided into five equal-sized groups. One group is used as test data and the others as training data. Bayesian optimization (BO) tunes the hyperparameters of each ML model on randomly selected training and test data, with the objective of minimizing the mean squared error (MSE). The best model of each type is built with the best hyperparameters found by BO. Data shuffling and splitting are then repeated to conduct 5-fold cross-validation: each best model is fitted to the four training subsets and used to predict values for the held-out test subset. We evaluate the accuracy of the best models by comparing predicted and test values using the root mean squared error (RMSE) and the R2 score. RFR and XGBoost, which give relatively low errors, are selected. ETR gives an RMSE of almost zero on the training data but a high error on the test data; the model overfits the training data, which makes it difficult to predict target values for the test data.
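The 5-fold evaluation described above can be sketched as follows, using only the Python standard library. The data are synthetic, and a one-feature least-squares line stands in for the tuned RFR and XGBoost models; Bayesian optimization of hyperparameters is not shown. All function names are illustrative assumptions. Reporting RMSE and R2 for both the training and test folds makes the kind of train/test gap noted for ETR visible.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """Least-squares line on one feature (placeholder for a real ML model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: slope * v + intercept


def k_fold_indices(n: int, k: int, seed: int) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(
    x: Sequence[float], y: Sequence[float], k: int = 5, seed: int = 0
) -> List[Dict[str, Tuple[float, float]]]:
    """Return per-fold (RMSE, R2) for the training and test subsets."""
    results = []
    for test_idx in k_fold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        model = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        scores = {}
        for name, idx in (("train", train_idx), ("test", test_idx)):
            y_true = [y[i] for i in idx]
            y_pred = [model(x[i]) for i in idx]
            scores[name] = (rmse(y_true, y_pred), r2_score(y_true, y_pred))
        results.append(scores)
    return results


# Synthetic data: y = 2x + 1 plus Gaussian noise (not from the study).
rng = random.Random(42)
x_data = [rng.uniform(0.0, 10.0) for _ in range(100)]
y_data = [2.0 * xi + 1.0 + rng.gauss(0.0, 0.5) for xi in x_data]

cv_results = cross_validate(x_data, y_data)
for fold, scores in enumerate(cv_results, start=1):
    (tr_rmse, tr_r2), (te_rmse, te_r2) = scores["train"], scores["test"]
    print(f"fold {fold}: train RMSE={tr_rmse:.3f} R2={tr_r2:.3f} | "
          f"test RMSE={te_rmse:.3f} R2={te_r2:.3f}")
```

A model that scores a near-zero training RMSE but a much larger test RMSE in this kind of table is overfitting, which is the behavior the abstract reports for ETR.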

Predictions from the RFR and XGBoost models, which are regarded as ‘black box’ models, are difficult to explain. To overcome this problem, Shapley additive explanations (SHAP), accumulated local effects (ALE), and feature importance plots are used to interpret the models. The SHAP method can partially interpret the correlation between input factors and the results. The partial dependence plot (PDP) is commonly used to detect correlations between input features and predicted values; however, the PDP is heavily influenced by the other features when specifying the correlation. In contrast, ALE accounts for the relationships among input variables and therefore interprets the model better. Feature importance shows how strongly each input feature contributes to the predicted values. In conclusion, SHAP and feature importance give similar results for both models, suggesting that pressure and temperature have the greatest influence on the prediction. In addition, by using catalyst descriptors that had not previously been considered with explainable methods, the correlation between support and catalyst is identified. Applying catalyst descriptors to the input data of ML models will accelerate research on increasing the yield of light olefins.
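To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a single prediction by enumerating all feature subsets. It is an illustration only: the toy model, its coefficients, the input point, and the baseline are hypothetical and not taken from the study, and "absent" features are simply set to their baseline values, which is one common simplification. Exact enumeration is feasible only for a handful of features, so practical SHAP tooling relies on faster model-specific or sampling-based algorithms.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of each feature for the prediction model(x).

    A coalition's value is the model output when features in the coalition
    take their values from `x` and all other features take `baseline` values.
    """
    n = len(x)

    def value(subset: frozenset) -> float:
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi: List[float] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features: Sequence[float]) -> float:
    """Hypothetical response with a pressure x syngas-ratio interaction."""
    pressure, temperature, h2_co = features
    return 0.5 * pressure + 0.01 * temperature + 0.2 * pressure * h2_co


# Hypothetical operating point and baseline (pressure, temperature, H2/CO).
x_point = [20.0, 300.0, 2.0]
baseline_point = [10.0, 280.0, 1.0]

phi = shapley_values(toy_model, x_point, baseline_point)
for name, contribution in zip(["pressure", "temperature", "H2/CO"], phi):
    print(f"{name}: {contribution:.3f}")

# Efficiency property: contributions sum to f(x) - f(baseline).
print(sum(phi), toy_model(x_point) - toy_model(baseline_point))
```

The interaction term is split evenly between pressure and the syngas ratio, which shows how Shapley values attribute joint effects to individual features.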

In this study, we propose an ML-based framework for Fischer-Tropsch synthesis that uses catalyst descriptors to predict the ratio of light olefins. Adding catalyst descriptors allows the data to be compared in the same system and increases the accuracy of the ML models. RFR- and XGBoost-based models are proposed to predict the ratio of light olefins. In addition, model interpretation methods reveal the crucial influences of, and correlations between, the input and target variables. This framework can therefore reduce the experimental effort needed to identify the crucial factors that influence the ratio of light olefins. It also provides a new way to predict the yield of light olefins.

References

  1. Mine, S., M. Takao, T. Yamaguchi, T. Toyao, Z. Maeno, S. M. A. H. Siddiki, S. Takakusagi, K. Shimizu, and I. Takigawa, “Analysis of Updated Literature Data up to 2019 on the Oxidative Coupling of Methane Using an Extrapolative Machine-Learning Method to Identify Novel Catalysts”, ChemCatChem, 13, pp. 3636-3655 (2021).
  2. Suzuki, K., T. Toyao, Z. Maeno, S. Takakusagi, K. Shimizu, and I. Takigawa, “Statistical Analysis and Discovery of Heterogeneous Catalysts Based on Machine Learning from Diverse Published Data”, ChemCatChem, 11, pp. 4537-4547 (2019).
  3. Yuan, Z., Y. Wang, L. Zhu, C. Zhang, and Y. Sun, “Advancing C5+ hydrocarbons fuels production: An interpretable machine learning framework for Co-catalyzed syngas conversion”, Fuel, 361, 130658 (2024).
  4. Garona, H. A., F. M. Cavalcanti, T. F. de Abreu, M. Schmal, and R. M. B. Alves, “Evaluation of Fischer-Tropsch synthesis to light olefins over Co- and Fe-based catalysts using artificial neural network”, Journal of Cleaner Production, 321, 129003 (2021).