(519f) Light Olefin Ratio Prediction Using Data-Driven Model of Fischer-Tropsch Synthesis with Catalyst Descriptors
2024 AIChE Annual Meeting
Computing and Systems Technology Division
Industrial applications in Intelligent Operations
Wednesday, October 30, 2024 - 1:54pm to 2:15pm
First, a total of 474 data points are collected from 47 published papers. Of these, 264 data points involve iron-containing catalysts, and 131 are retained after outlier removal. During preprocessing, missing values are filled with a k-nearest-neighbors (KNN) imputer. The input variables comprise the FT operating conditions (e.g., pressure, temperature, gas hourly space velocity (GHSV), and syngas H2/CO ratio), catalyst characteristics (e.g., catalyst mass and BET specific surface area), the composition of the catalyst and support, and catalyst properties calculated with the sum of weighted elemental descriptors (SWED) method (Mine et al., 2021). This method makes it possible to compare catalysts with diverse compositions within the same system. Each catalyst is represented by the descriptors of its individual elements multiplied by their weight ratios, with the elements ordered by descending weight ratio. Nine elemental properties are used as descriptors: group, atomic weight (AW), melting point (MP), electronegativity (EN), fusion enthalpy, density, band gap (BG), oxidation number, and ionization energy (IE). The two target variables are the C2-C4 selectivity and the olefin-to-paraffin ratio.
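The featurization and imputation steps can be sketched as follows. This is a minimal illustration, not the study's code: it assumes each catalyst is given as a mapping from element to weight ratio, shows only three of the nine descriptors (approximate literature values for atomic weight, Pauling electronegativity, and melting point), and uses made-up operating conditions. The helper `weighted_descriptors` and the `N_SLOTS` setting are hypothetical; imputation uses scikit-learn's `KNNImputer`.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Illustrative subset of elemental descriptors: atomic weight (g/mol),
# Pauling electronegativity, melting point (K). Approximate values.
ELEMENT_DESCRIPTORS = {
    "Fe": [55.845, 1.83, 1811.0],
    "Cu": [63.546, 1.90, 1357.8],
    "K": [39.098, 0.82, 336.7],
}
N_DESC = 3   # descriptors per element in this sketch (nine in the study)
N_SLOTS = 3  # hypothetical number of element positions kept per catalyst


def weighted_descriptors(composition):
    """Order elements by descending weight ratio and scale each element's
    descriptors by its weight ratio; unused slots stay zero."""
    ordered = sorted(composition.items(), key=lambda kv: kv[1], reverse=True)
    features = np.zeros(N_SLOTS * N_DESC)
    for slot, (element, weight) in enumerate(ordered[:N_SLOTS]):
        start = slot * N_DESC
        features[start:start + N_DESC] = weight * np.array(ELEMENT_DESCRIPTORS[element])
    return features


# Hypothetical records: pressure (MPa), temperature (K), GHSV, H2/CO ratio.
# np.nan marks a condition not reported in the source paper.
conditions = np.array([
    [2.0, 573.0, 3000.0, 2.0],
    [1.0, 593.0, np.nan, 1.0],
    [2.0, np.nan, 2000.0, 2.0],
    [1.5, 583.0, 2500.0, 1.5],
])
compositions = [
    {"Fe": 0.90, "Cu": 0.05, "K": 0.05},
    {"Fe": 0.95, "K": 0.05},
    {"Fe": 1.0},
    {"Fe": 0.85, "Cu": 0.10, "K": 0.05},
]

# Fill each gap from the 2 most similar records. In practice, scale features
# first: nan-Euclidean distances are dominated by large columns such as GHSV.
imputed = KNNImputer(n_neighbors=2).fit_transform(conditions)
X = np.hstack([imputed, np.vstack([weighted_descriptors(c) for c in compositions])])
print(X.shape)  # (4, 13)
```

Ordering elements by weight ratio gives every catalyst the same fixed-length feature vector, so the dominant element always occupies the first descriptor slot regardless of how the composition was reported.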
To identify how the target variables can be increased, which is the key objective of this study, seven ML models are adopted: lasso regression, ridge regression, extreme gradient boosting (XGBoost), artificial neural networks (ANNs), kernel ridge regression (KRR), random forest regression (RFR), and extra trees regression (ETR). The data set is randomly shuffled and divided into five equal-sized groups; one group is used as the test set and the other four as the training set. Bayesian optimization (BO) tunes the hyperparameters of each ML model on a randomly selected train/test split, with minimization of the mean squared error (MSE) as the objective. Each model is then rebuilt with the best hyperparameters found by BO. The shuffling and splitting are repeated to conduct 5-fold cross-validation: in each fold, the tuned model is fitted to the four training subsets and predicts the values of the held-out test subset. The accuracy of each tuned model is evaluated by comparing predicted and observed test values with the root mean squared error (RMSE) and the R2 score. RFR and XGBoost, which show relatively low errors, are selected. ETR achieves an RMSE of almost zero on the training data but a high error on the test data: it overfits the training data, which makes it difficult to predict the target values of unseen test data.
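The tuning and evaluation protocol can be sketched with scikit-learn on synthetic data. This is a hedged approximation, not the study's pipeline: Bayesian optimization is replaced by a plain exhaustive grid search (`GridSearchCV`) that also minimizes MSE, only three of the seven models are included (an `xgboost.XGBRegressor` could be added to `models` if that package is installed), and the helper name `kfold_scores` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
# Synthetic stand-in for the 131-sample data set (not the study's data).
X = rng.uniform(size=(131, 6))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(scale=0.1, size=131)

# Hyperparameter tuning: grid search minimizing MSE, used here as a simple
# stand-in for Bayesian optimization.
search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid={"max_depth": [3, 6, None]},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)

models = {
    "ridge": Ridge(alpha=1.0),
    "rfr": RandomForestRegressor(n_estimators=100, random_state=0, **search.best_params_),
    "etr": ExtraTreesRegressor(n_estimators=100, random_state=0),
}


def kfold_scores(model, X, y, n_splits=5, seed=0):
    """Shuffle, split into n_splits folds, and average train/test RMSE and test R2."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    train_rmse, test_rmse, test_r2 = [], [], []
    for train_idx, test_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        pred_train = model.predict(X[train_idx])
        pred_test = model.predict(X[test_idx])
        train_rmse.append(np.sqrt(mean_squared_error(y[train_idx], pred_train)))
        test_rmse.append(np.sqrt(mean_squared_error(y[test_idx], pred_test)))
        test_r2.append(r2_score(y[test_idx], pred_test))
    return {
        "train_rmse": np.mean(train_rmse),
        "test_rmse": np.mean(test_rmse),
        "test_r2": np.mean(test_r2),
    }


results = {name: kfold_scores(m, X, y) for name, m in models.items()}
for name, r in results.items():
    print(f"{name}: train RMSE={r['train_rmse']:.3f}, "
          f"test RMSE={r['test_rmse']:.3f}, test R2={r['test_r2']:.3f}")
```

Reporting training RMSE next to test RMSE is what exposes overfitting: with default settings, extra trees are grown until every leaf is pure, so their training RMSE is essentially zero, and only the test fold shows how well the model generalizes.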
Because RFR and XGBoost are regarded as "black-box" models, their predictions are difficult to explain. To address this, SHapley Additive exPlanations (SHAP), accumulated local effects (ALE), and feature importance plots are used to interpret the models. SHAP can partially explain the relationship between the input features and the predicted targets. Partial dependence plots (PDPs) are commonly used to examine how an input feature relates to the predicted value; however, a PDP can be strongly distorted when the feature of interest is correlated with other features. ALE, in contrast, accounts for the dependence among input variables and therefore gives a more reliable interpretation. Feature importance quantifies how much each input feature contributes to the predictions. SHAP and feature importance agree for both models: pressure and temperature are the most influential features in predicting the targets. In addition, applying these explainable methods to catalyst descriptors, which had not previously been analyzed in this way, identifies the correlation between the support and the catalyst. Incorporating catalyst descriptors into the input data of ML models is expected to accelerate research on increasing the yield of light olefins.
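The difference between ALE and a PDP can be made concrete with a first-order ALE computation. This is a minimal NumPy sketch under stated assumptions: quantile-based bins, a generic `predict` callable, and a toy function standing in for a fitted RFR or XGBoost model. `ale_1d` is a hypothetical helper, not a library API. (For SHAP values of tree ensembles, the `shap` package provides `TreeExplainer`.) Unlike a PDP, which overwrites a feature across the whole data set, ALE only moves each sample to the edges of the interval it already lies in, so the other, possibly correlated, features stay at realistic values.

```python
import numpy as np


def ale_1d(predict, X, feature, n_bins=10):
    """First-order accumulated local effects (ALE) for one feature.

    Returns the bin edges and the centered ALE value at each edge.
    """
    x = X[:, feature]
    # Quantile-based edges so each interval holds roughly the same number of samples.
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    # Assign each sample to interval (edges[k-1], edges[k]]; the minimum goes to bin 1.
    bins = np.clip(np.searchsorted(edges, x, side="left"), 1, len(edges) - 1)
    local_effects = np.zeros(len(edges) - 1)
    counts = np.zeros(len(edges) - 1)
    for k in range(1, len(edges)):
        in_bin = bins == k
        if not in_bin.any():
            continue
        lower, upper = X[in_bin].copy(), X[in_bin].copy()
        lower[:, feature] = edges[k - 1]
        upper[:, feature] = edges[k]
        # Mean prediction change across the interval, keeping the other
        # features at the values actually observed for these samples.
        local_effects[k - 1] = np.mean(predict(upper) - predict(lower))
        counts[k - 1] = in_bin.sum()
    ale = np.concatenate([[0.0], np.cumsum(local_effects)])
    # Center so that the sample-weighted mean effect is zero.
    bin_means = (ale[:-1] + ale[1:]) / 2
    ale -= np.sum(bin_means * counts) / counts.sum()
    return edges, ale


def toy_predict(A):
    # Toy model standing in for a fitted RFR/XGBoost predict function.
    return 3.0 * A[:, 0] + A[:, 1] ** 2


rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
edges, ale = ale_1d(toy_predict, X, feature=0)
```

Because the toy function is linear in feature 0 with slope 3, the ALE curve rises by three times each interval width, a quick sanity check before applying the function to a trained model.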
In this study, we propose an ML-based framework for Fischer-Tropsch synthesis that uses catalyst descriptors to predict the light olefin ratio. Adding catalyst descriptors allows data on different catalysts to be compared within the same system and improves the accuracy of the ML models. RFR- and XGBoost-based models are proposed to predict the light olefin ratio, and model interpretation methods reveal the key influences and correlations between the input and target variables. This framework can therefore reduce the experimental effort needed to identify the crucial factors that govern the light olefin ratio. Overall, this study provides a new approach to predicting light olefin yield.
References
- Mine, S., M. Takao, T. Yamaguchi, T. Toyao, Z. Maeno, S. M. A. H. Siddiki, S. Takakusagi, K. Shimizu, and I. Takigawa, "Analysis of Updated Literature Data up to 2019 on the Oxidative Coupling of Methane Using an Extrapolative Machine-Learning Method to Identify Novel Catalysts", ChemCatChem, 13, pp. 3636-3655 (2021).
- Suzuki, K., T. Toyao, Z. Maeno, S. Takakusagi, K. Shimizu, and I. Takigawa, "Statistical Analysis and Discovery of Heterogeneous Catalysts Based on Machine Learning from Diverse Published Data", ChemCatChem, 11, pp. 4537-4547 (2019).
- Yuan, Z., Y. Wang, L. Zhu, C. Zhang, and Y. Sun, "Advancing C5+ hydrocarbons fuels production: An interpretable machine learning framework for Co-catalyzed syngas conversion", Fuel, 361, 130658 (2024).
- Garona, H. A., F. M. Cavalcanti, T. F. de Abreu, M. Schmal, and R. M. B. Alves, "Evaluation of Fischer-Tropsch synthesis to light olefins over Co- and Fe-based catalysts using artificial neural network", Journal of Cleaner Production, 321, 129003 (2021).