(345h) Practical Predictive Models of Flammability Properties Established By Machine Learning Techniques to Aid in Incorporating Inherently Safer Design | AIChE

Recently, the process industry has attempted to incorporate inherently safer design (ISD) during early design stages. Understanding the flammability properties of chemicals is fundamental to adopting ISD because combustion incidents are among the most common incident types in the process industry. Since flammable chemicals such as fuels, solvents, and raw materials are indispensable in the process industry, finding sensible strategies for handling flammability has been an ongoing topic of investigation. However, experimental flammability data are not always available for specific chemicals, because numerous chemicals are newly synthesized in the chemical process industry and experiments on toxic, explosive, or radioactive compounds are extremely difficult and costly.

Alternatively, studies have been geared toward predicting chemical flammability properties in a practical manner. The three representative approaches to establishing predictive models are: (1) a physical property model, (2) a group contribution model, and (3) a quantitative structure-property relationship (QSPR) model. Physical property models were actively proposed between the late 1950s and the mid-1990s. More complicated and accurate approaches, namely group contribution models and QSPR models, have been actively proposed from 2000 onward.

It is, however, often challenging for practitioners to estimate flammability properties with existing predictive models. The main obstacle is that these models demand in-depth knowledge of chemistry or computer science. Therefore, this study aimed to propose new, reliable predictive models that practitioners can easily adopt.

This study proposes machine learning-based models that predict four flammability properties of pure organic compounds: the flash point, heat of combustion, lower flammability limit (LFL), and upper flammability limit (UFL). Based on data obtained from the DIPPR (2019), this study established multiple linear regression (MLR) models that predict these four properties. To rely only on readily available predictor variables, the number of atoms of each element in the molecular formula, the molecular weight, and the boiling point were adopted as 121 default input variables (possible predictors). Among the various machine learning algorithms, MLR was selected because its models are simple enough for process engineers to adopt quickly. However, MLR is generally limited to linear fitting. To overcome this limitation, atomic interaction and transformation terms were additionally prepared by transforming the default input variables. The study consisted of two steps so that the effects of the atomic interaction and transformation terms could be compared clearly.
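As an illustration of how such default predictors might be assembled, the following minimal sketch turns a molecular formula into atom counts and appends the molecular weight and boiling point. The function names, the five-element list, and the approximate boiling point used for ethanol are assumptions made for this example only; the abstract does not describe the authors' actual preprocessing, and the parser does not handle parentheses or hydrates.

```python
import re

# Standard atomic masses (g/mol) for a handful of elements; illustrative subset only.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007, "Cl": 35.45}


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C2H6O' (no parentheses)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(counts):
    """Sum atomic masses weighted by atom counts."""
    return sum(ATOMIC_MASS[symbol] * n for symbol, n in counts.items())


def predictor_row(formula, boiling_point_k, elements):
    """Build [atom counts..., molecular weight, boiling point] for one compound."""
    counts = atom_counts(formula)
    row = [counts.get(element, 0) for element in elements]
    return row + [molecular_weight(counts), boiling_point_k]


# Hypothetical element ordering; the study's 121 variables are not listed in the abstract.
ELEMENTS = ["C", "H", "O", "N", "Cl"]

# Ethanol, with an approximate normal boiling point in kelvin (assumed value).
row = predictor_row("C2H6O", 351.5, ELEMENTS)
print(row)
```

Each compound then contributes one such row to the design matrix, so the only inputs a practitioner needs are the formula and a boiling point.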

These flammability properties have a strong impact on the inherently safer design of industrial processes. As with QSPR and group contribution models, machine learning algorithms and statistical parameters are used in this study to establish the predictive models. However, unlike the QSPR and group contribution models, this study uses only readily available variables (the numbers of atoms of each element, molecular weights, and normal boiling points) as predictors. The study consists of two steps: (1) building multiple linear regression (MLR) models from the default input variables, and (2) building MLR models that incorporate interaction and transformed predictor variables to improve on the predictions of the Step 1 models.

In Step 1, an optimal subset of the 121 variables, all of which are simple linear terms, was selected with the sequential floating backward selection (SFBS) algorithm. Based on the selected predictors, SFBS-MLR-L models were created for each flammability property. The flash points and the heats of combustion of organic compounds could be estimated with the SFBS-MLR-L models, with test-set R^2 values of 0.976 and 0.999, respectively. In contrast, the LFL and UFL SFBS-MLR-L models were not applicable, with test-set R^2 values of 0.552 and 0.253, respectively. These observations implied that combinations of simple linear terms were sufficient for predicting the flash points and the heats of combustion but not for predicting the LFL and UFL values.
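The Step 1 workflow of fitting an MLR model, scoring it by R^2 on held-out data, and pruning predictors can be sketched as follows. This is a simplified illustration under stated assumptions: it uses plain sequential backward selection rather than the floating variant (SFBS) named above, the validation R^2 criterion and stopping rule are assumed rather than taken from the study, and the data are synthetic (generated from y = 2 + 3*x0 - x1, with x2 irrelevant).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(X, y):
    """Return [intercept, b1, ...] from the least-squares normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(p)]
    return solve(ata, aty)


def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def take(X, subset):
    """Keep only the columns listed in subset."""
    return [[row[j] for j in subset] for row in X]


def backward_select(X_tr, y_tr, X_va, y_va, n_keep):
    """Plain backward selection: repeatedly drop the least useful column."""
    cols = list(range(len(X_tr[0])))

    def score(subset):
        coef = fit_mlr(take(X_tr, subset), y_tr)
        return r_squared(y_va, predict(coef, take(X_va, subset)))

    while len(cols) > n_keep:
        # Drop the column whose removal leaves the highest validation R^2.
        drop = max(cols, key=lambda j: score([c for c in cols if c != j]))
        cols.remove(drop)
    return cols


# Synthetic data: y = 2 + 3*x0 - x1; the third column is irrelevant.
X_train = [[1, 2, 5], [2, 1, 3], [3, 4, 1], [4, 3, 6], [5, 6, 2], [6, 5, 4]]
y_train = [3, 7, 7, 11, 11, 15]
X_val = [[7, 8, 1], [8, 2, 5], [9, 9, 3], [1, 5, 2]]
y_val = [15, 24, 20, 0]

coef = fit_mlr(X_train, y_train)
r2 = r_squared(y_val, predict(coef, X_val))
selected = backward_select(X_train, y_train, X_val, y_val, n_keep=2)
print(coef, r2, selected)
```

A floating variant would additionally try re-adding previously dropped columns after each removal, which is what distinguishes SFBS from the plain procedure shown here.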

In Step 2, the default variables from Step 1 were transformed to incorporate nonlinear and interaction terms. An optimal subset of predictors was then selected with the sequential floating forward selection (SFFS) algorithm. As in Step 1, SFFS-MLR-NLI models were constructed for each flammability property from the selected predictors. Models were also constructed with the SFBS in Step 2, and these SFBS-MLR-NLI models generally showed better accuracy than the SFFS-MLR-NLI models, except for heat of combustion prediction. However, because the SFBS-MLR-NLI models included numerous predictors, they were less appealing for applications. Therefore, for simplicity and generalization, the SFFS-MLR-NLI models were primarily presented in Step 2. All four SFFS-MLR-NLI models were valid, with test-set R^2 values of 0.977 (flash point), 0.999 (heat of combustion), 0.741 (LFL), and 0.501 (UFL). All of the constructed models in Step 2 were adequate as predictive models of the four flammability properties. The more flexible SFBS-MLR-NLI models of the four flammability properties were additionally presented in Appendices A–D. The performances of the SFFS-MLR-NLI and SFBS-MLR-NLI models were then compared with published QSPR models. The comparisons revealed that, although the sample size and range of target properties were more extensive, the proposed Step 2 models performed similarly to or better than the QSPR models in predictive ability. Given that the accuracies of the proposed models were comparable to those of the QSPR models, we concluded that the prepared variables of Step 2 described the various molecular structures well enough to capture the characteristics of flammability properties, as the QSPR approaches do.
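To make the idea of interaction and transformation terms concrete, the sketch below expands a row of default variables into squared terms, logarithmic terms, and pairwise products. The specific transformations (squares, log1p, products) and the variable names are assumptions chosen for illustration; the abstract does not state which transformations the study actually used.

```python
import math
from itertools import combinations


def expand_terms(row, names):
    """Return the linear terms plus assumed nonlinear and pairwise interaction terms."""
    terms = dict(zip(names, row))
    for name, value in zip(names, row):
        terms[f"{name}^2"] = value ** 2
        # log1p keeps zero atom counts finite; requires value > -1.
        terms[f"log1p({name})"] = math.log1p(value)
    for (name_a, value_a), (name_b, value_b) in combinations(zip(names, row), 2):
        terms[f"{name_a}*{name_b}"] = value_a * value_b
    return terms


# Hypothetical variable names: atom counts for C, H, and O in ethanol (C2H6O).
names = ["nC", "nH", "nO"]
expanded = expand_terms([2, 6, 1], names)
for key, value in expanded.items():
    print(key, value)
```

Because every expanded term enters the regression linearly, the resulting model is still an MLR model and keeps its interpretability, while the candidate pool grows quickly with the number of default variables, which is why a feature selection step such as SFFS is needed.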

Through the SFBS in Step 1, the SFFS in Step 2, and the SFBS in Step 2, the constructed models were gradually enhanced. Since the flash point SFBS-MLR-L model was already highly accurate, Step 2 brought no significant increase in its predictive ability. The heat of combustion SFFS-MLR-NLI model proved markedly more accurate and generalizable than its corresponding SFBS-MLR-NLI model because of an extreme outlier in the training set. This result indicated that the linear terms with several interaction terms were enough to predict the heat of combustion, and that the SFBS-MLR-NLI model overfitted the data. Meanwhile, the LFL and UFL models showed substantial improvement in Step 2. This result suggested that the values of LFL and UFL are determined by various atomic interactions and bulk-related factors, such as van der Waals forces and steric effects, whereas the heats of combustion of organic compounds are mainly associated with atom types.

In conclusion, as the results discussed above show, the regression models proposed in this study have several clear advantages. First, the predictor variables used are readily available. Second, the proposed predictive models are both practical and sufficiently accurate for practitioners. Third, the proposed MLR models are easy to interpret. Therefore, the chemical engineering community could benefit from these results, which allow practitioners to quickly estimate missing data for the flash point, heat of combustion, LFL, and UFL of organic compounds.