(443d) A Machine Learning Based Algorithm for Rate Estimation
2019 AIChE Annual Meeting
Topical Conference: Applications of Data Science to Molecules and Materials
Applications of Data Science in Catalysis and Reaction Engineering II
Wednesday, November 13, 2019 - 8:54am to 9:12am
A Machine Learning Based Algorithm for Rate Estimation
Matthew S. Johnson1 and William H. Green1,*
1Dept. Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA-02139, USA
Large chemical kinetic mechanisms are vital to understanding many complex chemical processes
but can involve the estimation of hundreds of thousands of reaction rates, far too many
to determine from experiments or quantum chemical methods. Modern ab initio methods
can now provide us with enough kinetic data to approach this problem from a
machine learning perspective. However, rate estimation is highly nonlinear and
kinetic datasets are very sparse. For these reasons we need a demanding
combination of properties that have previously been unattainable with conventional
machine learning approaches: the estimator must be nonlinear, easy to diagnose, and capable of
incorporating qualitative chemical knowledge and uncertainty estimation. We
have developed a subgraph-isomorphic decision-tree based method that has all of
these properties. The only inputs for construction are a set of kinetic data
for training and a starting reaction template. Each node of the tree contains
the set of training reactions matching the corresponding reaction template.
Their rate coefficients, k, are treated as samples from a lognormal rate
distribution. Each node is branched into sub-nodes using more specific
reaction templates chosen to minimize the sum of the standard deviations in
log(k). Subgraph properties that do not affect this sum (usually because the
training reactions are too homogeneous) are identified and adjusted by an
algorithm based on chemical knowledge. Rates and uncertainties can then be
estimated by treating each reaction rate as a sample of the distributions at each
node. A comparison was run against data and estimators in the Reaction
Mechanism Generator (RMG) database. Using this technique and training on the
RMG-database training sets, decision trees were generated for the Substitution_O
(127 reactions) and intra_H_migration (422 reactions) reaction types. Compared
against RMG's rate rules at T=1000 K, the decision tree estimator reduced
2-sigma errors by factors of 18 and 45, respectively. Furthermore, median
errors, with all error lumped into the activation energy, were reduced from 5.1 to
3.1 kcal/mol and from 4.0 to 2.9 kcal/mol, respectively.
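
The branching criterion and the activation-energy error measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: plain boolean predicates stand in for subgraph-isomorphism matching of reaction templates, each candidate extension splits a node into exactly two children (matching and non-matching), log(k) is taken as the natural logarithm, and the population standard deviation is used. All names (`node_stats`, `split_score`, `best_extension`, `ea_error`) and the toy rate coefficients are hypothetical.

```python
import math
from statistics import mean, pstdev

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def node_stats(ks):
    """Mean and standard deviation of ln(k) for the rate coefficients at one node."""
    logs = [math.log(k) for k in ks]
    sigma = pstdev(logs) if len(logs) > 1 else 0.0
    return mean(logs), sigma


def split_score(reactions, matches):
    """Sum of the ln(k) standard deviations of the two child nodes.

    `matches` is a stand-in predicate for "this reaction matches the more
    specific template"; the real method uses subgraph isomorphism instead.
    """
    inside = [r["k"] for r in reactions if matches(r)]
    outside = [r["k"] for r in reactions if not matches(r)]
    if not inside or not outside:
        return math.inf  # the extension does not separate the training reactions
    return node_stats(inside)[1] + node_stats(outside)[1]


def best_extension(reactions, extensions):
    """Return the name of the candidate extension with the smallest split score."""
    return min(extensions, key=lambda name: split_score(reactions, extensions[name]))


def ea_error(k_est, k_ref, temperature):
    """Error in kcal/mol when the whole discrepancy is lumped into the activation energy."""
    return R_KCAL * temperature * abs(math.log(k_est / k_ref))


# Toy training set: made-up rate coefficients with two hypothetical structural flags.
reactions = [
    {"k": 1e3, "ring": True, "oxygen": True},
    {"k": 2e3, "ring": True, "oxygen": False},
    {"k": 1e6, "ring": False, "oxygen": True},
    {"k": 3e6, "ring": False, "oxygen": False},
]
extensions = {
    "ring": lambda r: r["ring"],
    "oxygen": lambda r: r["oxygen"],
}

best = best_extension(reactions, extensions)
print("chosen extension:", best)
print("Ea-equivalent error for a factor-of-2 miss at 1000 K:",
      round(ea_error(2e3, 1e3, 1000.0), 3), "kcal/mol")
```

In this toy set the rate coefficients cluster by the `ring` flag, so splitting on it leaves two tight child distributions, whereas splitting on `oxygen` leaves two wide ones. The `ea_error` helper shows how a multiplicative rate error at a given temperature maps to an energy: the error is R T |ln(k_est / k_ref)|, which is the sense in which the abstract reports median errors in kcal/mol.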