(252g) Integration of Matrix Encoding in Transformer Model for Reaction Pathway Prediction in Oil Refining Process | AIChE

Authors 

Li, J., The University of Manchester
Owing to the foreseeable shortage of fossil fuels, rising global energy demand, and growing environmental concerns, the effective utilization of fossil fuels through 'molecular management' has attracted increasing attention from researchers. Molecular management aims to place the right molecules in the right place, at the right time, and at the right price. It involves a range of techniques, including molecular characterisation of refining streams, molecular modelling to optimise refining processes, and integration of processing and utility systems at the molecular level, to achieve overall refinery optimisation. By leveraging these techniques, it is possible to maximize the efficiency of fossil fuel usage while minimizing its impact on the environment.
In our previous work, we developed a method that automatically converts a molecular structure, represented as a SMILES (simplified molecular-input line-entry system) string, into a list of submatrices that encode the connectivity of each substructure in the molecule. A structure-property relationship model was then developed on this representation using neural networks. In addition to the molecular property prediction model, a molecular composition reconstruction model was developed. This surrogate composition model uses SMILES as its basic representation unit and generates homologous series of molecules by attaching side-chain structures to core structures. A random distribution is used to ensure molecular diversity, while a probability density distribution is adopted to reduce the model's degrees of freedom. A hybrid optimization algorithm consisting of a genetic algorithm and the Sequential Least Squares Programming (SLSQP) algorithm is used to increase the numerical robustness and reproducibility of the results. The composition reconstruction model was evaluated in two case studies: petroleum fractions and biomass pyrolysis oil.
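As a rough illustration of the idea of turning a SMILES string into connectivity matrices, the sketch below builds a bond-order adjacency matrix for a small SMILES subset and then slices out submatrices for a core and a side chain. It is not the encoding used in this work: the function names `smiles_to_adjacency` and `submatrix`, the supported SMILES subset, and the choice of atom indices for each substructure are all assumptions made for illustration.

```python
def smiles_to_adjacency(smiles):
    """Return (atoms, matrix) for a small SMILES subset.

    Assumed subset: single-letter atoms C, N, O, S; branches '()';
    bond symbols '-', '=', '#'; single-digit ring closures; '.' separators.
    Hydrogens are implicit and are not represented in the matrix.
    """
    bond_order = {"-": 1, "=": 2, "#": 3}
    atoms = []       # element symbol for each atom index
    bonds = []       # (i, j, order) triples
    stack = []       # atoms to return to when a branch closes
    ring_open = {}   # ring digit -> (atom index, bond order given at opening)
    prev = None      # atom that the next atom will bond to
    pending = 1      # order of the next bond to be created

    for ch in smiles:
        if ch in "CNOS":
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending))
            prev, pending = idx, 1
        elif ch in bond_order:
            pending = bond_order[ch]
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch == ".":
            prev = None  # start a disconnected fragment
        elif ch.isdigit():
            if ch in ring_open:
                start, order_at_open = ring_open.pop(ch)
                bonds.append((start, prev, max(order_at_open, pending)))
            else:
                ring_open[ch] = (prev, pending)
            pending = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")

    n = len(atoms)
    matrix = [[0] * n for _ in range(n)]
    for i, j, order in bonds:
        matrix[i][j] = matrix[j][i] = order
    return atoms, matrix


def submatrix(matrix, indices):
    """Connectivity among a chosen subset of atoms (a hypothetical substructure)."""
    return [[matrix[i][j] for j in indices] for i in indices]


# Ethylcyclohexane: atoms 0-1 form the side chain, atoms 2-7 form the ring core.
atoms, adjacency = smiles_to_adjacency("CCC1CCCCC1")
side_chain = submatrix(adjacency, [0, 1])
ring_core = submatrix(adjacency, list(range(2, 8)))
```

Here the substructure indices are chosen by hand; an automatic method would need a rule for partitioning a molecule into core and side-chain fragments, which this sketch does not attempt.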
The Transformer is a neural network architecture that processes sequences of input data, such as natural language sentences or audio signals, and produces corresponding outputs. It has brought significant improvements in many natural language processing tasks, such as machine translation. Chemical reaction prediction can also be treated as a machine translation problem between the SMILES strings of reactants and products, an approach that has been investigated over the past few years [1-5]. However, molecular structures are encoded differently from natural language: the SMILES representation can be difficult to interpret, and the generated product SMILES strings sometimes cannot be canonicalized. We therefore sought to integrate the structural submatrix representation into the Transformer reaction prediction model to improve prediction accuracy. A reaction database covering petroleum refining processes, including cracking, substitution, and isomerization, is used to develop the reaction prediction model. The results show that prediction accuracy increased significantly over traditional template-based methods. In addition, the generalizability of the Transformer model greatly simplifies the process.
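To make the translation framing concrete, the sketch below splits a reaction SMILES into a source (reactant) and target (product) token sequence, using a regular-expression tokenizer of the kind commonly applied in SMILES-based sequence models. The helper names `tokenize` and `to_translation_pair`, the `reactants>>products` input format, and the octane cracking example are assumptions for illustration, not details of the model or database described here.

```python
import re

# Atom-level SMILES tokens: bracket atoms, two-letter halogens, organic-subset
# atoms, aromatic atoms, branches, bonds, ring-closure digits, and separators.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting unrecognised characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    return tokens


def to_translation_pair(reaction_smiles):
    """Turn 'reactants>>products' into (source tokens, target tokens)."""
    reactants, products = reaction_smiles.split(">>")
    return tokenize(reactants), tokenize(products)


# Hypothetical cracking step: n-octane -> n-butane + 1-butene.
source, target = to_translation_pair("CCCCCCCC>>CCCC.C=CCC")
```

A sequence model would then be trained to map `source` to `target`, in the same way a translation model maps a sentence in one language to a sentence in another.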

References

[1] D.M. Lowe, Extraction of chemical structures and reactions from the literature, University of Cambridge, 2012.

[2] C.W. Coley, R. Barzilay, T.S. Jaakkola, W.H. Green, K.F. Jensen, Prediction of Organic Reaction Outcomes Using Machine Learning, ACS Central Science 3(5) (2017) 434-443.

[3] P. Schwaller, T. Laino, T. Gaudin, P. Bolgar, C.A. Hunter, C. Bekas, A.A. Lee, Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction, ACS Central Science 5(9) (2019) 1572-1583.

[4] M.H.S. Segler, M.P. Waller, Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction, Chemistry – A European Journal 23(25) (2017) 5966-5971.

[5] P.P. Plehiers, C.W. Coley, H. Gao, F.H. Vermeire, M.R. Dobbelaere, C.V. Stevens, K.M. Van Geem, W.H. Green, Artificial Intelligence for Computer-Aided Synthesis In Flow: Analysis and Selection of Reaction Components, Frontiers in Chemical Engineering 2 (2020).