(463c) Risk-Centric Optimal Synthetic-Pathway Generation System Using Monte Carlo Tree Search and Open Data of Reactions and Material Safety | AIChE

Authors 

Lee, N. - Presenter, Myongji University
Shin, D., Myongji University
Jeong, J., Myongji University
In R&D, gathering and understanding chemical engineering data on chemicals, reactions, safety, and hazards is an essential part of process and experiment design. Many chemical engineering data sources are operated privately or publicly (accessible on the Internet), but the data are widely scattered and differ in format, and each reaction path uses a different set of reactants, which makes it difficult to select a suitable path. On a small scale this leads to repeated trial and error; on a large scale it causes economic losses from wasted materials and time, and it can also create serious safety problems. Designing an appropriate synthetic pathway that prevents safety accidents is therefore an essential part of the planning stage, before any research is conducted.

Commercial reaction-pathway design software such as Chematica exists, but it is difficult to use universally because of its high cost and because it targets specific conditions (cost optimization). This study therefore proposes a system that uses a graph algorithm to generate pathways optimized for safety, taking into account reaction energy and material stability (risk). Specifically, it is a smart path search and generation system that combines a database built by automatically collecting open-source and open data with Monte Carlo Tree Search (MCTS).
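The reaction-energy term can be made concrete with Hess's law. The block below is a standard textbook formulation, not the authors' stated equation: the enthalpy of reaction is the stoichiometry-weighted difference of standard enthalpies of formation, and Benson Group Increment Theory (BGIT) estimates each enthalpy of formation (typically for the ideal gas at 298.15 K) as a sum of tabulated group contributions.

```latex
\Delta_r H^\circ = \sum_{p \,\in\, \text{products}} \nu_p \, \Delta_f H^\circ_p
                 - \sum_{r \,\in\, \text{reactants}} \nu_r \, \Delta_f H^\circ_r ,
\qquad
\Delta_f H^\circ_M \approx \sum_{j} n_{j,M} \, \Delta_f H^\circ(\text{group } j) + \Delta H_{\text{corr},M}
```

Here \nu are stoichiometric coefficients, n_{j,M} is the number of occurrences of group j in molecule M, and \Delta H_{\text{corr},M} collects corrections such as ring strain and gauche interactions. A negative \Delta_r H^\circ marks an exothermic reaction, and a larger magnitude means more heat must be removed. How the authors map this value onto a risk score is not specified in the abstract.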

In the data-mining step, we built the system database with automated web crawling and web scraping that can be updated continuously, collecting public data from sources such as Google Patents, PubChem, and ChemSpider. So far, we have collected a large amount of data, including about 4.8 million compound records and about 2.2 million synthesis records.
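Before any search, the crawled records have to be organized into a network. The sketch below is illustrative only: it assumes each record has already been reduced to a hypothetical `(reaction id, reactants, product)` tuple with made-up compound names, indexes reactions by product (a retrosynthetic adjacency list), and lists linear routes to a target with a depth-limited DFS. It ignores the requirement that every co-reactant of a step also be available, which a real planner would handle with an AND-OR tree.

```python
from collections import defaultdict

# Hypothetical records: (reaction id, reactants, product). Real crawled records
# (patents, PubChem, ChemSpider) would carry many more fields.
REACTIONS = [
    ("r1", ("A", "B"), "C"),
    ("r2", ("C",), "T"),
    ("r3", ("D",), "C"),
    ("r4", ("A", "E"), "T"),
    ("r5", ("F",), "D"),
]


def build_network(records):
    """Retrosynthetic adjacency list: product -> [(reaction id, reactants), ...]."""
    makes = defaultdict(list)
    for rid, reactants, product in records:
        makes[product].append((rid, reactants))
    return makes


def enumerate_routes(makes, target, max_depth):
    """Depth-limited DFS listing every linear route that ends at `target`.

    The depth limit also keeps the search from looping forever on cycles.
    """
    routes = []

    def dfs(compound, route, depth):
        if depth == max_depth:
            return
        for rid, reactants in makes.get(compound, []):
            for precursor in reactants:
                new_route = f"{precursor} -{rid}-> {route}"
                routes.append(new_route)
                dfs(precursor, new_route, depth + 1)

    dfs(target, target, 0)
    return routes


if __name__ == "__main__":
    network = build_network(REACTIONS)
    for depth in (1, 2, 3):
        print(depth, len(enumerate_routes(network, "T", depth)))
```

On this toy network the route count rises from 3 to 6 to 7 as `max_depth` goes from 1 to 3. Enumerating routes, unlike visiting each vertex once, grows roughly as the branching factor raised to the depth, which is what makes exhaustive search costly on millions of compounds.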

The collected data are then organized into a reaction network and passed to the search algorithm. For general path search algorithms such as Depth-First Search (DFS) and Breadth-First Search (BFS), the time complexity is O(|V|+|E|) with an adjacency-list representation and O(|V|^2) with an adjacency-matrix representation, so computation time increases steeply as the data grow. To shorten computation time, this study applies the MCTS algorithm instead. Synthetic pathways are proposed by searching, within a specified depth, for the paths that can be generated from data on reactions, risk, energy, and so on.

The weights in MCTS are determined by the degree of risk of the reaction, the reactants, and the products. Reaction risk, which can be derived from the enthalpy change of reaction, is calculated using Benson Group Increment Theory (BGIT). The hazards of reactants and products are classified using related public data such as NFPA 704 codes, Protective Action Criteria (PACs), Acute Exposure Guideline Levels (AEGLs), and Emergency Response Planning Guidelines (ERPGs).

The generated synthetic pathways are represented both as a knowledge graph and as a string-based data tree. The results are presented as follows: 1) in the knowledge graph, results are classified by color; 2) the string output shows the synthesis information, chemicals, and risk for each node and each path.
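One way a risk-weighted MCTS step could look is sketched below. It is a standard UCT search (selection, expansion, random rollout, backpropagation) over linear retrosynthetic routes; all data are hypothetical, and the risk model is an assumption because the abstract does not give the authors' formula. Each step's risk averages an exothermicity term (−ΔH scaled by an assumed 300 kJ/mol) with the worst NFPA 704 rating of the species involved, a route's risk is its riskiest step, and a route that reaches the assumed purchasable starting materials earns a reward of 1 minus that risk. Reactions with more than one non-purchasable reactant are skipped to keep routes linear.

```python
import math
import random

# Hypothetical toy data (not from the authors' database).
# reaction id -> (reactants, product, enthalpy of reaction in kJ/mol)
REACTIONS = {
    "r1": (("A", "B"), "C", -250.0),
    "r2": (("C",), "T", -40.0),
    "r3": (("D",), "C", -60.0),
    "r4": (("A", "E"), "T", -180.0),
    "r5": (("F",), "D", -20.0),
}
# Hypothetical NFPA 704 ratings: (health, flammability, instability), each 0-4.
NFPA = {"A": (3, 3, 2), "B": (1, 1, 0), "C": (2, 1, 0), "D": (1, 1, 0),
        "E": (4, 2, 3), "F": (1, 0, 0), "T": (1, 1, 0)}
STARTING_MATERIALS = {"A", "B", "E", "F"}  # assumed purchasable
MAX_DEPTH = 4


def step_risk(rid):
    """Assumed 0-1 risk: mean of exothermicity and the worst NFPA rating."""
    reactants, product, dh = REACTIONS[rid]
    thermal = min(1.0, max(0.0, -dh) / 300.0)  # 300 kJ/mol scale is an assumption
    hazard = max(max(NFPA[c]) for c in (*reactants, product)) / 4.0
    return 0.5 * thermal + 0.5 * hazard


def actions(compound):
    """Reactions making `compound` with at most one non-purchasable reactant."""
    return [rid for rid, (reactants, product, _) in REACTIONS.items()
            if product == compound
            and sum(c not in STARTING_MATERIALS for c in reactants) <= 1]


def next_compound(rid):
    """The reactant still to be made, or None once only starting materials remain."""
    pending = [c for c in REACTIONS[rid][0] if c not in STARTING_MATERIALS]
    return pending[0] if pending else None


class Node:
    """A partial route: `compound` still has to be made; `risk` is the worst step so far."""

    def __init__(self, compound, depth, risk, route, parent=None):
        self.compound, self.depth, self.risk, self.route = compound, depth, risk, route
        self.parent, self.children = parent, []
        self.visits, self.value = 0, 0.0
        terminal = compound is None or depth >= MAX_DEPTH
        self.untried = [] if terminal else actions(compound)

    def expand(self, rid):
        child = Node(next_compound(rid), self.depth + 1,
                     max(self.risk, step_risk(rid)), self.route + (rid,), self)
        self.children.append(child)
        return child


def rollout(node, rng):
    """Random playout; reward is 1 - route risk if starting materials are reached."""
    compound, depth, risk, route = node.compound, node.depth, node.risk, node.route
    while compound is not None and depth < MAX_DEPTH and actions(compound):
        rid = rng.choice(actions(compound))
        risk = max(risk, step_risk(rid))
        compound, depth, route = next_compound(rid), depth + 1, route + (rid,)
    return (1.0 - risk if compound is None else 0.0), route


def mcts(target, iterations=2000, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(target, 0, 0.0, ())
    best_value, best_route = 0.0, None
    for _ in range(iterations):
        node = root
        while not node.untried and node.children:  # 1) selection by UCT
            parent = node
            node = max(parent.children, key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
        if node.untried:  # 2) expansion
            node = node.expand(node.untried.pop(rng.randrange(len(node.untried))))
        value, route = rollout(node, rng)  # 3) simulation
        if value > best_value:
            best_value, best_route = value, route
        while node is not None:  # 4) backpropagation
            node.visits += 1
            node.value += value
            node = node.parent
    # Return the best complete route in forward (synthesis) order.
    return best_value, (tuple(reversed(best_route)) if best_route else None)


if __name__ == "__main__":
    print(mcts("T"))
```

On these toy data, `mcts("T")` returns the three-step route `('r5', 'r3', 'r2')` with a reward of about 0.65, rather than the one-step route `r4`, whose reactant E carries an NFPA rating of 4. Taking the maximum step risk (rather than a sum) is a design choice that lets one hazardous step disqualify a route regardless of its length.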

A system test was run on a subset of the database. Compared with other software that gives suitable results only under specific conditions or designs direct synthetic pathways, the proposed system has the advantage of automatically generating synthetic pathways from open sources that are applicable to a wide range of cases, based on large-scale open data. The system is also expected to be highly meaningful for safety: it helps prevent safety accidents and efficiently manages additional information when the generated paths are consulted.