(463c) Risk-Centric Optimal Synthetic-Pathway Generation System Using Monte Carlo Tree Search and Open Data of Reactions and Material Safety
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Sustainable Engineering Forum
Poster Session: Sustainability and Sustainable Biorefineries
Thursday, November 19, 2020 - 8:00am to 9:00am
Commercial reaction-pathway design software such as Chematica already exists, but its high cost and its focus on specific conditions (e.g., cost optimization) make it difficult to use universally. This study therefore proposes a system that uses a graph algorithm to generate synthetic pathways optimized for safety, taking into account reaction energy and material stability (or risk). Concretely, it is a smart pathway search and generation system that combines a database, built by automatically collecting open-source and open data, with Monte Carlo Tree Search (MCTS).
In the data-mining step, we built the system database with automated web crawling and web scraping that can be updated continuously, collecting public data from sources such as Google Patents, PubChem, and ChemSpider. So far, we have collected a large amount of synthesis data, including about 4.8 million compound records and about 2.2 million synthesis records.
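As a minimal sketch, assuming each collected reaction record reduces to a reaction ID with lists of reactants and products (the actual database schema is not described here), the records can be stored as a directed network in adjacency-list form and traversed with BFS. The function names `build_network` and `bfs_reachable` and the record format are hypothetical.

```python
from collections import defaultdict, deque


def build_network(reactions):
    """Build an adjacency list: compound -> list of (reaction_id, product).

    `reactions` is an iterable of (reaction_id, reactants, products), a
    hypothetical record format standing in for the real database schema.
    """
    graph = defaultdict(list)
    for rxn_id, reactants, products in reactions:
        for reactant in reactants:
            for product in products:
                graph[reactant].append((rxn_id, product))
    return graph


def bfs_reachable(graph, start, max_depth):
    """Return {compound: depth} for compounds reachable within max_depth steps.

    Each compound is enqueued at most once and each edge is scanned at most
    once, so the traversal is O(|V| + |E|) on an adjacency list.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for _rxn_id, nxt in graph.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


if __name__ == "__main__":
    records = [("R1", ["A", "B"], ["C"]), ("R2", ["C"], ["D"]), ("R3", ["D"], ["E"])]
    print(bfs_reachable(build_network(records), "A", max_depth=2))  # {'A': 0, 'C': 1, 'D': 2}
```

Reachability alone stays linear in the size of the network. What grows fastest is enumerating and scoring whole candidate pathways: with an average branching factor b and a depth limit d, the number of candidate paths can grow roughly as b^d, which is the kind of growth a sampling-based search such as MCTS is meant to contain.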
Next, the data collected for synthetic-pathway design are structured as a data network and passed to the search algorithm. For general path-search algorithms such as Depth-First Search (DFS) and Breadth-First Search (BFS), the time complexity is O(|V|+|E|) with an adjacency-list implementation and O(|V|^2) with an adjacency-matrix implementation, so the computation time increases sharply as the amount of data grows. To shorten the computation time, this study applies the MCTS algorithm. A synthetic pathway is proposed by searching, within a specified depth, for the paths that can be generated from data such as reactions, risk, and energy.
The MCTS weights are determined from the degree of risk of each reaction and of its reactants and products. The reaction risk, derived from the reaction enthalpy change, is calculated using Benson Group Increment Theory (BGIT). The hazard of the reactants and products is classified using related public data such as the NFPA 704 code, Protective Action Criteria (PACs), Acute Exposure Guideline Levels (AEGLs), and Emergency Response Planning Guidelines (ERPGs).
Finally, the generated synthetic pathways are represented as a knowledge graph and as a string-based data tree. The results of this study are as follows: 1) in the knowledge graph, results are classified by color; 2) the string results show the synthesis information, chemicals, and risk for each node along each path.
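The two risk terms can be sketched as follows. The enthalpy part applies group additivity in the spirit of BGIT, ΔH_rxn = Σ ν ΔH_f(products) - Σ ν ΔH_f(reactants), with each ΔH_f estimated as a sum of group contributions (symmetry and ring corrections are omitted). The hazard part reduces the NFPA 704 health, flammability, and instability ratings (each 0 to 4) to one score. How the two are combined (only heat release counts, the most hazardous chemical dominates, the weight `alpha` and the scale `dh_scale`) is a hypothetical choice for illustration, not the authors' formula, and the group values in the example are placeholders rather than tabulated Benson data. PAC, AEGL, and ERPG thresholds could be folded in the same way but are left out here.

```python
from dataclasses import dataclass


def formation_enthalpy(groups, group_values):
    """Group additivity: ΔH_f ≈ Σ (group count × group contribution).

    `groups` maps group labels to counts; corrections are omitted in this sketch.
    """
    return sum(n * group_values[g] for g, n in groups.items())


def reaction_enthalpy(reactants, products, group_values):
    """ΔH_rxn = Σ ν ΔH_f(products) - Σ ν ΔH_f(reactants).

    Each side is a list of (stoichiometric coefficient, groups) pairs.
    """
    def side(species):
        return sum(nu * formation_enthalpy(groups, group_values) for nu, groups in species)
    return side(products) - side(reactants)


@dataclass(frozen=True)
class NFPA704:
    health: int        # 0-4
    flammability: int  # 0-4
    instability: int   # 0-4


def hazard_score(rating):
    """Hypothetical: worst NFPA 704 rating scaled to [0, 1]."""
    return max(rating.health, rating.flammability, rating.instability) / 4.0


def reaction_risk(dh_rxn, ratings, dh_scale=100.0, alpha=0.5):
    """Hypothetical risk weight in [0, 1] for one reaction step.

    thermal: heat released (exothermic only), capped at dh_scale.
    chemical: hazard of the most hazardous reactant or product.
    """
    thermal = min(max(-dh_rxn, 0.0) / dh_scale, 1.0)
    chemical = max((hazard_score(r) for r in ratings), default=0.0)
    return alpha * thermal + (1.0 - alpha) * chemical


if __name__ == "__main__":
    PLACEHOLDER_GROUPS = {"G1": -40.0, "G2": -20.0}  # NOT real Benson values
    dh = reaction_enthalpy([(1, {"G1": 2})], [(1, {"G1": 2, "G2": 1})], PLACEHOLDER_GROUPS)
    print(dh, reaction_risk(dh, [NFPA704(2, 3, 0), NFPA704(1, 1, 0)]))
```

Keeping both terms in [0, 1] lets the resulting weight be attached directly to a reaction edge in the network and used as the per-step risk during the search.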
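The risk-weighted search can be sketched with a standard UCT-style MCTS (selection, expansion, random rollout, backpropagation) over a network whose edges carry a precomputed risk weight in [0, 1]. Assumptions: the search runs forward from a start compound to a target within a depth limit, cycles are disallowed, and the reward is 1 minus the mean step risk when the target is reached (0 otherwise). The graph format, the reward, the helper names, and the exploration constant `c = 1.4` are hypothetical choices, not taken from the original system.

```python
import math
import random


class Node:
    """One search-tree node: a partial pathway from the start compound."""
    def __init__(self, path, risks, parent=None):
        self.path = path      # compounds visited so far; path[-1] is current
        self.risks = risks    # risk weight of each reaction step taken
        self.parent = parent
        self.children = []
        self.untried = None   # legal moves not yet expanded (computed lazily)
        self.visits = 0
        self.value = 0.0      # sum of rollout rewards


def legal_moves(graph, path, max_depth):
    """Outgoing reactions from the current compound, within depth, no cycles."""
    if len(path) - 1 >= max_depth:
        return []
    return [(nxt, risk) for nxt, risk in graph.get(path[-1], []) if nxt not in path]


def reward(path, risks, target):
    """Hypothetical reward: 1 - mean step risk if the target is reached, else 0."""
    if path[-1] != target:
        return 0.0
    return 1.0 - (sum(risks) / len(risks) if risks else 0.0)


def rollout(graph, path, risks, target, max_depth, rng):
    """Random playout from a partial pathway until the target or a dead end."""
    path, risks = list(path), list(risks)
    while path[-1] != target:
        moves = legal_moves(graph, path, max_depth)
        if not moves:
            break
        nxt, risk = rng.choice(moves)
        path.append(nxt)
        risks.append(risk)
    return reward(path, risks, target)


def uct_child(node, c=1.4):
    """UCT: mean reward plus an exploration bonus for rarely visited children."""
    def score(ch):
        return ch.value / ch.visits + c * math.sqrt(math.log(node.visits) / ch.visits)
    return max(node.children, key=score)


def mcts(graph, start, target, max_depth=4, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node([start], [])
    for _ in range(iterations):
        node = root
        # 1) Selection: descend through fully expanded nodes.
        while node.untried == [] and node.children:
            node = uct_child(node)
        # 2) Expansion: add one untried reaction step as a new child.
        if node.untried is None:
            node.untried = [] if node.path[-1] == target else legal_moves(graph, node.path, max_depth)
        if node.untried:
            nxt, risk = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.path + [nxt], node.risks + [risk], parent=node)
            node.children.append(child)
            node = child
        # 3) Simulation: random playout scored by the risk-based reward.
        value = rollout(graph, node.path, node.risks, target, max_depth, rng)
        # 4) Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += value
            node = node.parent
    # Follow the most-visited children to read off the recommended pathway.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return (node.path, node.risks) if node.path[-1] == target else (None, None)


if __name__ == "__main__":
    graph = {"A": [("B", 0.9), ("C", 0.1), ("D", 0.0)], "B": [("T", 0.9)], "C": [("T", 0.1)]}
    path, risks = mcts(graph, "A", "T")
    print(" -> ".join(path), risks)  # A -> C -> T [0.1, 0.1]
```

In the example, the route through C has the lowest step risks, so it accumulates most of the visits and is returned, while the dead end D and the riskier route through B are explored only occasionally. Reading off the most-visited children, rather than the single best rollout, is a common way to keep the final choice robust to lucky playouts.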
The system was test-run using part of the database. Compared with other software that either gives suitable results only under specific conditions or designs a direct synthetic pathway, the proposed system has the advantage of automatically generating open-source synthetic pathways, based on large open datasets, that are applicable to a wide range of cases. Moreover, the proposed system is expected to be highly meaningful for safety: it can help prevent safety accidents and manage additional information efficiently when the generated pathways are consulted.