(38f) Smart Synthetic Path Generation System with Analysis of Reaction Risk Enhanced By Automated Open Data Collection | AIChE

Authors 

Jeong, J. - Presenter, Myongji University
Shin, D., Myongji University
Kim, C. W., Myongji University
Yoon, E. S., Seoul National University
Chemical accidents often occur during laboratory experiments, pilot plant operations, and abnormal reactor operations. To prevent such incidents, relevant information must be gathered before starting a synthesis experiment or a process operation. In the process design stage, reaction information is essential for preventing runaway reactions. Synthesis-related information is scattered across many sources, including the Internet and the literature, so searching for it takes a long time. To solve this problem, we propose an intelligent support system for synthetic path generation.

The DB used in the system is constructed by collecting open data on chemicals, synthesis, and safety (e.g., NFPA 704, GHS, MSDS, and PAC). Data on chemical substances are available in freely accessible chemical DBs such as PubChem and ChEMBL, and synthesis data can also be found through various sources. However, such a varied set of data is difficult to organize and search. For this reason, Python-based web scraping and web crawling techniques are used to explore, collect, and organize data from diverse sources. Because the safety information offered differs from one Web site to another, safety data for chemicals are collected separately and merged into a unified DB.

The path generation algorithm combines depth-first search with a pruning algorithm. It searches backward from the target substance and cuts off any child node judged to have no possibility of synthesis. As a result, only the nodes that can be synthesized remain, and paths are generated by connecting them.
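
The search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the pruning criterion, so this sketch assumes a node is cut when it is neither a starting material nor the product of any known reaction, when it would create a cycle, or when a depth limit is reached. The names `generate_paths`, `REACTIONS`, and `STARTING`, and the placeholder substances "T", "A", "B", "C", "X", and "Y", are all hypothetical.

```python
from itertools import product

# Hypothetical toy data: product -> list of alternative reactant sets.
REACTIONS = {
    "T": [["A", "B"], ["X"]],
    "A": [["C"]],
    "X": [["Y"]],  # "Y" can be neither bought nor made, so this branch is pruned
}
STARTING = {"B", "C"}  # assumed available starting materials


def generate_paths(target, reactions, starting, max_depth=5, _visiting=frozenset()):
    """Depth-first search from the target back to starting materials.

    Returns a list of routes; each route is a list of (product, reactants)
    steps. An empty list means the node was pruned (assumed criterion).
    """
    if target in starting:
        return [[]]  # one route that needs no reaction step
    if max_depth == 0 or target in _visiting:
        return []  # prune: depth limit reached or cycle detected
    routes = []
    for reactants in reactions.get(target, []):
        sub_routes = [
            generate_paths(r, reactions, starting, max_depth - 1, _visiting | {target})
            for r in reactants
        ]
        if any(not s for s in sub_routes):
            continue  # prune this reaction: some reactant cannot be made
        # Combine one viable route per reactant into a full route.
        for combo in product(*sub_routes):
            route = [(target, tuple(reactants))]
            for sub in combo:
                route.extend(sub)
            routes.append(route)
    return routes


if __name__ == "__main__":
    for route in generate_paths("T", REACTIONS, STARTING):
        print(route)
```

With the toy data, only the route through "A" and "B" survives; the alternative through "X" is cut because "Y" has no known synthesis and is not a starting material.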

The output is a multi-step synthetic path graph for the target substance, together with information on each path. The graph shows the possible synthetic paths connecting the target substance node to the starting material nodes, so the user can easily assess a whole path. Synthesis data nodes containing hazardous chemicals are graded and displayed in a distinct color (green, yellow, or red, depending on severity). Each path includes data such as the operating conditions and the chemicals used for each synthesis method on the path. For each path that uses hazardous chemicals, additional information is displayed from a risk analysis covering safety, health, and environment. After the synthesis data are analyzed, each synthetic path is assigned a risk class and weighted for ranking.
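
The grading and ranking step might look roughly like the sketch below. The abstract does not state the severity thresholds, the weights, or the scoring rule, so everything here is an assumption chosen for illustration: ratings on a 0 to 4 scale (as used by NFPA 704), color cut-offs at 2 and 3, integer weights per aspect, and a path score built from the worst rating per aspect. The chemical names and the functions `node_color`, `path_risk`, and `rank_paths` are hypothetical.

```python
ASPECTS = ("safety", "health", "environment")
WEIGHTS = {"safety": 2, "health": 2, "environment": 1}  # assumed weights

# Hypothetical ratings per chemical on an assumed 0-4 scale.
HAZARDS = {
    "solvent_a": {"safety": 1, "health": 1, "environment": 0},
    "reagent_b": {"safety": 4, "health": 3, "environment": 2},
    "reagent_c": {"safety": 2, "health": 1, "environment": 1},
}


def rating(chemical, aspect, hazards):
    """Rating of one chemical for one aspect; unknown chemicals count as 0
    here, although a real system might treat missing data as hazardous."""
    return hazards.get(chemical, {}).get(aspect, 0)


def node_color(chemicals, hazards):
    """Color of one synthesis node from its worst rating (assumed cut-offs)."""
    worst = max((rating(c, a, hazards) for c in chemicals for a in ASPECTS), default=0)
    if worst >= 3:
        return "red"
    if worst == 2:
        return "yellow"
    return "green"


def path_risk(path, hazards):
    """Weighted sum over aspects of the worst rating found on the path."""
    score = 0
    for aspect in ASPECTS:
        worst = max(
            (rating(c, aspect, hazards) for step in path for c in step), default=0
        )
        score += WEIGHTS[aspect] * worst
    return score


def rank_paths(paths, hazards):
    """Order paths from lowest to highest weighted risk."""
    return sorted(paths, key=lambda p: path_risk(p, hazards))


# Each path is a list of steps; each step lists the chemicals it uses.
PATH_1 = [("solvent_a", "reagent_b")]
PATH_2 = [("solvent_a", "reagent_c"), ("solvent_a",)]

if __name__ == "__main__":
    for path in rank_paths([PATH_1, PATH_2], HAZARDS):
        print(path_risk(path, HAZARDS), [node_color(s, HAZARDS) for s in path])
```

Under these assumed ratings, `PATH_2` ranks ahead of `PATH_1` because its worst ratings are lower in every aspect.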

The data in the built-in DB are continuously collected and automatically updated. As a first step, about 100,000 synthesis data records were obtained and a system test was conducted. We used SciFinder and SciPlanner from CAS to verify the data obtained from the test. SciFinder has a large DB built from a large body of literature and patents, so synthetic information is easy to search. When checked with SciPlanner, a few of the proposed system's results had no corresponding result, but matching results were found for most of them. This indicates that the system generates all of the actually available paths even when the DB becomes large. The system is being released as open source for free use. For the benefit of each research institution, researchers can register their private data and expand the DB, following the prescribed format. This increases the likelihood of getting results tailored to their goals. With the proposed system, researchers are expected to find safer, more reliable, and more economical reaction pathways by referring to the generated results, which should help prevent accidents in experiments and in eventual plant operations.