Optical Molecular Recognition from Chemical Reaction Mechanism Images
2024 AIChE Annual Meeting
Annual Student Conference: Competitions & Events
Undergraduate Student Poster Session: Computing and Process Control
Monday, October 28, 2024 - 10:00am to 12:30pm
We propose a novel pipeline that combines an image-segmentation model with an optical chemical structure recognition (OCSR) model to extract information from images of chemical reaction mechanisms, which typically contain rich information about electron flow. The image-segmentation model serves as a pre-treatment step that identifies and removes the curved arrows indicating electron movement; the OCSR model then recognizes the identity of each molecule. Because no existing benchmarking dataset specifically targets chemical reaction mechanisms, we also created a dataset of molecule images and identifiers, Structural Molecular identifiers of Molecular images in Chemical Reaction Mechanisms (SMiCRM). It consists of 453 molecular images with mechanistic features such as curved arrows and partial charges. Each image is labeled with its Simplified Molecular Input Line Entry System (SMILES) string and its Structural Data File (SDF). On the collected dataset, compared with using the OCSR model alone, the proposed pipeline improved recognition performance significantly, from 12.09% to 64.5% in exact SMILES matching and from 15.36% to 90.97% in Tanimoto similarity, in recognizing the identity of mechanistic molecules once curved arrows are removed. We also applied this approach to parsing complete reaction mechanism images and demonstrated performance improvements.
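The two metrics reported above can be illustrated with a minimal Python sketch. It makes several assumptions that are not stated in the abstract: SMILES strings are taken to be canonicalized beforehand by a single cheminformatics toolkit, so that string equality implies the same molecule, and molecular fingerprints are represented as plain sets of "on" bit indices. The function names `tanimoto` and `exact_match_rate` are hypothetical, and the fingerprint type and averaging used by the authors are not known.

```python
from typing import Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits.

    Defined as |A intersect B| / |A union B|.
    """
    if not a and not b:
        # Convention assumed for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(a | b)


def exact_match_rate(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predicted SMILES strings that equal their reference string.

    Assumes both sequences were canonicalized by the same toolkit beforehand.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one molecule is required")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


if __name__ == "__main__":
    # Toy fingerprints: 2 shared bits out of 5 distinct bits overall.
    print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))
    # Toy predictions: one of two strings matches its reference.
    print(exact_match_rate(["CCO", "C=O"], ["CCO", "CO"]))
```

Exact matching is the stricter of the two metrics, since a single misread atom or bond counts as a failure, whereas the Tanimoto coefficient gives partial credit for structurally similar predictions. This is consistent with the Tanimoto figure above being higher than the exact-match figure.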
In conclusion, this research proposes an autonomous and effective procedure for collecting molecular-level information from chemical reaction mechanisms and highlights further avenues for improving chemical information extraction methodologies.