(108e) Machine Learning Models for Rapid Compositional Quantification of Complex Multicomponent Mixtures Using Vibrational Spectroscopy | AIChE

Authors 

Angulo Figueira, A. - Presenter, New York University
Aydil, E., New York University
Modestino, M., New York University

In recent years, machine learning (ML) has emerged as an attractive approach in the chemical sciences for discovering new materials and reaction mechanisms, optimizing chemical reactors, and automating processes. In this work, we present an analytical methodology that uses multi-target regression ML models to determine the composition of multicomponent chemical mixtures from Fourier transform infrared (FTIR) absorption spectra.

We use principal component regression (PCR) and artificial neural network (ANN) algorithms to train and test models that determine the composition of aqueous solutions of unknown concentration. We begin by systematically collecting infrared absorption data for training. This data is collected automatically with a set of programmable syringe pumps that prepare aqueous solutions of known concentration, containing up to three solutes, and deliver them into an FTIR transmission cell, where a spectrum of each sample is recorded. Machine learning models from publicly available libraries were then trained on this data and evaluated on their ability to predict the concentrations of unknown mixtures containing one, two, and three components in aqueous solution.

As model complex systems, we used mixtures of structurally similar components: alcohols (glycerol, isopropanol, and 1-butanol) and nitriles (acrylonitrile (AN), adiponitrile (ADN), and propionitrile (PN)). These mixtures are relevant to emerging electrochemical synthesis processes based on glycerol electroreduction and ADN electrosynthesis. They are also challenging because their components have similar but subtly different vibrational features, i.e., similar spectral fingerprints.

For PCR models, the coefficient of determination, R², was 0.982 for the 3-component alcohol mixture and 0.976 for the 3-component nitrile mixture. ANN models were less accurate, slightly so for the alcohols (R² = 0.978) and more markedly for the nitriles (R² = 0.936). These R² values correspond to mean absolute errors between roughly 0.07 and 0.24 % m/m for mixtures with component concentrations between 4 and 10 % m/m. These results suggest that commonly used machine learning models are suitable for determining the unknown composition of complex multicomponent mixtures with similar absorption features.
They could potentially be implemented as rapid, in-line, non-invasive chemical quantification tools for analyzing chemical reactor outflow streams.
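
To make the multi-target PCR idea and the reported metrics (R² and mean absolute error) concrete, here is a minimal sketch in plain NumPy. It is not the authors' implementation, which used models from publicly available libraries. It assumes synthetic spectra built from three hypothetical, strongly overlapping Gaussian bands mixed linearly (Beer-Lambert behavior) with added noise, standing in for the similar fingerprints of the alcohol or nitrile mixtures. The band positions and widths, noise level, sample counts, the 4 to 10 % m/m concentration range used for sampling, the choice of three principal components, and the helper names (simulate, fit_pcr, predict_pcr, r2) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical wavenumber axis (cm^-1) and three overlapping Gaussian bands that
# stand in for the similar fingerprints of three structurally related solutes.
wavenumbers = np.linspace(900.0, 1500.0, 300)
centers = [1040.0, 1060.0, 1085.0]
widths = [25.0, 30.0, 28.0]
pure = np.stack([np.exp(-0.5 * ((wavenumbers - c) / w) ** 2)
                 for c, w in zip(centers, widths)])  # shape (3, n_wavenumbers)


def simulate(n):
    """Synthetic spectra, assuming linear (Beer-Lambert) mixing plus noise."""
    Y = rng.uniform(4.0, 10.0, size=(n, 3))  # % m/m of each component
    X = Y @ pure + rng.normal(0.0, 0.05, size=(n, wavenumbers.size))
    return X, Y


def fit_pcr(X, Y, k):
    """Principal component regression: project onto k PCs, then one
    least-squares fit that maps the scores to all targets at once."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    P = Vt[:k].T                                         # loadings (n_wavenumbers, k)
    T = (X - x_mean) @ P                                 # scores (n_samples, k)
    B, *_ = np.linalg.lstsq(T, Y - y_mean, rcond=None)   # coefficients (k, n_targets)
    return x_mean, y_mean, P, B


def predict_pcr(model, X):
    x_mean, y_mean, P, B = model
    return (X - x_mean) @ P @ B + y_mean


def r2(Y_true, Y_pred):
    """R² per target, averaged uniformly (assumed aggregation for one score)."""
    ss_res = np.sum((Y_true - Y_pred) ** 2, axis=0)
    ss_tot = np.sum((Y_true - Y_true.mean(axis=0)) ** 2, axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


X_train, Y_train = simulate(200)
X_test, Y_test = simulate(50)
model = fit_pcr(X_train, Y_train, k=3)
Y_pred = predict_pcr(model, X_test)
print("R2 :", round(r2(Y_test, Y_pred), 3))
print("MAE:", np.round(np.abs(Y_test - Y_pred).mean(axis=0), 3), "% m/m")
```

Because absorbance is, to first order, linear in concentration, a few principal components capture most of the composition-related variance, and a single least-squares step predicts every component's concentration simultaneously. An ANN-based model would replace that final linear map with a learned nonlinear one.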