VLE-ML: Prediction of Vapor-Liquid Phase Equilibrium Based on Molecular Descriptors with No Code Machine Learning
2024 AIChE Annual Meeting
Annual Student Conference: Competitions & Events
Undergraduate Student Poster Session: Computing and Process Control
Monday, October 28, 2024 - 10:00am to 12:30pm
The research proceeded in the following steps. First, experimental data were collected from established sources: the CHERIC database, peer-reviewed scientific articles, and the NIST ThermoData Engine in Aspen Plus. The resulting dataset covers 562 compound pairs and 27,349 unique equilibrium data points.
To improve model performance, dimensionality reduction techniques like Principal Component Analysis (PCA) and feature selection by correlation were applied. This decreased the number of input features from 208 to 53 while retaining 80% of the original data variance. The data were then split into training (80%) and test (20%) sets. The split was made by compound pair, so that no pair appears in both sets and the test set remains independent of the training data. Batch cross-validation was used to optimize the models and to further support their generalization capacity.
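The pair-wise split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the workflow used in the study (which relied on a no-code tool). The record fields `comp1`, `comp2`, and `x1`, the function name `split_by_pair`, and the synthetic records are all hypothetical, and the sketch assumes the 20% test fraction is applied to compound pairs rather than to individual data points.

```python
import random
from collections import defaultdict


def split_by_pair(records, test_fraction=0.2, seed=0):
    """Split records so that every compound pair lands in exactly one set."""
    # Group record indices by an order-independent pair key, so that
    # (A, B) and (B, A) are treated as the same binary system.
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        key = tuple(sorted((rec["comp1"], rec["comp2"])))
        groups[key].append(idx)

    # Sort first so the shuffle is reproducible for a given seed.
    pairs = sorted(groups)
    random.Random(seed).shuffle(pairs)

    # Assumption: the test fraction is taken over pairs, not data points.
    n_test = max(1, round(test_fraction * len(pairs)))
    test_pairs = set(pairs[:n_test])

    train = [records[i] for p in pairs if p not in test_pairs for i in groups[p]]
    test = [records[i] for p in pairs if p in test_pairs for i in groups[p]]
    return train, test


# Synthetic placeholder records (hypothetical fields, not real VLE data):
# 10 compound pairs with 5 liquid compositions each.
records = [
    {"comp1": f"C{k}", "comp2": f"D{k}", "x1": j / 4}
    for k in range(10)
    for j in range(5)
]

train, test = split_by_pair(records, test_fraction=0.2, seed=0)
print(len(train), len(test))
```

Splitting by pair rather than by row matters because data points from the same binary system are strongly correlated. A row-wise random split would let the model see part of each VLE curve during training and would overstate how well it generalizes to unseen systems.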
Several models were trained and optimized, including Linear Regression, Random Forest, XGBoost, and Artificial Neural Networks (ANNs). A grid search was conducted to optimize the hyperparameters of each model. The models were evaluated using root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). Additionally, the results were compared with the UNIFAC model, a commonly used method for VLE diagram prediction.
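For reference, the three evaluation metrics named above can be computed as in the sketch below. It uses the standard definitions of RMSE, MAE, and R² in plain Python. The function names and the sample values are illustrative assumptions and do not come from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error: mean(|y - y_hat|)."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not results from the study).
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.0, 1.0, 2.0, 4.0]

print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

RMSE penalizes large deviations more heavily than MAE, so reporting both gives a sense of whether a model's error is dominated by a few poorly predicted systems.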
The results indicated that Artificial Neural Networks (ANNs) achieved the best performance among the evaluated models, effectively handling the problem's non-linearities. While the UNIFAC model was efficient for common compounds, it exhibited significant errors for rare molecules. Linear Regression had the worst performance, indicating the problem's non-linearity. Random Forest also performed poorly, whereas XGBoost showed a marked improvement over it.
This research demonstrates that machine learning models, especially neural networks, can substantially improve VLE diagram prediction. Moreover, using no-code tools like RapidMiner facilitated the implementation and optimization of these models, making the process more accessible to the academic community.