(594f) Enhancing Compound-Protein Interaction Prediction with Confidence Assessment | AIChE

Authors 

Chinas Serrano, L. - Presenter, University of Toronto
Xu, Z., University of Toronto
Mahadevan, R., University of Toronto
In the rapidly evolving field of biochemical engineering, predicting functions that arise from compound-protein interactions remains a critical challenge. Recent deep learning models have made significant strides in functional annotation, novel enzyme discovery, and metabolite identification. However, compound-protein interaction (CPI) prediction remains difficult because data sources are sparse and heterogeneous and the interactions themselves are complex. Our work addresses this challenge by introducing CPI-Pred, a versatile deep learning model designed specifically for predicting compound-protein interactions.

We assemble the largest kinetic parameter datasets, encompassing four critical kinetic parameters: the Michaelis-Menten constant (KM) containing ~85k datapoints, the enzyme turnover number (kcat) containing ~45k datapoints, the catalytic efficiency (kcat/KM) containing ~20k datapoints, and the inhibition constant (KI) containing ~77k datapoints. These parameters are essential for understanding enzyme functionality within metabolic contexts and their regulation by compounds.
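
For readers less familiar with these quantities, the standard Michaelis-Menten rate law shows how three of them relate. The block below is textbook enzyme kinetics rather than anything specific to CPI-Pred. The competitive-inhibition form in the last line is an illustrative assumption, since the abstract does not state which inhibition mechanisms the KI data cover. Here v is the reaction rate, [E]_0 the total enzyme concentration, [S] the substrate concentration, and [I] the inhibitor concentration.

```latex
% Michaelis-Menten rate law
v = \frac{k_{cat}\,[E]_0\,[S]}{K_M + [S]}

% Low-substrate limit ([S] \ll K_M): the rate is governed by the catalytic efficiency
v \approx \frac{k_{cat}}{K_M}\,[E]_0\,[S]

% Assumed competitive inhibition: the inhibitor raises the apparent K_M
K_M^{app} = K_M \left( 1 + \frac{[I]}{K_I} \right)
```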

CPI-Pred combines novel compound representations, enzyme language models, and attention mechanisms. Compound representations are learned using a message-passing neural network, capturing essential features of chemical compounds. Enzyme representations are extracted from state-of-the-art protein language models, encoding rich information about enzymes. Additionally, we incorporate novel sequence pooling and cross-attention mechanisms to enhance the model’s performance.
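
To make the cross-attention idea concrete, here is a minimal sketch of generic scaled dot-product cross-attention followed by mean pooling. It is not the CPI-Pred architecture, whose details the abstract does not give. The function names, the toy embeddings, the choice of compound atoms as queries and enzyme residues as keys and values, and the absence of learned projection matrices are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    d = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value rows, one output row per query.
        row = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        attended.append(row)
    return attended


def mean_pool(rows):
    """Average a list of equal-length vectors into a single vector."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


# Toy, hand-written embeddings (hypothetical): 3 atoms and 4 residues, dimension 2.
atom_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
residue_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]

# Each atom gathers information from the residues it is most similar to.
attended = cross_attention(atom_embeddings, residue_embeddings, residue_embeddings)

# Pool the per-atom outputs into one fixed-size vector for the compound-enzyme pair.
pair_vector = mean_pool(attended)
print(pair_vector)
```

In a trained model the queries, keys, and values would pass through learned linear projections, and the pooled vector would feed a regression head for the kinetic parameter of interest.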

To address the inherent uncertainty in CPI predictions, we introduce a confidence predictor model. This auxiliary component assesses the confidence associated with each interaction prediction. It evaluates factors such as data quality, model uncertainty, and input features, providing a confidence score that quantifies the reliability of the CPI-Pred output.

Our model demonstrates robustness across diverse compound-protein interactions. By utilizing amino acid sequence and compound structure representations, CPI-Pred outperforms state-of-the-art models on unseen compounds and dissimilar enzymes. The confidence predictor provides additional insights, allowing users to gauge the trustworthiness of individual predictions.
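
As a sketch of how a per-prediction confidence score might be used downstream, the snippet below filters and ranks predictions before experimental follow-up. The abstract does not specify the score's scale or any output format, so the `Prediction` record, its field names, the [0, 1] score range, and the 0.7 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    enzyme_id: str      # hypothetical identifier
    compound_id: str    # hypothetical identifier
    log10_km: float     # predicted value; the log scale is an assumption
    confidence: float   # assumed to lie in [0, 1]


def prioritize(predictions, min_confidence=0.7):
    """Keep predictions at or above the threshold, most confident first."""
    kept = [p for p in predictions if p.confidence >= min_confidence]
    return sorted(kept, key=lambda p: p.confidence, reverse=True)


# Made-up example values, for illustration only.
preds = [
    Prediction("E1", "C1", -3.2, 0.91),
    Prediction("E2", "C1", -1.5, 0.40),
    Prediction("E3", "C2", -2.0, 0.78),
]

ranked = prioritize(preds, min_confidence=0.7)
for p in ranked:
    print(p.enzyme_id, p.compound_id, p.log10_km, p.confidence)
```

A threshold like this trades coverage for reliability: raising it leaves fewer candidates for validation but removes the predictions the model is least sure about.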

Our workflow holds promise for addressing various metabolic engineering challenges, including enzyme design, drug discovery, and personalized medicine. By combining CPI-Pred’s predictions with confidence assessments, researchers can make informed decisions and prioritize experimental validation. In summary, our integrated approach not only enhances prediction accuracy but also introduces a confidence assessment, bridging the gap between computational predictions and experimental validation.