(411a) Big Data from Sparse Data: Diverse Scientific Benchmarks Reveal Optimization Imperatives for Implicit Membrane Energy Functions
AIChE Annual Meeting
2019
Topical Conference: Applications of Data Science to Molecules and Materials
Innovations in Methods of Data Science
Tuesday, November 12, 2019 - 3:30pm to 3:45pm
In computational protein structure prediction and design, energy functions discriminate non-native from near-native biomolecule conformations. These functions approximate the balance of enthalpic and entropic contributions to protein stability through mathematical models derived from thermodynamic and structural data. Over the past decade, an influx of high-resolution data paired with machine learning techniques has significantly improved the accuracy of soluble protein energy functions. However, membrane protein energy functions remain low-resolution because the experimental data are sparse, leading to overfitting. To overcome this challenge, we assembled a suite of 14 computational benchmark tests against experimental targets. The tests probe various membrane protein energy function capabilities, ranging from reproducing protein stabilities and orientations to accurately predicting the three-dimensional structures of monomeric proteins and protein complexes. To evaluate current performance, we ran the benchmark on Rosalind19, the current state-of-the-art implicit membrane protein energy function in Rosetta. The benchmark revealed areas for improvement, including the treatment of electrostatic interactions, the representation of the interfacial head group region, and the accounting for entropic effects on protein orientation. To enable easy, fast, and continuous evaluation of future energy functions, the benchmark suite is publicly available on both GitHub and the Rosetta Benchmark server. Ultimately, these benchmarks will enable big-data-style optimization of membrane energy functions, leading to improved membrane protein design capabilities.