(214e) Prediction of Protein Solubility in Escherichia Coli Using Discriminant Analysis, Logistic Regression, and Artificial Neural Network Models | AIChE

Authors 

Bagajewicz, M. J. - Presenter, The University of Oklahoma
Lennarson, R. - Presenter, University of Oklahoma
Richard, R. - Presenter, University of Oklahoma
Diaz, A. - Presenter, University of Oklahoma


Recombinant DNA technology is important for the mass production of proteins for academic, medical, and industrial use, and predicting protein solubility is a significant part of that work. However, the solubility of a protein overexpressed in a host organism is difficult to predict. A model that accurately estimates the likelihood that a protein will form insoluble inclusion bodies would therefore be useful in many applications, for example by indicating whether a protein needs chaperones to remain soluble under the conditions within the host organism. To this end, solubility data for proteins overexpressed in Escherichia coli were compiled, and protein properties likely to affect solubility were identified as parameters for building solubility prediction models.

Three models were constructed using discriminant analysis, logistic regression, and neural networks, respectively. Significant parameters were determined, and the efficiencies of solubility prediction for the three procedures were compared.
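As a rough illustration of one of the three approaches, the sketch below fits a logistic regression classifier by batch gradient descent in plain Python. It is not the authors' implementation: the two features, the eight data points, the learning rate, and the function names are all assumptions made up for this example, and the abstract does not state which protein properties or fitting procedure were actually used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p(soluble) minus the label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that a protein with features x is soluble."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up, standardized toy data (NOT from the study).
# Columns are two hypothetical protein properties; label 1 = soluble, 0 = insoluble.
X = [[-1.2, 0.8], [-0.9, 1.1], [-1.0, 0.5], [-0.4, 0.9],
     [1.0, -0.7], [0.8, -1.2], [1.3, -0.4], [0.5, -0.9]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
preds = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
```

Here `accuracy` is measured on the same toy data used for fitting, so it only shows the mechanics of the method; it says nothing about how well such a model predicts solubility for proteins it has not seen.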

We will show ad hoc and a priori accuracies for the three models. We will also explore using the three models together.
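The abstract does not say how the three models would be used together. One simple possibility, shown here purely as an assumption, is a per-protein majority vote over the three predicted labels. The label lists below are placeholders invented to demonstrate the mechanics and are not results from the study.

```python
def majority_vote(*model_predictions):
    """Combine equal-length lists of 0/1 labels by per-sample majority."""
    combined = []
    for votes in zip(*model_predictions):
        # A sample is called soluble (1) if more than half the models say so.
        combined.append(1 if sum(votes) * 2 > len(votes) else 0)
    return combined

def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Placeholder labels (hypothetical): 1 = soluble, 0 = insoluble.
actual   = [1, 1, 0, 0, 1, 0]
da_pred  = [1, 0, 0, 0, 1, 1]   # stand-in for discriminant analysis output
lr_pred  = [1, 1, 0, 1, 1, 0]   # stand-in for logistic regression output
ann_pred = [0, 1, 0, 0, 1, 0]   # stand-in for neural network output

combined = majority_vote(da_pred, lr_pred, ann_pred)
combined_accuracy = accuracy(combined, actual)
```

With an odd number of models a majority always exists, which is one reason a vote is a convenient starting point; weighting the models by their individual accuracies would be another option.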