(214e) Prediction of Protein Solubility in Escherichia coli Using Discriminant Analysis, Logistic Regression, and Artificial Neural Network Models
AIChE Annual Meeting
2007
Food, Pharmaceutical & Bioengineering Division
Advances in Protein Structure, Function and Stability I
Tuesday, November 6, 2007 - 2:00pm to 2:20pm
Recombinant DNA technology is central to the mass production of proteins for academic, medical, and industrial use, and predicting protein solubility is a significant part of that work. However, the solubility of a protein overexpressed in a host organism is difficult to predict. A model that accurately estimates the likelihood that a protein will form insoluble inclusion bodies would therefore be useful in many applications, for example by indicating whether a protein needs chaperones to remain soluble under the conditions within the host organism. To this end, solubility data for proteins overexpressed in Escherichia coli were compiled, and protein properties likely to affect solubility were identified as parameters for building solubility prediction models.
Three models were constructed using discriminant analysis, logistic regression, and neural networks, respectively. Significant parameters were determined, and the efficiencies of solubility prediction for the three procedures were compared.
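As an illustration of one of the three model types, the following is a minimal sketch of a logistic regression classifier fitted by plain gradient descent, using only the Python standard library. It is not the authors' implementation. The two features (a hydrophobicity-like score and a charge-like score), the data values, and the function names `fit_logistic` and `predict_proba` are all assumptions made up for this example; the abstract does not list the parameters actually used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that a protein with features x is soluble."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy, made-up data (NOT from the study):
# each row is [hydrophobicity-like score, charge-like score]; label 1 = soluble.
X = [[-1.0, 0.8], [-0.5, 1.2], [-0.8, 0.3],
     [0.9, -0.7], [1.1, -0.2], [0.6, -1.0]]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X, y)
preds = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
print("weights:", w, "bias:", b)
print("predicted labels:", preds)
```

The signs of the fitted weights suggest which direction each hypothetical feature pushes the prediction, which is one simple way to look for significant parameters in a model of this kind.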
We will show ad hoc and a priori accuracies for the three models. We will also explore using the three models simultaneously.
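One possible way to use three classifiers together is a simple majority vote over their binary predictions. The abstract does not say how the models were combined, so the sketch below is only an assumption about what simultaneous use could look like. The helper names `majority_vote` and `accuracy` and all prediction values are hypothetical and chosen purely for illustration.

```python
def majority_vote(*model_predictions):
    """Combine equal-length lists of 0/1 predictions by majority vote."""
    combined = []
    for votes in zip(*model_predictions):
        # Predict soluble (1) only if more than half of the models say so.
        combined.append(1 if sum(votes) * 2 > len(votes) else 0)
    return combined

def accuracy(predicted, actual):
    """Fraction of predictions that match the known labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Toy, made-up labels and predictions (NOT results from the study); 1 = soluble.
actual = [1, 0, 1, 1, 0, 0]
discriminant_preds = [1, 0, 0, 1, 0, 1]
logistic_preds = [1, 1, 1, 1, 0, 0]
neural_net_preds = [0, 0, 1, 1, 0, 0]

ensemble = majority_vote(discriminant_preds, logistic_preds, neural_net_preds)
for name, preds in [("discriminant", discriminant_preds),
                    ("logistic", logistic_preds),
                    ("neural net", neural_net_preds),
                    ("majority vote", ensemble)]:
    print(name, accuracy(preds, actual))
```

With an odd number of models a majority vote never ties, which is one reason it is a convenient first choice when combining three classifiers.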