(485am) Sequential and Simultaneous Secondary Structure Prediction for Globular Proteins
AIChE Annual Meeting
2009 Annual Meeting
Food, Pharmaceutical & Bioengineering Division
Poster Session: Bioengineering
Wednesday, November 11, 2009 - 6:00pm to 8:00pm
Secondary structure prediction is an important precursor to the tertiary structure prediction of proteins, particularly for first-principles structure prediction algorithms such as ASTRO-FOLD [1]. The most common methods for secondary structure prediction use profile information for the target sequence, derived from sequence alignment techniques such as BLAST [2].
We present a sequential and a simultaneous approach for the prediction of α-helices and β-strands in proteins. The α-helical prediction model is denoted HELIOS (HELical prediction using Integer Optimization approacheS). It is formulated as a two-stage infeasibility minimization problem. The first stage is a linear programming (LP) model for parameter estimation, while the second stage is an integer linear programming (ILP) model for helix prediction. The residues of a target protein are divided into four regions depending on their putative proximity to the helix termini, and each residue's propensity to be in a helix is compared to a pre-evaluated, residue-dependent threshold propensity, using overlapping nonapeptides surrounding the central residue. The β-strand prediction model, denoted BEST-PRED, maximizes a residue's propensity to be in a β-strand. The protein is divided into overlapping pentapeptides. The β-strand propensity weight for the central residue of each pentapeptide is evaluated with a novel combination of naïve Bayesian and first-order Markov models, which represent the physical nature of a β-strand.
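The overlapping windows described above can be illustrated with a minimal Python sketch. It only shows how a nonapeptide or pentapeptide might be extracted around each central residue. The function name `centered_windows`, the example sequence, and the padding of the chain termini with a `-` placeholder are assumptions made for illustration; the abstract does not state how residues near the termini are handled.

```python
def centered_windows(sequence, width, pad="-"):
    """Return one window of `width` residues centred on each position.

    Assumption: positions near the chain termini are padded with `pad`
    so that every residue has a full-width window.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so that a central residue exists")
    half = width // 2
    padded = pad * half + sequence + pad * half
    # Window i covers padded[i : i + width]; its centre is sequence[i].
    return [padded[i:i + width] for i in range(len(sequence))]


sequence = "MKTAYIAKQR"  # hypothetical example sequence
nonapeptides = centered_windows(sequence, 9)   # window size used for helices
pentapeptides = centered_windows(sequence, 5)  # window size used for strands
```

Each window would then be scored by the corresponding propensity model; the scoring itself is not reproduced here because the abstract gives no parameter values.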
A number of additional constraints are introduced in both the sequential and the simultaneous models for secondary structure prediction. Two linear regression models, with parameters reflecting the physical properties of the amino acids of the target sequence, are used to estimate the helical and strand content of the protein [3]. The estimated values, widened to allow for errors in the estimate, are applied as lower and upper bounds on the secondary structure content of the protein. Further mathematical constraints, based on the physical nature of the residues [4], ensure that the predictions are biologically meaningful. In the simultaneous model, individual amino acid propensities to be in short loops and β-turns have been incorporated by attaching a penalty to the assignment of such residues to any secondary structure. In the sequential model, the α-helical prediction is followed by the prediction of β-turns, and then by the prediction of the β-strand regions of the protein. A significant advantage of these approaches to secondary structure prediction is the use of integer cut constraints. These constraints allow the model to output a rank-ordered list of predictions along with the global minimum prediction. Hence, it becomes possible to list a small subset of possible solutions and to provide a consensus solution as the final result.
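The role of the integer cuts and the content bounds can be sketched on a toy problem. The sketch below is not the HELIOS or BEST-PRED formulation: the per-residue costs are hypothetical, the names `violates_cut` and `ranked_solutions` are invented for illustration, and exhaustive enumeration stands in for the ILP solver that a real model would require. It uses the standard integer cut, which excludes exactly one previously found binary assignment, and re-solves to obtain the next-best assignment.

```python
from itertools import product


def violates_cut(y, previous):
    """True if binary vector y is excluded by the integer cut built from `previous`.

    Standard integer cut: sum_{i: prev_i = 1} y_i - sum_{i: prev_i = 0} y_i <= |B| - 1,
    where B is the set of positions with prev_i = 1. Only y == previous violates it.
    """
    ones = sum(previous)
    lhs = sum(yi if pi == 1 else -yi for yi, pi in zip(y, previous))
    return lhs > ones - 1


def ranked_solutions(costs, lower, upper, how_many):
    """Return up to `how_many` (objective, assignment) pairs, best first.

    costs[i] : hypothetical penalty for assigning residue i to the structure
    lower/upper : bounds on the number of assigned residues (content bounds)
    """
    cuts, ranked = [], []
    for _ in range(how_many):
        best = None
        # Brute-force stand-in for an ILP solve; feasible only for tiny examples.
        for y in product((0, 1), repeat=len(costs)):
            if not lower <= sum(y) <= upper:
                continue
            if any(violates_cut(y, prev) for prev in cuts):
                continue
            value = sum(c * yi for c, yi in zip(costs, y))
            if best is None or value < best[0]:
                best = (value, y)
        if best is None:
            break  # no feasible assignment remains
        ranked.append(best)
        cuts.append(best[1])  # exclude this assignment from later solves
    return ranked


solutions = ranked_solutions([-2.0, 0.5, -1.0, 3.0], lower=1, upper=3, how_many=3)
```

Each re-solve adds one cut, so the objective values in `solutions` are non-decreasing and no assignment is repeated. How a consensus is formed from such a list is not specified in the abstract and is left out of the sketch.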
The method has been tested on a large dataset of α, β and mixed α/β proteins, and the initial results are very promising.
Bibliography
[1] JL Klepeis and CA Floudas (2003) ASTRO-FOLD: a combinatorial and global optimization framework for ab initio prediction of three-dimensional structures of proteins from the amino acid sequence, Biophysical Journal, 85, 2119–2146.
[2] SF Altschul, W Gish, W Miller, EW Myers and DJ Lipman (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Research, 25, 3389–3402.
[3] L Homaeian, LA Kurgan, J Ruan, KJ Cios and K Chen (2007) Prediction of protein secondary structure content for twilight zone sequences, Proteins, 69, 486–498.
[4] R Aurora and GD Rose (1998) Helix Capping, Protein Science, 7, 21–38.