(424c) PILOT_PROTEIN: A High-Throughput Method for In Silico Discovery of Peptides, Proteins, and Post-Translational Modifications
AIChE Annual Meeting
2011 Annual Meeting of the American Electrophoresis Society (AES)
Advances In Electrophoretic Protein Separation and Analysis
Wednesday, October 19, 2011 - 9:12am to 9:33am
A fundamental problem in proteomics is modified protein identification: determining the sequence of a protein along with all amino acid modifications added to it after it was constructed in vivo (post-translational modifications, or PTMs). We present a comprehensive set of tools that addresses this problem using mixed-integer linear optimization (MILP) and tandem mass spectrometry (MS/MS). These tools have been integrated into a single web tool that is freely available to the scientific community.
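To illustrate why a PTM is visible in tandem mass spectrometry data, the sketch below computes singly charged b- and y-ion m/z values for a short peptide with and without a modification. It is only an illustration of the general principle, not part of the PILOT tools: the function names, the example peptide, and the choice of phosphorylation are assumptions made here, and the residue table covers only a few amino acids (standard monoisotopic masses).

```python
# Standard monoisotopic residue masses in Da (subset, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056    # H2O
PROTON = 1.00728    # charge carrier for a 1+ ion
PHOSPHO = 79.96633  # phosphorylation mass shift (HPO3)


def residue_masses(sequence, mods=None):
    """Per-residue masses; mods maps a 0-based position to a mass shift."""
    mods = mods or {}
    return [RESIDUE_MASS[aa] + mods.get(i, 0.0) for i, aa in enumerate(sequence)]


def b_ions(sequence, mods=None):
    """Singly charged b-ion m/z values (N-terminal prefixes)."""
    ions, total = [], 0.0
    for mass in residue_masses(sequence, mods)[:-1]:
        total += mass
        ions.append(total + PROTON)
    return ions


def y_ions(sequence, mods=None):
    """Singly charged y-ion m/z values (C-terminal suffixes)."""
    ions, total = [], 0.0
    for mass in reversed(residue_masses(sequence, mods)[1:]):
        total += mass
        ions.append(total + WATER + PROTON)
    return ions


if __name__ == "__main__":
    peptide = "GASK"              # hypothetical example peptide
    phospho_on_s = {2: PHOSPHO}   # hypothetical modification on the serine
    print(b_ions(peptide), b_ions(peptide, phospho_on_s))
    print(y_ions(peptide), y_ions(peptide, phospho_on_s))
```

Only the fragments that contain the modified residue shift by the modification mass, which is what lets a search localize the PTM site as well as its type.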
MS/MS spectra are first analyzed with the de novo sequencing algorithm PILOT [1]. PILOT rigorously guarantees a rank-ordered list of optimal candidate sequences without completely enumerating all possible sequences. To exploit the strengths of database methods as well, a hybrid de novo/database routine based on local sequence alignment, PILOT_SEQUEL [2], was developed. It takes the candidate sequences produced by PILOT as input and queries them against a database to find peptide matches.
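The idea of querying a de novo candidate against a database by local sequence alignment can be sketched with a textbook Smith-Waterman score. This is a minimal illustration under stated assumptions, not the PILOT_SEQUEL algorithm: the scoring values, the function names, the toy database, and the choice to treat the isobaric residues I and L as equal are all assumptions made here.

```python
def local_alignment_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""

    def same(a, b):
        # I and L have identical mass, so de novo sequencing cannot tell
        # them apart; treating them as equal is an assumption of this sketch.
        return a == b or {a, b} == {"I", "L"}

    best = 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        curr = [0]
        for j in range(1, len(target) + 1):
            step = match if same(query[i - 1], target[j - 1]) else mismatch
            cell = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def rank_database(query, database):
    """Score a candidate sequence against every entry, best match first."""
    scored = [(name, local_alignment_score(query, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical two-entry database and de novo candidate.
    database = {"prot1": "MKPEPTIDEK", "prot2": "MKPEPSIDEK"}
    print(rank_database("PEPTLDE", database))
```

Because the alignment is local, a candidate that is only partly correct can still be matched to the database entry it most likely came from.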
A comprehensive list of unmodified proteins is then constructed using the novel method PILOT_PROTEIN [3]. PILOT_PROTEIN uses the scored peptides from PILOT_SEQUEL to generate the protein list and incorporates a peptide clustering routine to reduce false positives. Given a known list of proteins, either an untargeted or a targeted PTM search can be performed to determine the set of PTM types and sites that best explains the experimental data [4]. If the proteins are known to be highly modified, an MILP approach may be used to identify and quantify all proteins that may be present in each MS/MS spectrum. The output of these routines is a complete list of modified proteins.
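One common way to turn scored peptides into a protein list is a parsimony model: choose the fewest proteins that explain every confidently identified peptide. The sketch below solves that small 0/1 selection problem by exhaustive enumeration. It is an assumption-laden illustration, not the PILOT_PROTEIN formulation (which the abstract does not spell out); the score threshold, function name, and toy data are hypothetical, and enumeration is only practical for a handful of candidate proteins.

```python
from itertools import combinations


def minimal_protein_set(peptide_to_proteins, peptide_scores, min_score=0.5):
    """Smallest set of proteins covering every peptide scoring >= min_score.

    Each protein is a binary choice (in or out); each retained peptide adds
    the constraint that at least one of its parent proteins is chosen.
    """
    # Drop low-scoring peptides so that they cannot pull in extra proteins.
    kept = {
        pep: set(prots)
        for pep, prots in peptide_to_proteins.items()
        if peptide_scores.get(pep, 0.0) >= min_score
    }
    candidates = sorted(set().union(*kept.values()))
    # Try subsets in order of increasing size; the first cover is minimal.
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            if all(prots & chosen for prots in kept.values()):
                return list(subset)
    return candidates  # unreachable: the full candidate set always covers


if __name__ == "__main__":
    # Hypothetical peptide-to-protein map and peptide scores.
    mapping = {"PEPA": ["P1"], "PEPB": ["P1", "P2"], "PEPC": ["P3"], "PEPD": ["P2", "P4"]}
    scores = {"PEPA": 0.9, "PEPB": 0.8, "PEPC": 0.7, "PEPD": 0.2}
    print(minimal_protein_set(mapping, scores))
```

In this toy input the low-scoring peptide is filtered out, so the shared peptide is explained by a protein that is already required, and no additional protein is reported. A realistic instance would be handed to an MILP solver instead of being enumerated.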
To verify the predictive capability of each method, several comparative studies were performed against state-of-the-art algorithms using data sets from a variety of MS/MS instruments and fragmentation types. Specifically, the peptide prediction accuracy of PILOT and PILOT_SEQUEL, the protein identification accuracy of PILOT_PROTEIN, and the PTM prediction accuracy of PILOT_PTM were analyzed using over 170 LC-MS/MS data sets from the Standard Protein Mix Database [5], comprising a total of 1.5 million MS/MS spectra. Each method showed superior predictive capability compared with the other algorithms and maintained it across all of the test data sets.
[1] P. A. DiMaggio Jr. and C. A. Floudas. De novo peptide identification via tandem mass spectrometry and integer linear optimization. Anal. Chem., 79:1433-1446, 2007.
[2] P. A. DiMaggio Jr., B. Lu, J. R. Yates III, and C. A. Floudas. A Hybrid Method for Peptide Identification Using Integer Linear Optimization, Local Database Search, and Quadrupole Time-of-Flight or OrbiTrap Tandem Mass Spectrometry. J. Proteome Res., 7(4):1584–1593, 2008.
[3] R. C. Baliban, P. A. DiMaggio Jr., Z. Li, M. D. Plazas-Mayorca, B. A. Garcia, and C. A. Floudas. Identification of modified and unmodified proteins via high-resolution tandem mass spectrometry and mixed-integer linear optimization. Mol. Cell. Proteomics, submitted.
[4] R. C. Baliban, P. A. DiMaggio Jr., M. D. Plazas-Mayorca, N. L. Young, B. A. Garcia, and C. A. Floudas. A Novel Method for Untargeted Post-Translational Modification Identification Using Integer Linear Optimization and Tandem Mass Spectrometry. Mol. Cell. Proteomics, 9:764-779, 2010.
[5] J. Klimek, J. S. Eddes, L. Hohmann, J. Jackson, A. Peterson, S. Letarte, P. R. Gafken, J. E. Katz, P. Mallick, H. Lee, A. Schmidt, R. Ossola, J. K. Eng, R. Aebersold, and D. B. Martin. The Standard Protein Mix Database: A Diverse Data Set To Assist in the Production of Improved Peptide and Protein Identification Software Tools. J. Proteome Res., 7(1):96–103, 2008.