(424c) PILOT_PROTEIN: A High-Throughput Method for In Silico Discovery of Peptides, Proteins, and Post-Translational Modifications | AIChE

Authors 

Baliban, R. - Presenter, Princeton University
DiMaggio, P. A. Jr. - Presenter, Princeton University
Li, Z. - Presenter, Princeton University
Plazas-Mayorca, M. - Presenter, Princeton University
Garcia, B. A. - Presenter, Princeton University
Floudas, C. A. - Presenter, Princeton University


A fundamental problem in proteomics is modified protein identification: determining the sequence of a protein along with all amino acid modifications that were added to the protein after it was constructed in vivo (post-translational modifications, or PTMs).  We present a comprehensive set of tools that address this problem using mixed-integer linear optimization (MILP) and tandem mass spectrometry (MS/MS).  These tools have been integrated into a single web tool that is freely available to the scientific community.

MS/MS spectra are first analyzed using the de novo sequencing algorithm PILOT [1].  PILOT rigorously guarantees a rank-ordered list of optimal candidate sequences without complete enumeration of all possible sequences.  To exploit the strengths of database searching, a hybrid de novo/database method based on local sequence alignment, PILOT_SEQUEL [2], was developed.  It takes the candidate sequences produced by PILOT as input and queries them against a protein database to find peptide matches.
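The database query step can be illustrated with a minimal Smith-Waterman local alignment, sketched below.  This is not the PILOT_SEQUEL scoring scheme: the match, mismatch, and gap scores, the function names `local_align` and `rank_database_hits`, and the toy database are all assumptions chosen for illustration.

```python
def local_align(query, target, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment.

    Returns the best local score and the (query, target) end position
    of the best-scoring alignment. Scores are illustrative assumptions.
    """
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == target[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap in the target
            left = score[i][j - 1] + gap    # gap in the query
            score[i][j] = max(0, diag, up, left)  # 0 restarts the alignment
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    return best, best_pos


def rank_database_hits(candidates, database):
    """Align every de novo candidate against every database sequence
    and return (score, candidate, protein_name) tuples, best first."""
    hits = []
    for cand in candidates:
        for name, seq in database.items():
            s, _ = local_align(cand, seq)
            hits.append((s, cand, name))
    return sorted(hits, key=lambda h: -h[0])


if __name__ == "__main__":
    # Hypothetical toy database; real searches use full protein databases.
    toy_db = {"protA": "MKPEPTIDER", "protB": "GGGGGGGG"}
    for hit in rank_database_hits(["PEPTIDE", "PEPTLDE"], toy_db):
        print(hit)
```

Local alignment tolerates a candidate that differs from the database sequence at a few positions.  In the usage example, the candidate PEPTLDE still aligns to protA despite the leucine/isoleucine substitution, an ambiguity that mass alone cannot resolve because the two residues are isobaric.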

A comprehensive unmodified protein list is constructed using the novel method PILOT_PROTEIN [3].  PILOT_PROTEIN uses the scored peptides from PILOT_SEQUEL to generate the protein list and incorporates a peptide clustering routine to reduce false positives.  Given a known list of proteins, either an untargeted or a targeted PTM search can be performed to determine the set of PTM types and sites that best explains the experimental data [4].  If the proteins are known to be highly modified, an MILP approach may be used to identify and quantify all proteins that may be present within each MS/MS spectrum.  The output of the above routines is a complete modified protein list.
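The idea behind a targeted PTM search can be illustrated with a much simpler stand-in for the optimization-based method of [4]: a brute-force lookup that asks which small combinations of known PTM mass shifts account for the difference between an observed peptide mass and its unmodified theoretical mass.  The sketch below uses standard monoisotopic masses.  The three-entry PTM table, the tolerance, and the function names are assumptions, and no fragment-ion evidence is used to localize sites.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Illustrative subset: name -> (mass shift in Da, allowed residues).
# The residue sets are disjoint, so per-type site counting is sufficient.
PTM_SHIFT = {
    "phospho": (79.96633, "STY"),
    "acetyl": (42.01057, "K"),
    "oxidation": (15.99491, "M"),
}


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[a] for a in seq) + WATER


def explain_mass_shift(seq, observed_mass, max_ptms=2, tol=0.01):
    """Return PTM combinations (tuples of names) whose summed shift
    matches the observed mass difference and that fit on the peptide."""
    delta = observed_mass - peptide_mass(seq)
    explanations = []
    for n in range(max_ptms + 1):
        for combo in combinations_with_replacement(sorted(PTM_SHIFT), n):
            shift = sum(PTM_SHIFT[p][0] for p in combo)
            if abs(shift - delta) > tol:
                continue
            # Each PTM type needs at least as many candidate sites
            # as the number of times it appears in the combination.
            fits = all(
                sum(seq.count(r) for r in PTM_SHIFT[p][1]) >= combo.count(p)
                for p in set(combo)
            )
            if fits:
                explanations.append(combo)
    return explanations


if __name__ == "__main__":
    seq = "SAMPLEK"
    observed = peptide_mass(seq) + 79.96633  # simulated phosphopeptide
    print(explain_mass_shift(seq, observed))
```

A lookup of this kind only narrows the candidate PTM types.  Selecting the types and sites that best explain the full spectrum is the task that [4] addresses with integer linear optimization.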

To verify the predictive capability of each method, several comparative studies were performed against state-of-the-art algorithms using data sets from a variety of MS/MS instruments and fragmentation types.  Specifically, the peptide prediction accuracy of PILOT and PILOT_SEQUEL, the protein identification accuracy of PILOT_PROTEIN, and the PTM prediction accuracy of PILOT_PTM [4] were analyzed using over 170 LC-MS/MS data sets from the Standard Protein Mix Database [5], comprising a total of 1.5 million MS/MS spectra.  Each method outperformed the other algorithms in predictive accuracy and did so consistently across all of the test data sets.

[1] P. A. DiMaggio Jr. and C. A. Floudas.  De novo peptide identification via tandem mass spectrometry and integer linear optimization.  Anal. Chem., 79:1433–1446, 2007.

[2] P. A. DiMaggio Jr., B. Lu, J. R. Yates III, and C. A. Floudas.  A Hybrid Method for Peptide Identification Using Integer Linear Optimization, Local Database Search, and Quadrupole Time-of-Flight or OrbiTrap Tandem Mass Spectrometry.  J. Proteome Res., 7(4):1584–1593, 2008.

[3] R. C. Baliban, P. A. DiMaggio Jr., Z. Li, M. D. Plazas-Mayorca, B. A. Garcia, and C. A. Floudas.  Identification of modified and unmodified proteins via high-resolution tandem mass spectrometry and mixed-integer linear optimization.  Mol. Cell Proteomics, submitted.

[4] R. C. Baliban, P. A. DiMaggio Jr., M. D. Plazas-Mayorca, N. L. Young, B. A. Garcia, and C. A. Floudas.  A Novel Method for Untargeted Post-Translational Modification Identification Using Integer Linear Optimization and Tandem Mass Spectrometry.  Mol. Cell Proteomics, 9:764–779, 2010.

[5] J. Klimek, J. S. Eddes, L. Hohmann, J. Jackson, A. Peterson, S. Letarte, P. R. Gafken, J. E. Katz, P. Mallick, H. Lee, A. Schmidt, R. Ossola, J. K. Eng, R. Aebersold, and D. B. Martin.  The Standard Protein Mix Database: A Diverse Data Set To Assist in the Production of Improved Peptide and Protein Identification Software Tools.  J. Proteome Res., 7(1):96–103, 2008.