(342f) A De Novo Approach for Untargeted Post-Translational Modification Prediction Using Tandem Mass Spectrometry and Integer Linear Optimization | AIChE

Authors 

Baliban, R. - Presenter, Princeton University
DiMaggio, P. A. Jr. - Presenter, Princeton University
Young, N. - Presenter, Princeton University
Garcia, B. A. - Presenter, Princeton University
Floudas, C. A. - Presenter, Princeton University


Classifying the post-translational modifications (PTMs) of various organisms is currently a major challenge in proteomics. Tandem mass spectrometry (MS/MS) has proven to be an excellent tool for peptide sequence prediction and is indispensable in determining PTMs [1,2]. Consequently, many research groups [3-9] have incorporated modification discovery into their identification algorithms, which use multiple databases to build a list of variable modifications that can exist on a candidate peptide. To circumvent the combinatorial complexity of variable modification identification, an upper bound is often placed on the total number of (a) variable modification types or (b) variable modification sites. However, thorough characterization of a dynamic proteome requires an unbiased analysis of any and all modifications that can occur. Unrestrictive algorithms have been developed [3,4] that use spectral alignment to infer integer mass modifications on a candidate peptide. Even these algorithms, however, restrict the number of modification sites to improve computational efficiency and reduce the false detection of low-mass modifications.
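To make the complexity argument concrete, the following minimal sketch counts candidate modification assignments with and without a cap on the number of modified sites. It assumes, purely for illustration, that each position carries at most one modification drawn from a single list; the function names and the example sizes (10 positions, 20 modification types) are hypothetical and are not taken from any of the cited algorithms.

```python
from math import comb


def unrestricted_count(n_positions, n_mod_types):
    """Every position is either unmodified or carries one of the modification types."""
    return (n_mod_types + 1) ** n_positions


def restricted_count(n_positions, n_mod_types, max_sites):
    """Assignments with at most `max_sites` modified positions."""
    return sum(
        comb(n_positions, j) * n_mod_types ** j  # choose j sites, then a type for each
        for j in range(max_sites + 1)
    )


# Hypothetical sizes chosen only to show the gap between the two search spaces.
full = unrestricted_count(10, 20)
capped = restricted_count(10, 20, max_sites=2)
print(f"unrestricted: {full}, at most 2 sites: {capped}")
```

Capping the number of sites shrinks the search space by many orders of magnitude, which is why the bound is attractive computationally, and also why it biases the analysis toward sparsely modified peptides.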

We have developed PILOT_PTM, a novel de novo method for untargeted PTM Prediction via Integer Linear Optimization (ILP) and Tandem mass spectrometry. ILP has been an integral tool in the highly accurate de novo sequencing algorithm PILOT [10,11] and the hybrid algorithm PILOT_SEQUEL [12]. As in these methods, the objective function seeks to maximize the sum of the intensity contributions from theoretical peaks matched to the experimental spectrum, subject to logical constraints. Given a template amino acid sequence, the method can predict the modifications at all positions on the peptide sequence. That is, the model assumes that every template position can be post-translationally modified and seeks the optimal set of modifications from a universal list based on the MS/MS data. PILOT_PTM is not restricted in its choice of modification sites on a peptide and does not infer a sequence tag from the MS/MS data. The method rigorously guarantees the optimal set of modifications without enumerating all combinations of possible modifications.
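The objective described above (maximize the matched peak intensity for a fixed template, subject to constraints) can be illustrated with a toy sketch. This is not the PILOT_PTM formulation: it substitutes brute-force enumeration for the ILP solver, which is exactly what the actual method avoids, and it uses a single assumed constraint (the total modification mass must match the precursor mass shift). The four-residue template, the short modification list, the tolerance, and all function names are hypothetical; only the monoisotopic masses are standard values.

```python
from itertools import product

# Monoisotopic residue masses (Da) for the residues used in this toy example.
RESIDUE_MASS = {"T": 101.04768, "K": 128.09496, "S": 87.03203, "A": 71.03711}
# Hypothetical stand-in for a universal modification list (mass shifts in Da).
# The key None means "unmodified".
MOD_MASS = {None: 0.0, "acetyl": 42.01057, "methyl": 14.01565, "phospho": 79.96633}
PROTON, WATER = 1.00728, 18.01056


def fragment_mz(template, mods):
    """Singly charged b- and y-ion m/z values for a modified template."""
    masses = [RESIDUE_MASS[aa] + MOD_MASS[m] for aa, m in zip(template, mods)]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions + y_ions


def score(template, mods, spectrum, tol=0.02):
    """Sum the intensities of experimental peaks matched by any theoretical ion."""
    theoretical = fragment_mz(template, mods)
    return sum(
        intensity
        for mz, intensity in spectrum
        if any(abs(mz - t) <= tol for t in theoretical)
    )


def best_modifications(template, spectrum, precursor_shift, tol=0.02):
    """Enumerate every assignment that satisfies the precursor-mass constraint."""
    best, best_score = None, float("-inf")
    for mods in product(MOD_MASS, repeat=len(template)):
        total_shift = sum(MOD_MASS[m] for m in mods)
        if abs(total_shift - precursor_shift) > tol:
            continue  # violates the assumed precursor-mass constraint
        s = score(template, mods, spectrum, tol)
        if s > best_score:
            best, best_score = mods, s
    return best, best_score


# Synthetic spectrum generated from a known answer: acetylation on the lysine.
template = "TKSA"
truth = (None, "acetyl", None, None)
spectrum = [(mz, 10.0 + i) for i, mz in enumerate(fragment_mz(template, truth))]
best, best_score = best_modifications(template, spectrum, MOD_MASS["acetyl"])
print(best, best_score)
```

On this synthetic spectrum the search recovers the acetylated lysine, because only that assignment explains all six fragment peaks. The enumeration grows exponentially with peptide length, which is the cost an optimization-based formulation is meant to avoid.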

To verify the capability of the method across distinct data types, PILOT_PTM was tested on several data sets including (a) unmodified peptides fragmented via Collision Induced Dissociation (CID) on (i) ion trap, (ii) Q-TOF, and (iii) Orbitrap mass spectrometers, (b) chemically synthesized phosphopeptides fragmented via Electron Transfer Dissociation (ETD), (c) the 1-50 N-terminal tail of the histone H3 protein fragmented via Electron Capture Dissociation (ECD), and (d) propionylated H3 peptides fragmented via CID. The results for each data set were quantified using several metrics including the overall residue prediction accuracy, the complete peptide prediction accuracy, and the subsequence accuracy. PILOT_PTM maintains high accuracy for all scoring metrics across all data sets, demonstrating that it is an instrument-independent algorithm capable of handling multiple fragmentation methods.
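The abstract does not define the three metrics, so the sketch below shows one plausible reading of each, offered as an assumption rather than the authors' definitions: residue accuracy as the fraction of positions whose predicted modification state is correct, peptide accuracy as the fraction of peptides predicted correctly at every position, and subsequence accuracy as the fraction of length-k windows that are entirely correct. The function names, the window length, and the two example peptides are hypothetical.

```python
def residue_accuracy(predicted, actual):
    """Fraction of all positions whose predicted modification matches the truth."""
    pairs = [(p, a) for pred, act in zip(predicted, actual) for p, a in zip(pred, act)]
    return sum(p == a for p, a in pairs) / len(pairs)


def peptide_accuracy(predicted, actual):
    """Fraction of peptides predicted correctly at every position."""
    return sum(pred == act for pred, act in zip(predicted, actual)) / len(actual)


def subsequence_accuracy(predicted, actual, k=2):
    """Fraction of length-k windows in which every position is correct (assumed definition)."""
    correct = total = 0
    for pred, act in zip(predicted, actual):
        hits = [p == a for p, a in zip(pred, act)]
        for i in range(len(hits) - k + 1):
            total += 1
            correct += all(hits[i:i + k])
    return correct / total


# Two hypothetical peptides; None marks an unmodified position.
actual = [
    (None, "acetyl", None, None),
    (None, None, "phospho", None, None),
]
predicted = [
    (None, "acetyl", None, None),          # fully correct
    (None, "phospho", None, None, None),   # phosphosite placed one residue early
]

print(residue_accuracy(predicted, actual))
print(peptide_accuracy(predicted, actual))
print(subsequence_accuracy(predicted, actual, k=2))
```

A single misplaced site costs two residues under the first metric and the whole peptide under the second, which is why reporting all three gives a fuller picture than any one alone.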

To benchmark the capability of the method, the results for the propionylated H3 data set (d) were compared with those obtained from five state-of-the-art algorithms, including three hybrid sequence tag/database approaches (InsPecT [5], Modi [6], and VEMS [7]) and two pure database approaches (Mascot [8] and X!Tandem [9]). To provide a basis for comparison, a protocol was established for Mascot, InsPecT, VEMS, and X!Tandem. Since these algorithms all severely restrict the number of variable modification types, their performance depends directly on the choice of modifications used to search the test data. We therefore ran many trials with distinct sets of variable modifications to obtain averaged results for each algorithm. The Modi algorithm places no restriction on the number of variable modifications, but does require the database to be trimmed to no more than twenty proteins [6]. Thus, only one trial, using the universal modification list and the twenty most abundant proteins, was completed for Modi. PILOT_PTM demonstrates superior prediction accuracy for all scoring metrics and consistently outperforms the top hybrid and database algorithms.

[1] A. I. Nesvizhskii, O. Vitek, and R. Aebersold. Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat. Methods, 4(11):787-797, 2007.

[2] E. S. Witze, W. M. Old, K. A. Resing, and N. G. Ahn. Mapping protein post-translational modifications with mass spectrometry. Nat. Methods, 4(11):798-806, 2007.

[3] D. Tsur, S. Tanner, E. Zandi, V. Bafna, and P. A. Pevzner. Identification of post-translational modifications by blind search of mass spectra. Nat. Biotechnol., 23(12):1562-1567, 2005.

[4] C. Baumgartner, T. Rejtar, M. Kullolli, L. M. Akella, and B. L. Karger. SeMoP: A New Computational Strategy for the Unrestricted Search for Modified Peptides Using LC-MS/MS Data. J. Proteome Res., 7(9):4199-4208, 2008.

[5] S. Tanner, H. Shu, A. Frank, L. C. Wang, E. Zandi, M. Mumby, P. A. Pevzner, and V. Bafna. InsPecT: Identification of Posttranslationally Modified Peptides from Tandem Mass Spectrometry. Anal. Chem., 77(14):4626-4639, 2005.

[6] S. Kim, S. Na, J. W. Sim, H. Park, J. Jeong, H. Kim, Y. Seo, J. Seo, K. J. Lee, and E. Paek. Modi: a powerful and convenient web server for identifying multiple post-translational peptide modifications from tandem mass spectra. Nucleic Acids Res., 34:258-263, 2006.

[7] R. Matthiesen, M. Lundsgaard, K. G. Welinder, and G. Bauw. Interpreting peptide mass spectra by VEMS. Bioinformatics, 19(6):792-793, 2003.

[8] D. M. Creasy and J. S. Cottrell. Error tolerant searching of uninterpreted tandem mass spectrometry data. Proteomics, 2:1426-1431, 2002.

[9] R. Craig and R. C. Beavis. TANDEM: matching proteins with tandem mass spectra. Bioinformatics, 20(9):1466-1467, 2004.

[10] P. A. DiMaggio Jr. and C. A. Floudas. De Novo Peptide Identification via Tandem Mass Spectrometry and Mixed-Integer Optimization. AIChE Jour., 53(1):160-173, 2007.

[11] P. A. DiMaggio Jr. and C. A. Floudas. De novo peptide identification via tandem mass spectrometry and integer linear optimization. Anal. Chem., 79(4):1433-1446, 2007.

[12] P. A. DiMaggio Jr., C. A. Floudas, B. Lu, and J. R. Yates III. A Hybrid Method for Peptide Identification Using Integer Linear Optimization, Local Database Search, and Quadrupole Time-of-Flight or OrbiTrap Tandem Mass Spectrometry. J. Proteome Res., 7:1584-1593, 2008.