(306d) An Improved De Novo Peptide Sequencing Framework Using Decomposition and Integer Linear Optimization | AIChE

Authors 

Li, Z. - Presenter, Princeton University
Baliban, R. C. - Presenter, Princeton University
Floudas, C. A. - Presenter, Princeton University


Peptide and protein identification is of fundamental importance in the study of proteomics. Specifically, peptide sequencing and identification form the basis for protein identification. Of the two most common computational approaches to peptide sequencing, de novo methods have distinct advantages over database methods: they can analyze peptides that are not present in a protein database, and they are more amenable to identifying post-translational modifications.

Tandem mass spectrometry (MS/MS) coupled with high-performance liquid chromatography (HPLC) has emerged as a powerful protocol for high-throughput, high-sensitivity peptide and protein identification experiments. These experimental advances have served as an impetus for the recent development of numerous computational approaches designed to sequence peptides robustly and efficiently, with particular emphasis on integrating sequencing algorithms into a high-throughput computational framework for proteomics.

PILOT is an integer linear optimization based de novo peptide sequencing framework developed in our group [1,2]. PILOT has been shown to perform well on high-quality spectra. However, its accuracy remains limited for low-quality spectra, in which a large number of noise peaks, or doubly charged ions in low-precision spectra (e.g., ion trap), lead to the selection of false fragment ion peaks.

An improved de novo peptide sequencing framework has been developed in this work. The algorithm is based on a spectrum decomposition strategy and the integer linear optimization technique, and it is composed of four stages. In the first stage, the whole spectrum is considered, and different sets of high-confidence fragment peak solutions are identified by sequencing the b-ion and y-ion series simultaneously using integer linear optimization. In the second stage, the MS/MS spectrum is decomposed into a left-hand side (LHS) and a right-hand side (RHS) at the m/z (mass-to-charge ratio) value of the doubly charged precursor ion (yN++). The LHS and RHS spectra are sequenced separately, taking into account the high-confidence solutions obtained in the first stage (either by fixing the use of the tag peaks or by assigning them a high intensity). Up to twenty sets of solutions are obtained for the LHS and up to twenty for the RHS of the spectrum; integer cuts ensure that all LHS solutions are distinct and all RHS solutions are distinct. In the third stage, full sequences are generated by combining the LHS and RHS sequences according to a set of predefined combination rules, which include the complementarity restriction, connection of the LHS and RHS through a single or double amino acid mass, and limits on the number of mass gaps. In the final stage, the candidate sequences are ranked by cross-correlating their theoretical spectra with the experimental tandem mass spectrum, and a predefined number of top-ranked sequences is reported.
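The decomposition in the second stage and the complementarity restriction in the third stage can be illustrated with a small sketch. This is not the authors' implementation: the function names, the peak representation as (m/z, intensity) pairs, and the choice to describe the precursor by its singly charged m/z are all assumptions made for illustration. The sketch relies only on standard mass arithmetic: for singly charged fragments, a b-ion and its complementary y-ion have m/z values that sum to the singly charged precursor m/z plus one proton mass, so each complementary pair is mirrored about the doubly charged precursor m/z.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def doubly_charged_mz(precursor_mh):
    """m/z of the [M+2H]2+ ion, given the m/z of the [M+H]+ ion."""
    return (precursor_mh + PROTON_MASS) / 2.0


def split_spectrum(peaks, precursor_mh):
    """Split (m/z, intensity) peaks into LHS and RHS at the doubly
    charged precursor m/z. Hypothetical helper for illustration only."""
    pivot = doubly_charged_mz(precursor_mh)
    lhs = [(mz, inten) for mz, inten in peaks if mz <= pivot]
    rhs = [(mz, inten) for mz, inten in peaks if mz > pivot]
    return pivot, lhs, rhs


def complementary_mz(fragment_mz, precursor_mh):
    """m/z of the singly charged complement of a singly charged b- or y-ion."""
    return precursor_mh + PROTON_MASS - fragment_mz


def is_complementary(mz_a, mz_b, precursor_mh, tol=0.5):
    """True if two peaks form a b/y complementary pair within tol (Da).
    The default tolerance is an arbitrary illustrative value."""
    return abs(complementary_mz(mz_a, precursor_mh) - mz_b) <= tol


# Made-up example spectrum, not experimental data.
example_peaks = [(200.0, 10.0), (500.0, 5.0), (600.0, 7.0), (801.0, 3.0)]
pivot, lhs_peaks, rhs_peaks = split_spectrum(example_peaks, precursor_mh=1000.0)
```

In this made-up example, the peaks at 200.0 and 801.0 fall on opposite sides of the pivot and satisfy the complementarity check, which is the kind of relationship a combination rule could use when joining an LHS solution with an RHS solution.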

The proposed de novo sequencing framework is tested on various data sets generated from both low-precision ion trap and high-precision Orbitrap mass spectrometers. The method has also been tested in a hybrid framework [3] that combines de novo and database techniques: the de novo sequences are used to generate large “sequence tags”, which are then used to query a protein database. Computational results for both de novo peptide identification and hybrid identification, as well as comparisons with state-of-the-art methods, will be presented.

[1] P.A. DiMaggio and C.A. Floudas. A mixed-integer optimization framework for de novo peptide identification. AIChE Journal, 53(1), 160-173 (2007).

[2] P.A. DiMaggio and C.A. Floudas. De novo peptide identification via tandem mass spectrometry and integer linear optimization. Anal. Chem., 79, 1433-1446 (2007).

[3] P.A. DiMaggio, C.A. Floudas, B. Lu and J.R. Yates. A Hybrid Method for Peptide Identification using Integer Linear Optimization, Local Database Search, and QTOF or OrbiTrap Tandem Mass Spectrometry. J. Proteome Res., 7, 1584-1593 (2008).