A new algorithm, ProVerB, based on a novel binomial distribution statistical model

Date: 2015-12-04

Soft ionization techniques such as matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI) preserve the integrity of peptides, which enables mass spectrometry (MS) methods to perform proteomic analysis. Protein identification is the most fundamental step in the data processing pipeline, since the sensitivity and accuracy of the identification algorithm are crucial for downstream analyses. Generally, a peptide identification algorithm selects some peaks from the spectra, evaluates the similarity between the experimental and theoretical spectra, and then assigns the best match within the peptide error window as the result. The scoring models that evaluate this similarity should consider three aspects: the number of peak matches, the number of consecutive peak matches, and the intensities of the matched peaks.
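The first two scoring aspects named above can be illustrated with a minimal sketch. It is not taken from any of the tools discussed here; the function name `count_matches`, the tolerance `tol`, and the assumption that the theoretical peaks are listed in ion-series order are all illustrative choices. Intensities, the third aspect, are left out to keep the example short.

```python
from bisect import bisect_left

def count_matches(experimental_mz, theoretical_mz, tol=0.5):
    """Return (matched, longest_run) for a list of theoretical m/z values.

    matched     -- how many theoretical peaks have an experimental peak
                   within +/- tol (hypothetical tolerance, in m/z units)
    longest_run -- longest stretch of consecutive theoretical peaks matched
    """
    exp = sorted(experimental_mz)
    matched = 0
    longest = 0
    run = 0
    for mz in theoretical_mz:  # assumed to be ordered along the ion series
        # First experimental peak that is not below the lower bound.
        i = bisect_left(exp, mz - tol)
        hit = i < len(exp) and exp[i] <= mz + tol
        if hit:
            matched += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return matched, longest
```

A candidate peptide with more matches and longer consecutive runs would rank higher under such a scheme; how the counts are turned into a score is what distinguishes the models described in the following paragraphs.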

A number of peptide identification algorithms based on various concepts are available for MS data, e.g. Mascot, Sequest, OMSSA, X!Tandem, MassWiz, Andromeda, and SQID. Mascot and Sequest are widely used commercial software packages and commonly adopted search tools in protein identification; however, only limited details of these algorithms have been released. Mascot is based on a probability model, whereas Sequest is based on an empirical scoring model that computes the cross-correlation between experimental and theoretical spectra. Mascot selects the highest peak in each 14 Da mass interval and keeps the peaks whose intensities are above the threshold. Sequest takes consecutive ion matches and intensity information into account; it preprocesses the spectrum by keeping the top 200 peaks and dividing the spectrum into ten bins for normalization. X!Tandem uses a hypergeometric scoring model, while OMSSA uses a Poisson scoring model to assess the significance of a peptide match. Both select the 50 most intense peaks by default. MassWiz divides the spectrum dynamically and takes at most the 5 most intense peaks from each bin. SQID keeps the top 80 peaks after deleting parent-related peaks.
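Two of the peak selection strategies mentioned above, keeping the N most intense peaks and keeping the highest peak per fixed-width mass window, can be sketched as follows. This is only an illustration of the general idea, not the actual implementation of Mascot, X!Tandem, or any other tool; the function names and the `(mz, intensity)` tuple representation are assumptions.

```python
def top_n_peaks(peaks, n):
    """Keep the n most intense (mz, intensity) peaks, returned in m/z order."""
    kept = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(kept)

def highest_per_window(peaks, width):
    """Keep the most intense peak in each consecutive m/z window.

    Windows are [0, width), [width, 2*width), ...; the window layout is
    an assumption made for this sketch.
    """
    best = {}
    for mz, intensity in peaks:
        window = int(mz // width)
        if window not in best or intensity > best[window][1]:
            best[window] = (mz, intensity)
    return sorted(best.values())
```

For example, `top_n_peaks(peaks, 50)` mirrors the "50 most intense peaks" default, and `highest_per_window(peaks, 14.0)` mirrors the "highest peak in each 14 Da interval" rule, without the additional intensity threshold.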

However, none of these algorithms makes accurate use of all the information in MS experiments. They share similar methods for generating theoretical spectra. They consider six types of ions (b-, y-, b-H2O, b-NH3, y-H2O, and y-NH3) in collision-induced dissociation (CID) fragmentation mode and set the theoretical peak intensities to three artificial values: 50 (b- and y-ions), 25 (b- and y-ions that have lost H2O or NH3), and 10 (a-ions). The resulting theoretical spectrum does not fully reflect the intensity characteristics of experimental spectra. Therefore, once the peaks are selected, these algorithms do not use the peak intensity information obtained in the experiment when comparing the experimental and theoretical spectra. This incomplete use of MS information compromises the sensitivity, robustness, and confidence of most of these algorithms. A recent algorithm, SQID, attempts to address this issue by introducing the strength probability of pairwise amino acid fragments to account for the quality of the intensity match.
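To make the notion of a theoretical spectrum concrete, the following sketch computes singly charged m/z values for the six ion types listed above. It uses standard monoisotopic residue masses; the function names are hypothetical, and the sketch simply enumerates a neutral-loss variant for every b- and y-ion, which is a simplification rather than how any particular search engine decides which ions to include.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
AMMONIA = 17.026549

def by_ions(peptide):
    """Return (b, y) lists of singly charged m/z values for a peptide."""
    masses = [MONO[aa] for aa in peptide]
    b, y = [], []
    prefix = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), from the N-terminus
        prefix += m
        b.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), from the C-terminus
        suffix += m
        y.append(suffix + WATER + PROTON)
    return b, y

def theoretical_ions(peptide):
    """Enumerate the six ion types as a dict of m/z lists (simplified)."""
    b, y = by_ions(peptide)
    return {
        "b": b,
        "y": y,
        "b-H2O": [m - WATER for m in b],
        "b-NH3": [m - AMMONIA for m in b],
        "y-H2O": [m - WATER for m in y],
        "y-NH3": [m - AMMONIA for m in y],
    }
```

Note that the sketch produces only m/z positions. Assigning each position one of a few fixed intensity values is exactly the step the paragraph above criticizes.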

To make full use of the MS information and to maximize universality, we present here a novel identification algorithm, the protein verification algorithm based on the binomial probability distribution (ProVerB), to enhance the accuracy, completeness, and robustness of peptide identification. We tested ProVerB against other algorithms using multiple MS data sets. The tests showed that its ability and confidence in identifying peptides from mass spectra at 1% FDR were significantly and consistently higher than those of the widely used Mascot and Sequest.
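This text does not spell out ProVerB's statistical model, so the following is only a generic sketch of how a binomial distribution can score a peptide-spectrum match, not the ProVerB formula. It assumes that each of `n` theoretical peaks matches an experimental peak by chance with the same probability `p`, and asks how unlikely it is to see at least `k` matches at random. The names `binomial_tail` and `match_score`, and the `-10 * log10` scaling, are illustrative assumptions.

```python
from math import comb, log10

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more random matches."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def match_score(n, k, p):
    """Turn the tail probability into a score; higher means less likely by chance."""
    return -10 * log10(binomial_tail(n, k, p))
```

Under these assumptions, a match with 3 of 3 theoretical peaks found at `p = 0.5` has a tail probability of 0.125, while finding at least 1 of 2 has a tail probability of 0.75, so the former scores higher.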

The boom in proteomics applications and the wide variety of mass spectrometry technologies used for peptide identification necessitate a versatile and accurate peptide identification algorithm. In this paper, we present a new algorithm, ProVerB, based on a novel binomial distribution statistical model, and we validate its accuracy, robustness, and compatibility. ProVerB is an open source program, so no algorithmic detail is hidden as it is in the commercial software packages. Users may tune the parameters according to their specific experimental setup to optimize the results. It can also be compiled on various operating systems and has a user-friendly graphical user interface. Although ProVerB does not support ECD/ETD mass spectrometry data, we believe that it will find broad application in proteomics studies and provide more robust and accurate results than the currently available commercial algorithms, producing a more solid base of data for downstream analyses.
