False positives that arise when MS/MS data are used to search protein sequence databases remain a concern in proteomics research. Here, we present five types of false positives identified when aligning sequences to MS/MS spectra by Mascot database searching software. False positives arise because of (1) enzymatic digestion at abnormal sites; (2) misinterpretation of charge states; (3) misinterpretation of protein modifications; (4) incorrect assignment of the protein modification site; and (5) incorrect use of isotopic peaks. We present examples, clearly identified as false positives by manual inspection, that nevertheless were assigned high scores by Mascot sequence alignment algorithm. In some examples, the sequence assigned to the MS/MS spectrum explains more than 80% of the fragment ions present. Because of high sequence similarity between the false positives and their corresponding true hits, the false positive rate cannot be evaluated by the common method of using a reversed or scrambled sequence database. A common feature of the false positives is the presence of unmatched peaks in the MS/MS spectra. Our studies highlight the importance of using unmatched peaks to remove false positives and offer direction to aid development of better sequence alignment algorithms for peptide and PTM identification.
|Original language||English (US)|
|Number of pages||7|
|Journal||Journal of Proteome Research|
|State||Published - Jun 5 2009|
- Automated database search
- Manual verification
- Protein identification