Challenges in Peptide-Spectrum Matching: A Robust and Reproducible Statistical Framework for Removing Low-Accuracy, High-Scoring Hits

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Workflows for large-scale mass spectrometry (MS)-based shotgun proteomics can potentially lead to costly errors in the form of incorrect peptide-spectrum matches (PSMs). To improve the robustness of these workflows, we have investigated the use of the precursor mass discrepancy (PMD) to detect and filter potentially false PSMs that have, nonetheless, a high confidence score. We identified and addressed three cases of unexpected bias in PMD results: time of acquisition within a liquid chromatography-mass spectrometry (LC-MS) run, decoy PSMs, and length of the peptide. We created a postanalysis Bayesian confidence measure based on score and PMD, called PMD-false discovery rate (FDR). We tested PMD-FDR on four data sets across three types of MS-based proteomics projects: standard (single organism; reference database), proteogenomics (single organism; customized genomic-based database plus reference), and metaproteomics (microorganism community; customized conglomerate database). On a ground-truth data set and other representative data, PMD-FDR was able to detect 60-80% of likely incorrect PSMs (false-hits) while losing only 5% of correct PSMs (true-hits). PMD-FDR can also be used to evaluate data quality for results generated within different experimental PSM-generating workflows, assisting in method development. Going forward, PMD-FDR should provide detection of high scoring but likely false-hits, aiding applications that rely heavily on accurate PSMs, such as proteogenomics and metaproteomics.
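
As a rough illustration of the idea in the abstract, the sketch below computes a PMD in parts per million and combines it with a prior confidence in a toy Bayesian update. It is not the published PMD-FDR procedure: the function names, the Gaussian model for true hits, the uniform model for false hits, and the default `sigma_ppm` and `window_ppm` values are all assumptions chosen for illustration, and the bias corrections the abstract mentions (acquisition time, decoys, peptide length) are not modeled.

```python
import math


def pmd_ppm(observed_mass: float, theoretical_mass: float) -> float:
    """Precursor mass discrepancy in parts per million (standard ppm error)."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def posterior_true_hit(
    pmd: float,
    prior_true: float,
    sigma_ppm: float = 2.0,    # assumed spread of PMD for correct PSMs
    window_ppm: float = 10.0,  # assumed precursor tolerance of the search
) -> float:
    """Toy posterior P(true hit | PMD) under a two-component model.

    Assumptions (hypothetical, not taken from the paper):
    - correct PSMs have PMD ~ Normal(0, sigma_ppm)
    - incorrect PSMs have PMD ~ Uniform(-window_ppm, +window_ppm)
    - prior_true is a prior belief that the PSM is correct, for example
      one derived from the search-engine score
    """
    if abs(pmd) > window_ppm:
        raise ValueError("PMD lies outside the assumed precursor tolerance")
    like_true = math.exp(-0.5 * (pmd / sigma_ppm) ** 2) / (
        sigma_ppm * math.sqrt(2.0 * math.pi)
    )
    like_false = 1.0 / (2.0 * window_ppm)
    numerator = prior_true * like_true
    return numerator / (numerator + (1.0 - prior_true) * like_false)


if __name__ == "__main__":
    # Two PSMs with the same prior confidence but different PMDs.
    for pmd in (0.5, 8.0):
        print(pmd, round(posterior_true_hit(pmd, prior_true=0.9), 4))
```

Under these assumptions, a high-scoring PSM whose PMD sits far from zero receives a much lower posterior than one with the same prior and a small PMD, which is the kind of high-scoring but likely false hit the abstract describes.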

Original language: English (US)
Pages (from-to): 161-173
Number of pages: 13
Journal: Journal of Proteome Research
Volume: 19
Issue number: 1
State: Published - Jan 3 2020

Bibliographical note

Funding Information:
We extend our gratitude to Marc Vaudel and Harald Barsnes for discussions on PeptideShaker confidence scoring. We thank Colleen Hayes for assistance with figure design. This work was funded in part by NSF award 1458524 and NIH award U24CA199347 to T.J. Griffin and the Galaxy for proteomics (Galaxy-P) research team.

Publisher Copyright:
© 2019 American Chemical Society.

Keywords

  • false discovery rate
  • metaproteomics
  • peptide-spectrum match
  • proteogenomics
  • statistical analysis
  • tandem mass spectrometry
  • precursor mass discrepancy

PubMed: MeSH publication types

  • Journal Article
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
