A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies

Pratik Jagtap, Jill Goslinga, Joel A. Kooren, Thomas Mcgowan, Matthew S. Wroblewski, Sean L. Seymour, Timothy J. Griffin

Research output: Contribution to journal › Article

93 Citations (Scopus)

Abstract

Large databases (>10⁶ sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to MS/MS data using database-search programs. Most notably, strict filtering to avoid false-positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High confidence peptide sequence matches were then used to infer protein identities. Applying our two-step method for both metaproteomic and proteogenomic analysis resulted in twice the number of high confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes the peptide matching sensitivity for applications requiring large databases, especially valuable for proteogenomics and metaproteomics studies.
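As a rough illustration of the database handling described in the abstract, the sketch below builds the second-step search database from the accessions matched in a primary search. It is not the authors' implementation. The function names, the `DECOY_` prefix, the use of reversed sequences as decoys, and the choice to generate decoys for the merged subset-plus-host database are all assumptions made for this example. The database searches themselves (run by an external search program) are outside its scope.

```python
from typing import Dict, Iterable


def build_subset_database(
    large_db: Dict[str, str], first_pass_hits: Iterable[str]
) -> Dict[str, str]:
    """Keep only proteins that received a match in the primary search.

    large_db maps accession -> protein sequence; first_pass_hits is the
    (possibly repeated) list of accessions matched in step one.
    """
    matched = set(first_pass_hits)
    return {acc: seq for acc, seq in large_db.items() if acc in matched}


def add_decoys(target_db: Dict[str, str], prefix: str = "DECOY_") -> Dict[str, str]:
    """Return targets plus decoys; decoys are reversed targets (an assumption)."""
    combined = dict(target_db)
    for acc, seq in target_db.items():
        combined[prefix + acc] = seq[::-1]
    return combined


def second_step_database(
    large_db: Dict[str, str],
    first_pass_hits: Iterable[str],
    host_db: Dict[str, str],
) -> Dict[str, str]:
    """Merge the subset database with the host database, then append decoys."""
    subset = build_subset_database(large_db, first_pass_hits)
    merged = {**host_db, **subset}
    return add_decoys(merged)


# Toy data: three "large database" proteins, one host protein.
large = {"P1": "MKTAYIAK", "P2": "GAVLIPFW", "P3": "MSTNPKPQR"}
host = {"H1": "MAAAK"}
db = second_step_database(large, ["P1", "P3", "P1"], host)
print(sorted(db))
```

In this toy run, `P2` received no primary-search match and is therefore absent from the second-step database, which is what shrinks the search space for the second pass.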

Original language: English (US)
Pages (from-to): 1352-1357
Number of pages: 6
Journal: Proteomics
Volume: 13
Issue number: 8
DOIs: 10.1002/pmic.201200352
State: Published - Apr 1 2013

Fingerprint

Databases
Peptides
Proteogenomics
Proteomics
Proteins

Keywords

  • Bioinformatics
  • Mass spectrometry
  • Metaproteomics
  • Proteogenomics
  • Sequence database search
  • Two-step workflow

Cite this

A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies. / Jagtap, Pratik; Goslinga, Jill; Kooren, Joel A.; Mcgowan, Thomas; Wroblewski, Matthew S.; Seymour, Sean L.; Griffin, Timothy J.

In: Proteomics, Vol. 13, No. 8, 01.04.2013, p. 1352-1357.

Jagtap, Pratik ; Goslinga, Jill ; Kooren, Joel A. ; Mcgowan, Thomas ; Wroblewski, Matthew S. ; Seymour, Sean L. ; Griffin, Timothy J. / A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies. In: Proteomics. 2013 ; Vol. 13, No. 8. pp. 1352-1357.
@article{6dcf0b876495405c86773db4b3b90dd0,
title = "A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies",
abstract = "Large databases (>10^6 sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to MS/MS data using database-search programs. Most notably, strict filtering to avoid false-positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High confidence peptide sequence matches were then used to infer protein identities. Applying our two-step method for both metaproteomic and proteogenomic analysis resulted in twice the number of high confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes the peptide matching sensitivity for applications requiring large databases, especially valuable for proteogenomics and metaproteomics studies.",
keywords = "Bioinformatics, Mass spectrometry, Metaproteomics, Proteogenomics, Sequence database search, Two-step workflow",
author = "Pratik Jagtap and Jill Goslinga and Kooren, {Joel A.} and Thomas Mcgowan and Wroblewski, {Matthew S.} and Seymour, {Sean L.} and Griffin, {Timothy J.}",
year = "2013",
month = "4",
day = "1",
doi = "10.1002/pmic.201200352",
language = "English (US)",
volume = "13",
pages = "1352--1357",
journal = "Proteomics",
issn = "1615-9853",
publisher = "Wiley-VCH Verlag",
number = "8",

}

TY - JOUR
T1 - A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies
AU - Jagtap, Pratik
AU - Goslinga, Jill
AU - Kooren, Joel A.
AU - Mcgowan, Thomas
AU - Wroblewski, Matthew S.
AU - Seymour, Sean L.
AU - Griffin, Timothy J.
PY - 2013/4/1
Y1 - 2013/4/1
AB - Large databases (>10^6 sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to MS/MS data using database-search programs. Most notably, strict filtering to avoid false-positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High confidence peptide sequence matches were then used to infer protein identities. Applying our two-step method for both metaproteomic and proteogenomic analysis resulted in twice the number of high confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes the peptide matching sensitivity for applications requiring large databases, especially valuable for proteogenomics and metaproteomics studies.
KW - Bioinformatics
KW - Mass spectrometry
KW - Metaproteomics
KW - Proteogenomics
KW - Sequence database search
KW - Two-step workflow
UR - http://www.scopus.com/inward/record.url?scp=84876329019&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876329019&partnerID=8YFLogxK
U2 - 10.1002/pmic.201200352
DO - 10.1002/pmic.201200352
M3 - Article
C2 - 23412978
AN - SCOPUS:84876329019
VL - 13
SP - 1352
EP - 1357
JO - Proteomics
JF - Proteomics
SN - 1615-9853
IS - 8
ER -