Metaproteomic analysis using the Galaxy framework

Pratik D Jagtap, Alan Blakely, Kevin Murray, Shaun Stewart, Joel Kooren, James E Johnson, Nelson L Rhodus, Joel D Rudney, Timothy J Griffin

Research output: Contribution to journal, Article

31 Citations (Scopus)

Abstract

Metaproteomics characterizes proteins expressed by microorganism communities (microbiome) present in environmental samples or a host organism (e.g. human), revealing insights into the molecular functions conferred by these communities. Compared to conventional proteomics, metaproteomics presents unique data analysis challenges, including the use of large protein databases derived from hundreds or thousands of organisms, as well as numerous processing steps to ensure high data quality. These challenges limit the use of metaproteomics for many researchers. In response, we have developed an accessible and flexible metaproteomics workflow within the Galaxy bioinformatics framework. Via analysis of human oral tissue exudate samples, we have established a modular Galaxy-based workflow that automates a reduction method for searching large sequence databases, enabling comprehensive identification of host proteins (human) as well as "meta-proteins" from the nonhost organisms. Downstream, automated processing steps enable basic local alignment search tool analysis and evaluation/visualization of peptide sequence match quality, maximizing confidence in results. Outputted results are compatible with tools for taxonomic and functional characterization (e.g. Unipept, MEGAN5). Galaxy also allows for the sharing of complete workflows with others, promoting reproducibility and also providing a template for further modification and enhancement. Our results provide a blueprint for establishing Galaxy as a solution for metaproteomic data analysis. All MS data have been deposited in the ProteomeXchange with identifier PXD001655 (http://proteomecentral.proteomexchange.org/dataset/PXD001655).

Original language: English (US)
Pages (from-to): 3553-3565
Number of pages: 13
Journal: Proteomics
Volume: 15
Issue number: 20
DOI: https://doi.org/10.1002/pmic.201500074
State: Published - Oct 1 2015

Fingerprint

Galaxies
Workflow
Proteins
Blueprints
Protein Databases
Microbiota
Exudates and Transudates
Bioinformatics
Processing
Computational Biology
Microorganisms
Proteomics
Visualization
Research Personnel
Databases
Tissue
Peptides

Keywords

  • Bioinformatics
  • Customized database generation
  • Mass spectrometry
  • Metaproteomics
  • Peptide sequence match
  • Sequence database search

Cite this

Metaproteomic analysis using the Galaxy framework. / Jagtap, Pratik D; Blakely, Alan; Murray, Kevin; Stewart, Shaun; Kooren, Joel; Johnson, James E; Rhodus, Nelson L; Rudney, Joel D; Griffin, Timothy J.

In: Proteomics, Vol. 15, No. 20, 01.10.2015, p. 3553-3565.

Jagtap PD, Blakely A, Murray K, Stewart S, Kooren J, Johnson JE et al. Metaproteomic analysis using the Galaxy framework. Proteomics. 2015 Oct 1;15(20):3553-3565. https://doi.org/10.1002/pmic.201500074

Jagtap, Pratik D; Blakely, Alan; Murray, Kevin; Stewart, Shaun; Kooren, Joel; Johnson, James E; Rhodus, Nelson L; Rudney, Joel D; Griffin, Timothy J. / Metaproteomic analysis using the Galaxy framework. In: Proteomics. 2015; Vol. 15, No. 20. pp. 3553-3565.

@article{fd166d0cc9504c6ab5d81e9ce91af9fc,
title = "Metaproteomic analysis using the Galaxy framework",
keywords = "Bioinformatics, Customized database generation, Mass spectrometry, Metaproteomics, Peptide sequence match, Sequence database search",
author = "Jagtap, {Pratik D} and Alan Blakely and Kevin Murray and Shaun Stewart and Joel Kooren and Johnson, {James E} and Rhodus, {Nelson L} and Rudney, {Joel D} and Griffin, {Timothy J}",
year = "2015",
month = "10",
day = "1",
doi = "10.1002/pmic.201500074",
language = "English (US)",
volume = "15",
pages = "3553--3565",
journal = "Proteomics",
issn = "1615-9853",
publisher = "Wiley-VCH Verlag",
number = "20",

}

TY - JOUR

T1 - Metaproteomic analysis using the Galaxy framework

AU - Jagtap, Pratik D

AU - Blakely, Alan

AU - Murray, Kevin

AU - Stewart, Shaun

AU - Kooren, Joel

AU - Johnson, James E

AU - Rhodus, Nelson L

AU - Rudney, Joel D

AU - Griffin, Timothy J

PY - 2015/10/1

Y1 - 2015/10/1

KW - Bioinformatics

KW - Customized database generation

KW - Mass spectrometry

KW - Metaproteomics

KW - Peptide sequence match

KW - Sequence database search

UR - http://www.scopus.com/inward/record.url?scp=84944158856&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84944158856&partnerID=8YFLogxK

U2 - 10.1002/pmic.201500074

DO - 10.1002/pmic.201500074

M3 - Article

VL - 15

SP - 3553

EP - 3565

JO - Proteomics

JF - Proteomics

SN - 1615-9853

IS - 20

ER -