To gain a thorough appreciation of microbiome dynamics, researchers characterize the functional relevance of expressed microbial genes or proteins. This can be accomplished through metaproteomics, which characterizes the protein expression of microbiomes. Several software tools exist for analyzing microbiomes at the functional level by measuring their combined proteome-level response to environmental perturbations. In this survey, we explore the performance of six available tools to enable researchers to make informed decisions regarding software choice based on their research goals. Tandem mass spectrometry-based proteomic data obtained from dental caries plaque samples grown with and without sucrose in paired biofilm reactors were used as representative data for this evaluation. Microbial peptides from one sample pair were identified by the X! Tandem search algorithm via SearchGUI and subjected to functional analysis using six software tools (eggNOG-mapper, MEGAN5, MetaGOmics, MetaProteomeAnalyzer (MPA), ProPHAnE, and Unipept) to generate functional annotation through Gene Ontology (GO) terms. Comparing differentially expressed protein functional groups revealed notable differences in functional annotation among these tools. Based on the GO terms generated by each tool, we performed a peptide-level comparison to evaluate the quality of their functional annotations. A BLAST analysis against the NCBI non-redundant database revealed that the sensitivity and specificity of functional annotation varied between tools. For example, eggNOG-mapper mapped to the largest number of GO terms, while Unipept generated more accurate GO terms. Based on our evaluation, metaproteomics researchers can choose software according to their analytical needs, and developers can use the resulting feedback to further optimize their algorithms.
To make more of these tools accessible via scalable metaproteomics workflows, eggNOG-mapper and Unipept 4.0 were incorporated into the Galaxy platform.
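As an illustration of what a peptide-level comparison of GO annotations between tools can look like, the following is a minimal Python sketch. It is not the procedure used in this study: the Jaccard index is swapped in here as one simple overlap measure, and the peptide sequences, GO term assignments, and the function names `jaccard` and `compare_tools` are all hypothetical.

```python
from itertools import combinations

# Hypothetical peptide -> GO term annotations per tool.
# These values are illustrative only and are NOT data from the study.
annotations = {
    "eggNOG-mapper": {
        "PEPTIDEA": {"GO:0005975", "GO:0016052", "GO:0008152"},
        "PEPTIDEB": {"GO:0006412"},
    },
    "Unipept": {
        "PEPTIDEA": {"GO:0005975"},
        "PEPTIDEB": {"GO:0006412", "GO:0003735"},
    },
}


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_tools(annotations):
    """For every pair of tools, score GO term overlap per shared peptide."""
    result = {}
    for tool_a, tool_b in combinations(sorted(annotations), 2):
        # Only peptides annotated by both tools can be compared.
        shared = annotations[tool_a].keys() & annotations[tool_b].keys()
        result[(tool_a, tool_b)] = {
            peptide: jaccard(annotations[tool_a][peptide],
                             annotations[tool_b][peptide])
            for peptide in shared
        }
    return result


scores = compare_tools(annotations)
for pair, per_peptide in scores.items():
    for peptide, score in sorted(per_peptide.items()):
        print(pair, peptide, round(score, 3))
```

A low score for a peptide flags a disagreement between tools but does not say which tool is correct; deciding that requires an external reference, such as the BLAST analysis described above.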
Bibliographical note
Funding Information:
Funding: T.G.: National Cancer Institute - Informatics Technology for Cancer Research (NCI-ITCR) grant 1U24CA199347 and National Science Foundation (U.S.) grant 1458524; P.D.J.: Extreme Science and Engineering Discovery Environment (XSEDE) research allocation BIO170096; B.G.: Collaborative Research Centre 992 Medical Epigenetics (DFG grant SFB 992/1 2012) and German Federal Ministry of Education and Research (BMBF grants 031 A538A/A538C RBC, 031L0101B/031L0101C de.NBI-epi, 031L0106 de.STAIR (de.NBI)). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We would like to thank the European Galaxy team for their help and support during the Galaxy implementation. We would also like to thank Alessandro Tanca (Porto Conte Ricerche, Italy) and Mak Saito and Noelle Held (Woods Hole Oceanographic Institution, Woods Hole, MA) for discussions during the functional tools analysis. We thank Tim van den Bossche (Ghent University, Belgium) for valuable input and Emma Leith (University of Minnesota) for proofreading the manuscript. We also acknowledge the support of the Minnesota Supercomputing Institute in maintaining the Galaxy instances and supercomputing resources used for this analysis.