Multiomics approaches focused on mass spectrometry (MS)-based data, such as metaproteomics, use genomic and/or transcriptomic sequencing data to generate a comprehensive protein sequence database. These databases can be very large, containing millions of sequences, which reduces the sensitivity of matching tandem mass spectrometry (MS/MS) data to sequences to generate peptide spectrum matches (PSMs). Here, we describe and evaluate a sectioning method for generating a database enriched for the protein sequences that are most likely present in the sample. Our evaluation demonstrates that this method increases the sensitivity of PSM identification while maintaining acceptable false discovery rate statistics, offering a flexible alternative both to traditional large database searching and to previously described two-step database searching methods for large sequence database applications. Furthermore, implementation in the Galaxy platform provides access to an automated and customizable workflow for carrying out the method. Additionally, the results of this study provide valuable insights into the advantages and limitations of available methods aimed at addressing the challenges of genome-guided, large database applications in proteomics. Relevant raw data have been made available at https://zenodo.org/ using data set identifier "3754789" and at https://arcticdata.io/catalog using data set identifier "A2VX06340".
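The abstract does not spell out how the sections are formed or searched, so the following is only a minimal sketch of the general idea, under the assumption that the large database is split into smaller sections, each section is searched separately, and the proteins matched in any section are pooled into the enriched database. The names `section_database`, `build_enriched_database`, and `toy_search` are hypothetical, and `toy_search` is a plain substring match standing in for a real MS/MS search engine; none of this is the published Galaxy workflow.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical in-memory representation: accession -> protein sequence.
Database = Dict[str, str]


def section_database(db: Database, n_sections: int) -> List[Database]:
    """Split a database into n_sections roughly equal parts (round-robin)."""
    if n_sections < 1:
        raise ValueError("n_sections must be >= 1")
    sections: List[Database] = [{} for _ in range(n_sections)]
    for i, (accession, sequence) in enumerate(db.items()):
        sections[i % n_sections][accession] = sequence
    return sections


def build_enriched_database(
    db: Database,
    n_sections: int,
    search: Callable[[Database], Iterable[str]],
) -> Database:
    """Search each section separately and pool the matched proteins.

    `search` is an assumed callback that returns the accessions of the
    proteins identified when one section is searched.
    """
    matched: Set[str] = set()
    for section in section_database(db, n_sections):
        matched |= set(search(section))
    # Keep the original database order for the enriched subset.
    return {acc: seq for acc, seq in db.items() if acc in matched}


def toy_search(peptides: List[str]) -> Callable[[Database], Set[str]]:
    """Stand-in for a search engine: a protein 'matches' if it contains
    any of the given peptide strings. Purely illustrative."""
    def search(section: Database) -> Set[str]:
        return {
            acc for acc, seq in section.items()
            if any(peptide in seq for peptide in peptides)
        }
    return search


if __name__ == "__main__":
    db = {"P1": "MKTAYIAK", "P2": "GGGGSGGG", "P3": "AAKTAYLL", "P4": "QQQQ"}
    enriched = build_enriched_database(db, 2, toy_search(["KTAY"]))
    print(sorted(enriched))
```

In a real setting the `search` callback would wrap a search engine run with its own false discovery rate filtering, and the enriched database would then be searched once more against the full set of spectra.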
Bibliographical note
Funding Information:
We would like to thank the European Galaxy and Freiburg Galaxy teams for providing the computational resources and storage for the Galaxy implementation and sharing of the database sectioning method. We would also like to thank the Minnesota Supercomputing Institute (MSI) at the University of Minnesota and Jetstream for computational resources. The SIHUMI MS data were acquired and shared by Dr. Robert Hettich’s Lab at the Oak Ridge National Laboratory. We would like to thank Dr. Hettich’s group and Dr. Nico Jehmlich from the Helmholtz Center for Environmental Research for introducing us to the SIHUMI dataset through the International Metaproteome Symposium. We acknowledge funding for this work from the National Cancer Institute–Informatics Technology for Cancer Research (NCI-ITCR) grant “1U24CA199347”, National Science Foundation (U.S.) grant “1458524”, and a grant through the Norwegian Centennial Chair (NOCC) program at the University of Minnesota to T.J.G. We would also like to acknowledge the Extreme Science and Engineering Discovery Environment (XSEDE) research allocation BIO170096 to P.D.J. and use of the Jetstream cloud-based computing resource for scientific computing (https://jetstream-cloud.org/) maintained at Indiana University. The European Galaxy server that was used for parts of this work is in part funded by Collaborative Research Centre 992 Medical Epigenetics (DFG grant SFB 992/1 2012) and the German Federal Ministry of Education and Research (BMBF grants 031 A538A/A538C RBC, 031L0101 B/031L0101C de.NBI-epi, 031L0106 de.STAIR (de.NBI)).
Copyright © 2020 American Chemical Society.
- false discovery rate
- peptide spectrum match
- tandem mass spectrometry
PubMed: MeSH publication types
- Journal Article
- Research Support, N.I.H., Extramural
- Research Support, Non-U.S. Gov't
- Research Support, U.S. Gov't, Non-P.H.S.