Discovering Novel Proteoforms Using Proteogenomic Workflows Within the Galaxy Bioinformatics Platform

Praveen Kumar, James E. Johnson, Thomas McGowan, Matthew C. Chambers, Mohammad Heydarian, Subina Mehta, Caleb Easterly, Timothy J. Griffin, Pratik D. Jagtap

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

Proteogenomics is a growing “multi-omics” research area that combines mass spectrometry–based proteomics with high-throughput nucleotide sequencing. It has aided genome annotation for organisms whose complete genome sequences became available through high-throughput DNA sequencing. Beyond genome annotation, this multi-omics approach has helped researchers confirm the expression of variant proteins belonging to unique proteoforms that could have resulted from single-nucleotide polymorphisms (SNPs), insertions and deletions (indels), splice isoforms, or other genome or transcriptome variations. A proteogenomic study depends on a multistep informatics workflow that requires different software at each step. These integrated steps include creating an appropriate protein sequence database, matching spectral data against these sequences, and identifying peptide sequences corresponding to novel proteoforms, followed by variant classification and functional analysis. The disparate software required for a proteogenomic study is difficult for most researchers to access and use, especially those lacking computational expertise. Furthermore, running the tools separately is error-prone because parameters must be set individually for each one, and reproducibility suffers as a result. Managing the output files from each tool is an additional challenge. One solution to these challenges is the open-source, Web-based computational platform Galaxy. Its ability to create and manage workflows composed of disparate software while recording and saving all important parameters promotes both usability and reproducibility. Here, we describe a workflow that performs proteogenomic analysis on a Galaxy-based platform. This Galaxy workflow facilitates matching spectral data against a customized protein sequence database, identifying novel protein variants, assessing the quality of results, and classifying variants, along with visualization against the genome.

Original language: English (US)
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc.
Pages: 109-128
Number of pages: 20
State: Published - 2025

Publication series

Name: Methods in Molecular Biology
Volume: 2859
ISSN (Print): 1064-3745
ISSN (Electronic): 1940-6029

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2025.

Keywords

  • Galaxy-P
  • Multi-omics
  • Proteogenomics
  • Workflows
