Improve your Galaxy text life: The Query Tabular Tool [version 1; referees: 1 approved, 2 approved with reservations]

James E Johnson, Praveen Kumar, Caleb Easterly, Mark Esler, Subina Mehta, Arthur C Eschenlauer, Adrian D Hegeman, Pratik D Jagtap, Timothy J Griffin

Research output: Contribution to journal › Article

Abstract

Galaxy provides an accessible platform where multi-step data analysis workflows integrating disparate software can be run, even by researchers with limited programming expertise. Applications of such sophisticated workflows are many, including those which integrate software from different ‘omic domains (e.g. genomics, proteomics, metabolomics). In these complex workflows, intermediate outputs are often generated as tabular text files, which must be transformed into customized formats which are compatible with the next software tools in the pipeline. Consequently, many text manipulation steps are added to an already complex workflow, overly complicating the process and decreasing usability, especially for non-expert bench researchers focused on obtaining results. In some cases, limitations to existing text manipulation are such that desired analyses can only be carried out using highly sophisticated processing steps beyond the reach of most users. As a solution, we have developed the Query Tabular Galaxy tool, which leverages a SQLite database generated from tabular input data. This database can be queried and manipulated to produce transformed and customized tabular outputs compatible with downstream processing steps. Regular expressions can also be utilized for even more sophisticated manipulations, such as find and replace and other filtering actions. Using several Galaxy-based multi-omic workflows as an example, we demonstrate how the Query Tabular tool dramatically streamlines and simplifies the creation of multi-step analyses, efficiently enabling complicated textual manipulations and processing. This tool should find broad utility for users of the Galaxy platform seeking to develop and use sophisticated workflows involving text manipulation on tabular outputs.
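
The approach the abstract describes (load tabular text into SQLite, query it, and write the result back out as tabular text, with regular expressions available for filtering) can be illustrated with a minimal Python sketch. This is not the Query Tabular tool's own code: the table name `psm`, the column names, the sample rows, and the helper functions `load_table`, `regexp`, and `query_to_tabular` are all hypothetical and chosen only for illustration.

```python
import csv
import io
import re
import sqlite3

# Hypothetical tab-separated input standing in for an intermediate workflow output.
TABULAR_INPUT = (
    "peptide\tprotein\tscore\n"
    "AAGK\tsp|P12345|ALBU\t0.91\n"
    "LLER\ttr|Q99999|UNKN\t0.40\n"
    "MKTR\tsp|P67890|TRFE\t0.75\n"
)


def load_table(conn, name, text):
    """Create a table from tab-separated text whose first row holds column names."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{col}"' for col in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{name}" ({columns})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', data)


def regexp(pattern, value):
    """Back SQLite's REGEXP operator with Python's re.search (returns 1 or 0)."""
    return int(value is not None and re.search(pattern, value) is not None)


def query_to_tabular(conn, sql):
    """Run a query and return the result as tab-separated text with a header row."""
    cursor = conn.execute(sql)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


conn = sqlite3.connect(":memory:")
# SQLite evaluates "X REGEXP Y" by calling regexp(Y, X), so the pattern comes first.
conn.create_function("REGEXP", 2, regexp)
load_table(conn, "psm", TABULAR_INPUT)

# Keep rows whose protein accession starts with "sp|" and whose score is at least 0.5.
result = query_to_tabular(
    conn,
    """
    SELECT peptide, protein
    FROM psm
    WHERE protein REGEXP '^sp\\|' AND CAST(score AS REAL) >= 0.5
    ORDER BY peptide
    """,
)
print(result, end="")
```

In this sketch every column is loaded as text, so the numeric comparison uses an explicit `CAST`; a real tool would more likely infer or let the user declare column types.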

Original language: English (US)
Article number: 1604
Journal: F1000Research
Volume: 7
DOI: 10.12688/f1000research.16450.1
State: Published - Jan 1 2018

Fingerprint

Galaxies
Workflow
Software
Processing
Research Personnel
Databases
Metabolomics
Pipelines
Genomics
Proteomics

Keywords

  • Galaxy
  • Genomics
  • Metabolomics
  • Metaproteomics
  • Multi-omics
  • Proteogenomics
  • Proteomics
  • SQLite
  • Workflows

PubMed: MeSH publication types

  • Journal Article
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural

Cite this

@article{bf5759e4a7e04ed590f4afba4aaab065,
title = "Improve your Galaxy text life: The Query Tabular Tool [version 1; referees: 1 approved, 2 approved with reservations]",
keywords = "Galaxy, Genomics, Metabolomics, Metaproteomics, Multi-omics, Proteogenomics, Proteomics, SQLite, Workflows",
author = "Johnson, {James E} and Praveen Kumar and Caleb Easterly and Mark Esler and Subina Mehta and Eschenlauer, {Arthur C} and Hegeman, {Adrian D} and Jagtap, {Pratik D} and Griffin, {Timothy J}",
year = "2018",
month = "1",
day = "1",
doi = "10.12688/f1000research.16450.1",
language = "English (US)",
volume = "7",
journal = "F1000Research",
issn = "2046-1402",
publisher = "F1000 Research Ltd.",

}

TY - JOUR

T1 - Improve your Galaxy text life

T2 - The Query Tabular Tool [version 1; referees: 1 approved, 2 approved with reservations]

AU - Johnson, James E

AU - Kumar, Praveen

AU - Easterly, Caleb

AU - Esler, Mark

AU - Mehta, Subina

AU - Eschenlauer, Arthur C

AU - Hegeman, Adrian D

AU - Jagtap, Pratik D

AU - Griffin, Timothy J

PY - 2018/1/1

Y1 - 2018/1/1

KW - Galaxy

KW - Genomics

KW - Metabolomics

KW - Metaproteomics

KW - Multi-omics

KW - Proteogenomics

KW - Proteomics

KW - SQLite

KW - Workflows

UR - http://www.scopus.com/inward/record.url?scp=85058748568&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058748568&partnerID=8YFLogxK

U2 - 10.12688/f1000research.16450.1

DO - 10.12688/f1000research.16450.1

M3 - Article

C2 - 30519459

AN - SCOPUS:85058748568

VL - 7

JO - F1000Research

JF - F1000Research

SN - 2046-1402

M1 - 1604

ER -