Wrangling Galaxy's reference data

Daniel Blankenberg, James E. Johnson, James Taylor, Anton Nekrutenko

Research output: Contribution to journal > Article > peer-review

25 Scopus citations


Summary: The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one's own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently. Until now, the building of this cache of data for Galaxy has been an error-prone manual process lacking reproducibility and provenance. The Galaxy Data Manager framework is an enhancement that changes the management of Galaxy's built-in data cache from a manual procedure to an automated graphical user interface (GUI) driven process, which contains the same openness, reproducibility and provenance that is afforded to Galaxy's analysis tools. Data Manager tools allow the Galaxy administrator to download, create and install additional datasets for any type of reference data in real time.

Original language: English (US)
Pages (from-to): 1917-1919
Number of pages: 3
Issue number: 13
State: Published - Jul 1 2014

Bibliographical note

Funding Information:
This work was supported through grant number HG005542 from the National Human Genome Research Institute, National Institutes of Health, as well as grants HG005133, HG004909 and HG006620 and NSF grant DBI 0543285. Additional funding is provided by Huck Institutes for the Life Sciences at Penn State and, in part, under a grant with the Pennsylvania Department of Health using Tobacco Settlement Funds. The Department specifically disclaims responsibility for any analyses, interpretations or conclusions.

