Progressive decreases in the cost of DNA sequencing have contributed to a decades-long exponential increase in the production of new sequencing datasets. The processing of these datasets has in turn led biology, a field that has traditionally relied on local "lab" servers to address its computational needs, to become increasingly reliant on High Performance Computing (HPC) resources. Though many operations on sequencing datasets are trivially parallelizable on multiple levels, the lack of an HPC tradition in biological research has hampered fully parallelized deployments. Here we present a lightweight, flexible framework for performing parallelized processing of raw gene expression data. The framework uses a Python 3-based frontend for specifying analysis options, data paths, and reference datasets. This frontend sanitizes and resolves the options, providing verbose error checking before writing a human-readable configuration file and basic scripts for batch submission. The submission scripts leverage the scheduler to implement a scatter-gather approach, submitting potentially hundreds of individual jobs via a job array, each small enough to take advantage of backfill in a high-contention HPC environment. The gather component is handled through a script submitted with an "after-okay" dependency.
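The scatter-gather submission described above can be illustrated with a minimal sketch. The abstract does not name the scheduler, so this example assumes Slurm, whose `sbatch` command provides `--array`, `--parsable`, and `--dependency=afterok:<jobid>`. The function names, script names, and job ID are hypothetical and are not taken from the framework itself; the sketch only builds the command lines and does not submit anything.

```python
import shlex


def scatter_command(script, n_tasks, max_concurrent=None):
    """Build an sbatch command that submits n_tasks array tasks (scatter step)."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    array_spec = f"0-{n_tasks - 1}"
    if max_concurrent is not None:
        # Slurm's "%N" suffix caps how many array tasks run at the same time.
        array_spec += f"%{max_concurrent}"
    # --parsable makes sbatch print the job ID in a machine-readable form.
    return ["sbatch", "--parsable", f"--array={array_spec}", script]


def gather_command(script, array_job_id):
    """Build an sbatch command that starts only after the array job succeeds."""
    return ["sbatch", f"--dependency=afterok:{array_job_id}", script]


if __name__ == "__main__":
    # Hypothetical script names, used purely for illustration.
    scatter = scatter_command("process_sample.sh", n_tasks=200, max_concurrent=50)
    print(shlex.join(scatter))

    # On a real cluster the ID would be captured from the scatter submission,
    # e.g. via subprocess.run(scatter, capture_output=True, text=True).
    job_id = "123456"  # placeholder job ID (assumption)
    print(shlex.join(gather_command("merge_results.sh", job_id)))
```

Keeping each array task small is what lets the scheduler slot tasks into otherwise idle gaps (backfill), while the dependency on the array job ID holds the gather script until every task has completed successfully.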
|Original language||English (US)|
|Title of host publication||Proceedings of the Practice and Experience in Advanced Research Computing|
|Subtitle of host publication||Rise of the Machines (Learning), PEARC 2019|
|Publisher||Association for Computing Machinery|
|State||Published - Jul 28 2019|
|Event||2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019 - Chicago, United States (Duration: Jul 28 2019 → Aug 1 2019)|
|Name||ACM International Conference Proceeding Series|
|Conference||2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019|
|Period||7/28/19 → 8/1/19|
Bibliographical note
Publisher Copyright:
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Copyright 2019 Elsevier B.V., All rights reserved.