Leveraging public cloud services for CLIA-certified personalized medicine pipelines

Evan F. Bollig, Christine Henzler, Ham C. Lam, Sarah A. Munro, Rebecca LaRue, Getiria Onsongo, Sophia Yohe, Andrew C. Nelson, Matthew Bower, Matthew Schomaker, Bharat Thyagarajan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In 2012, a joint effort between Fairview Hospital’s Molecular Diagnostic Lab and the Minnesota Supercomputing Institute introduced the Next-Generation Sequencing Diagnostic Pipeline (NGSDP) [17, 20], a novel cloud-based analysis pipeline that was validated for clinical use as compliant with the Clinical Laboratory Improvement Amendments (CLIA). Since then, the partnership has grown the portfolio of active pipelines to seven and expanded the available gene panels to over 6700 genes. Likewise, the design of the backing cloud infrastructure has been significantly overhauled for efficiency, scalability, automation, and fault tolerance. This document introduces our revised cloud-native, scheduled infrastructure solution for containerized applications. The cloud-native infrastructure, deployed on Amazon Web Services, has two core components: a compute cluster of virtual machines and a simple queue-based job scheduler. The cluster runs analysis pipelines packaged as Docker containers, each with its own resource requirements. The scheduler, akin to products for traditional HPC, allows batch submission of patient samples with a persistent job queue. The scheduler also orchestrates cluster scale-out and scale-in to match job needs, in an effort to minimize idle servers and reduce the cost of computing on a public cloud. Our solution has been running successfully as CLIA-validated pipelines since 2016. Two variants of the solution are presented to address the need for such an architecture both in the traditional public cloud space and in Amazon’s GovCloud, where fewer cloud services are available to handle sensitive or controlled-access data. Furthermore, the solution is driven from a traditional research computing environment, details of which are presented with emphasis on security, monitoring, and user workflow.
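
The scale-out and scale-in behavior described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Job`, `NodeType`, `nodes_needed`, and `scaling_decision`, the first-fit packing rule, and the assumption of identical virtual machines are all hypothetical choices made here for illustration. The sketch only shows how a queue of containerized jobs with per-job resource requirements might be turned into a target cluster size.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """One pending pipeline run (hypothetical structure)."""
    sample_id: str
    image: str    # container image name, illustrative only
    cpus: int
    mem_gb: int


@dataclass
class NodeType:
    """Capacity of one virtual machine; all nodes assumed identical."""
    cpus: int
    mem_gb: int


def nodes_needed(jobs, node):
    """Pack jobs onto identical nodes with first-fit; return the node count."""
    free = []  # remaining [cpus, mem_gb] on each node opened so far
    for job in jobs:
        if job.cpus > node.cpus or job.mem_gb > node.mem_gb:
            raise ValueError(f"job {job.sample_id} does not fit on any node")
        for slot in free:
            if slot[0] >= job.cpus and slot[1] >= job.mem_gb:
                slot[0] -= job.cpus
                slot[1] -= job.mem_gb
                break
        else:
            # No open node has room, so open a new one for this job.
            free.append([node.cpus - job.cpus, node.mem_gb - job.mem_gb])
    return len(free)


def scaling_decision(jobs, running_nodes, node, max_nodes):
    """Return how many nodes to add (positive) or remove (negative).

    `jobs` is assumed to hold every job that still needs capacity,
    both queued and currently running.
    """
    target = min(nodes_needed(jobs, node), max_nodes)
    return target - running_nodes


if __name__ == "__main__":
    queue = [
        Job("sample-001", "pipeline-a", cpus=4, mem_gb=16),
        Job("sample-002", "pipeline-a", cpus=4, mem_gb=16),
        Job("sample-003", "pipeline-b", cpus=2, mem_gb=8),
    ]
    vm = NodeType(cpus=8, mem_gb=32)
    print(scaling_decision(queue, running_nodes=0, node=vm, max_nodes=10))
```

With an empty job list the target is zero, so every running node becomes a candidate for removal; that is the property that keeps idle servers, and therefore cost, to a minimum in this simplified model.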

Original language: English (US)
Title of host publication: Proceedings of the Practice and Experience in Advanced Research Computing
Subtitle of host publication: Rise of the Machines (Learning), PEARC 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450372275
DOIs: https://doi.org/10.1145/3332186.3332244
State: Published - Jul 28 2019
Event: 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019 - Chicago, United States
Duration: Jul 28 2019 → Aug 1 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019
Country: United States
City: Chicago
Period: 7/28/19 → 8/1/19

Keywords

  • ACM proceedings
  • Amazon Web Services
  • Batch Jobs
  • CLIA
  • Clinical Pipelines
  • Cloud Infrastructure
  • Docker
  • Personalized Medicine
  • Scheduler

Cite this

Bollig, E. F., Henzler, C., Lam, H. C., Munro, S. A., LaRue, R., Onsongo, G., Yohe, S., Nelson, A. C., Bower, M., Schomaker, M., & Thyagarajan, B. (2019). Leveraging public cloud services for CLIA-certified personalized medicine pipelines. In Proceedings of the Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019 [3332244] (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3332186.3332244