Abstract
In 2012, a joint effort between Fairview Hospital’s Molecular Diagnostic Lab and the Minnesota Supercomputing Institute introduced the Next-Generation Sequencing Diagnostic Pipeline (NGSDP) [17, 20], a novel cloud-based analysis pipeline validated for clinical use in compliance with the Clinical Laboratory Improvement Amendments (CLIA). Since then, the partnership has grown its portfolio of active pipelines to seven and expanded the available gene panels to over 6,700 genes. Likewise, the backing cloud infrastructure has been significantly redesigned for efficiency, scalability, automation, and fault tolerance. This document introduces our revised cloud-native infrastructure for scheduling containerized applications. The infrastructure, deployed on Amazon Web Services, has two core components: a compute cluster of virtual machines and a simple queue-based job scheduler. The cluster runs analysis pipelines packaged as Docker containers, each with its own resource requirements. The scheduler, akin to those used in traditional HPC, supports batch submission of patient samples through a persistent job queue. It also orchestrates cluster scale-out and scale-in to match job demand, in an effort to minimize idle servers and reduce the cost of computing on a public cloud. The solution has been running CLIA-validated pipelines successfully since 2016. Two variants are presented to address the need for such an architecture both in the traditional public cloud and in Amazon’s GovCloud, where fewer cloud services are available for handling sensitive or controlled-access data. Finally, the solution is driven from a traditional research computing environment, which is described with emphasis on security, monitoring, and user workflow.
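To make the scale-out/scale-in behavior concrete, the following is a minimal Python sketch of how a queue-based scheduler might decide how many virtual machines to launch or terminate. It is not the authors' implementation: the `Job` and `NodeType` classes, the `nodes_needed` and `scaling_delta` functions, the first-fit decreasing packing heuristic, and the assumption that busy nodes have no spare capacity are all hypothetical simplifications, and no AWS API calls are made.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A queued pipeline container and its resource request (hypothetical model)."""
    name: str
    cpus: int
    mem_gb: int


@dataclass
class NodeType:
    """Capacity of one identical VM in the cluster (hypothetical model)."""
    cpus: int
    mem_gb: int


def nodes_needed(jobs: list[Job], node_type: NodeType) -> int:
    """Pack queued jobs onto fresh nodes with first-fit decreasing; return the node count."""
    free: list[list[int]] = []  # remaining [cpus, mem_gb] on each packed node
    for job in sorted(jobs, key=lambda j: (j.cpus, j.mem_gb), reverse=True):
        if job.cpus > node_type.cpus or job.mem_gb > node_type.mem_gb:
            raise ValueError(f"job {job.name!r} exceeds the capacity of a single node")
        for slot in free:
            if slot[0] >= job.cpus and slot[1] >= job.mem_gb:
                slot[0] -= job.cpus
                slot[1] -= job.mem_gb
                break
        else:  # no already-packed node fits this job: open a new one
            free.append([node_type.cpus - job.cpus, node_type.mem_gb - job.mem_gb])
    return len(free)


def scaling_delta(queued: list[Job], running_nodes: int, busy_nodes: int,
                  node_type: NodeType, max_nodes: int) -> int:
    """Return nodes to launch (> 0) or idle nodes to terminate (< 0).

    Simplifying assumption: busy nodes have no spare capacity, so queued
    jobs are placed only on idle or newly launched nodes.
    """
    target = min(max_nodes, busy_nodes + nodes_needed(queued, node_type))
    target = max(busy_nodes, target)  # never scale in below the busy nodes
    return target - running_nodes
```

For example, with 16-CPU, 64 GB nodes, four queued jobs requesting 12/48, 8/32, 8/32, and 4/16 (CPUs/GB) pack onto two nodes; with three nodes running and two of them busy, the sketch would launch one more node, and with an empty queue it would terminate the single idle node. Packing before scaling is what lets the scheduler avoid launching one VM per job.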
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the Practice and Experience in Advanced Research Computing |
Subtitle of host publication | Rise of the Machines (Learning), PEARC 2019 |
Publisher | Association for Computing Machinery |
ISBN (Electronic) | 9781450372275 |
DOIs | |
State | Published - Jul 28 2019 |
Event | 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019 - Chicago, United States. Duration: Jul 28 2019 → Aug 1 2019 |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Conference
Conference | 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019 |
---|---|
Country/Territory | United States |
City | Chicago |
Period | 7/28/19 → 8/1/19 |
Bibliographical note
Publisher Copyright: © 2019 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery.
Keywords
- ACM proceedings
- Amazon Web Services
- Batch Jobs
- CLIA
- Clinical Pipelines
- Cloud Infrastructure
- Docker
- Personalized Medicine
- Scheduler