Abstract
In today's batch-queue HPC cluster systems, the user submits a job requesting a fixed number of processors, and the system will not start the job until all of the requested resources become available simultaneously. When cluster workload is high, large jobs experience long waiting times under this policy. In this paper, we propose a new approach that dynamically decomposes a large job into smaller ones to reduce waiting time, and lets the application expand across multiple subjobs while continuously making progress. This approach has three benefits: (i) application turnaround time is reduced, (ii) system fragmentation is diminished, and (iii) fairness is promoted. Our approach does not depend on job queue time prediction but exploits available backfill opportunities. Simulation results show that our approach can reduce application mean turnaround time by up to 48%.
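The decomposition idea in the abstract can be illustrated with a minimal sketch. This is not the paper's actual algorithm; the function name `decompose` and the greedy placement strategy are assumptions for illustration only. It greedily splits a large job's node request across currently available backfill windows, so the application can begin making progress instead of waiting for the full allocation at once.

```python
# Illustrative sketch (hypothetical, not the paper's algorithm): greedily
# split a large node request across available backfill slots.

def decompose(requested_nodes, backfill_slots, min_subjob=1):
    """Split `requested_nodes` across `backfill_slots` (each an available
    node count). Returns the subjob sizes placed now and the remainder
    that must still wait in the queue."""
    subjobs = []
    remaining = requested_nodes
    # Fill the largest backfill windows first to minimize fragmentation.
    for slot in sorted(backfill_slots, reverse=True):
        if remaining < min_subjob:
            break
        take = min(slot, remaining)
        if take >= min_subjob:
            subjobs.append(take)
            remaining -= take
    return subjobs, remaining
```

For example, a 64-node request against backfill windows of 16, 8, and 4 nodes would start three subjobs immediately (`[16, 8, 4]`) while 36 nodes' worth of work remains queued; the application expands as further windows open.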
| Original language | English (US) |
|---|---|
| Title of host publication | Proceedings of SC 2015 |
| Subtitle of host publication | The International Conference for High Performance Computing, Networking, Storage and Analysis |
| Publisher | IEEE Computer Society |
| ISBN (Electronic) | 9781450337236 |
| DOIs | |
| State | Published - Nov 15 2015 |
| Event | International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015 - Austin, United States; Duration: Nov 15 2015 → Nov 20 2015 |
Publication series
| Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
|---|---|
| Volume | 15-20-November-2015 |
| ISSN (Print) | 2167-4329 |
| ISSN (Electronic) | 2167-4337 |
Other
| Other | International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015 |
|---|---|
| Country/Territory | United States |
| City | Austin |
| Period | 11/15/15 → 11/20/15 |
Bibliographical note
Publisher Copyright: © 2015 ACM.
Keywords
- HPC
- elasticity
- parallel job scheduling