ACR: Amnesic checkpointing and recovery

Ismail Akturk, Ulya R. Karpuzcu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Systematic checkpointing of the machine state makes restart of execution from a safe state possible upon detection of an error. The time and energy overhead of checkpointing, however, grows with the frequency of checkpointing. Considering the growth of expected error rates, amortizing this overhead becomes especially challenging, as checkpointing frequency tends to increase with increasing error rates. Based on the observation that, due to imbalanced technology scaling, recomputing a data value can be more energy efficient than retrieving (i.e., loading) a stored copy, this paper explores how recomputation of data values (which otherwise would be read from a checkpoint in memory or secondary storage) can reduce the machine state to be checkpointed and, thereby, the checkpointing overhead. Even in a relatively small-scale system, recomputation-based checkpointing can reduce the storage overhead by up to 23.91%, the time overhead by 11.92%, and the energy overhead by 12.53%.
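
As a rough illustration of the idea in the abstract, and not the mechanism the paper itself implements, the following Python sketch models machine state as a dictionary and omits from the checkpoint any value that can be recomputed from other values. The names `take_checkpoint`, `recover`, and the `recipes` table are hypothetical and introduced only for this example; the record does not describe how recomputable values are identified or what costs are involved.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy model: the "machine state" is a dict of named integers.
# A recipe says how one value can be recomputed from other values in the state.
Recipe = Tuple[Callable[..., int], Tuple[str, ...]]


def take_checkpoint(state: Dict[str, int], recipes: Dict[str, Recipe]) -> Dict[str, int]:
    """Store only the values that have no recomputation recipe."""
    return {name: value for name, value in state.items() if name not in recipes}


def recover(checkpoint: Dict[str, int], recipes: Dict[str, Recipe]) -> Dict[str, int]:
    """Rebuild the full state: load stored values, then recompute the omitted ones."""
    state = dict(checkpoint)
    pending = dict(recipes)
    while pending:
        progressed = False
        for name, (fn, inputs) in list(pending.items()):
            # A value can be recomputed once all of its inputs are available.
            if all(i in state for i in inputs):
                state[name] = fn(*(state[i] for i in inputs))
                del pending[name]
                progressed = True
        if not progressed:
            raise ValueError("recipes depend on values missing from the checkpoint")
    return state


# Assumed example state: c and d are derivable, so only a and b are stored.
state = {"a": 3, "b": 4, "c": 7, "d": 49}
recipes = {
    "c": (lambda a, b: a + b, ("a", "b")),
    "d": (lambda c: c * c, ("c",)),
}
checkpoint = take_checkpoint(state, recipes)
restored = recover(checkpoint, recipes)
```

In this toy case the checkpoint holds two of the four values, and recovery trades the loads of `c` and `d` for two arithmetic operations. Whether that trade pays off in a real system depends on the relative cost of recomputation and retrieval, which the sketch does not model.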

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 30-43
Number of pages: 14
ISBN (Electronic): 9781728161495
State: Published - Feb 2020
Event: 26th IEEE International Symposium on High Performance Computer Architecture, HPCA 2020 - San Diego, United States
Duration: Feb 22 2020 → Feb 26 2020

Publication series

Name: Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020

Conference

Conference: 26th IEEE International Symposium on High Performance Computer Architecture, HPCA 2020
Country: United States
City: San Diego
Period: 2/22/20 → 2/26/20

Keywords

  • Checkpointing
  • Recomputation
  • Recovery
