Fault tolerant computing on the grid: What are my options?

Research output: Contribution to journal › Conference article › peer-review

12 Scopus citations

Abstract

Achieving large-scale distributed computing in a seamless manner introduces a number of difficult problems. This work examines the fault tolerance options for a common class of high-performance parallel applications, single-program-multiple-data (SPMD), and develops performance models for two fault tolerance methods, checkpoint-recovery (CR) and wide-area replication (WR). These models enable quantitative comparisons of the two methods as applied to SPMD applications.

Original language: English (US)
Pages (from-to): 351-352
Number of pages: 2
Journal: IEEE International Symposium on High Performance Distributed Computing, Proceedings
State: Published - Dec 1 1999
Event: Proceedings of the 1999 8th IEEE International Symposium on High Performance Distributed Computing - HPDC-8 - Redondo Beach, CA, USA
Duration: Aug 3 1999 → Aug 6 1999
