Achieving seamless large-scale distributed computing introduces a number of difficult problems. This work examines the fault tolerance options for a common class of high-performance parallel applications: single-program-multiple-data (SPMD) applications. Performance models were developed for two fault tolerance methods, checkpoint-recovery (CR) and wide-area replication (WR). These models enable quantitative comparisons of the two methods as applied to SPMD applications.
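The abstract does not state the models themselves. As a rough illustration of the kind of quantities such a comparison involves, here is a toy sketch built on textbook assumptions that are not taken from the paper: failures at each site are independent and exponentially distributed with a given mean time between failures (MTBF); CR uses a standard expected-runtime formula for periodic checkpointing, with Young's approximation sqrt(2 · C · MTBF) for the checkpoint interval; and WR runs k full replicas at independent sites with no checkpoints, restarting from scratch only when every replica fails. All function names and parameter values are hypothetical, and wide-area communication costs are ignored.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def cr_expected_time(work: float, interval: float, checkpoint_cost: float,
                     restart_cost: float, mtbf: float) -> float:
    """Expected wall-clock time under checkpoint-recovery (CR).

    Assumes exponentially distributed failures with mean `mtbf`, a checkpoint
    after every `interval` units of useful work, and that failures can also
    strike during checkpointing and restart. A fractional final segment is
    treated as a fraction of a full segment (a first-order approximation).
    """
    lam = 1.0 / mtbf
    segments = work / interval
    # Expected time to finish one (interval + checkpoint) segment, counting
    # restarts and recomputation of work lost to failures.
    per_segment = (math.exp(lam * restart_cost)
                   * math.expm1(lam * (interval + checkpoint_cost)) / lam)
    return segments * per_segment


def wr_expected_time(work: float, replicas: int, mtbf: float) -> float:
    """Expected wall-clock time under k-way replication (WR), no checkpoints.

    Each replica runs the whole job at an independent site. An attempt
    succeeds once any replica finishes `work` without failing; if every
    replica fails, all of them restart from scratch. Wide-area costs are ignored.
    """
    lam = 1.0 / mtbf
    # P(at least one replica survives for `work` time units).
    p_success = 1.0 - (-math.expm1(-lam * work)) ** replicas
    # E[min(work, time of the last replica failure)], from the binomial
    # expansion of the integral of P(some replica alive at t) over [0, work].
    expected_attempt = sum(
        math.comb(replicas, j) * (-1) ** (j + 1)
        * -math.expm1(-j * lam * work) / (j * lam)
        for j in range(1, replicas + 1)
    )
    # Attempts repeat until one succeeds; by Wald's identity the expected
    # total is E[attempt length] / P(success).
    return expected_attempt / p_success


if __name__ == "__main__":
    # Hypothetical parameters, in hours: 24 h MTBF per site,
    # 0.1 h per checkpoint, 0.2 h per restart.
    mtbf, ckpt, restart = 24.0, 0.1, 0.2
    tau = young_interval(ckpt, mtbf)
    for work in (10.0, 100.0):
        cr = cr_expected_time(work, tau, ckpt, restart, mtbf)
        wr = {k: round(wr_expected_time(work, k, mtbf), 1) for k in (2, 3)}
        print(f"W={work:g} h  CR={cr:.1f} h  WR={wr}")
```

In this toy model the CR estimate grows linearly with the amount of work, while the replicated run must finish in one failure-free stretch, so its expected time grows roughly exponentially in work/MTBF. Varying the job length therefore shows where the two methods cross under these assumptions.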
| Field | Value |
|---|---|
| Original language | English (US) |
| Number of pages | 2 |
| Journal | IEEE International Symposium on High Performance Distributed Computing, Proceedings |
| State | Published - Dec 1 1999 |
| Event | Proceedings of the 1999 8th IEEE International Symposium on High Performance Distributed Computing - HPDC-8 - Redondo Beach, CA, USA |
| Duration | Aug 3 1999 → Aug 6 1999 |