Abstract
Achieving large-scale distributed computing in a seamless manner introduces a number of difficult problems. The fault tolerance options for a common class of high-performance parallel applications, single-program-multiple-data (SPMD), were examined. Performance models for two fault tolerance methods, checkpoint-recovery (CR) and wide-area replication (WR), were developed. These models enable quantitative comparisons of the two methods as applied to SPMD applications.
Original language | English (US) |
---|---|
Pages (from-to) | 351-352 |
Number of pages | 2 |
Journal | IEEE International Symposium on High Performance Distributed Computing, Proceedings |
State | Published - Dec 1 1999 |
Event | Proceedings of the 1999 8th IEEE International Symposium on High Performance Distributed Computing - HPDC-8 - Redondo Beach, CA, USA; Duration: Aug 3 1999 → Aug 6 1999 |