Abstract
Achieving large-scale distributed computing in a seamless manner introduces a number of difficult problems. The fault tolerance options for a common class of high-performance parallel applications, single-program-multiple-data (SPMD), are examined. Performance models for two fault tolerance methods, checkpoint-recovery (CR) and wide-area replication (WR), were developed. These models enable quantitative comparisons of the two methods as applied to SPMD applications.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 351-352 |
| Number of pages | 2 |
| Journal | IEEE International Symposium on High Performance Distributed Computing, Proceedings |
| State | Published - 1999 |
| Event | Proceedings of the 1999 8th IEEE International Symposium on High Performance Distributed Computing - HPDC-8 - Redondo Beach, CA, USA. Duration: Aug 3 1999 → Aug 6 1999 |