TY - GEN
T1 - Automated algorithmic error resilience for structured grid problems based on outlier detection
AU - Suresh, Amoghavarsha
AU - Sartori, John M
N1 - Copyright:
Copyright 2019 Elsevier B.V., All rights reserved.
PY - 2014
Y1 - 2014
N2 - In this paper, we propose automated algorithmic error resilience based on outlier detection. Our approach exploits the characteristic behavior of a class of applications to create metric functions that normally produce metric values according to a designed distribution or behavior and produce outlier values (i.e., values that do not conform to the designed distribution or behavior) when computations are affected by errors. For a robust algorithm that employs such an approach, error detection becomes equivalent to outlier detection. As such, we can make use of well-established, statistically rigorous techniques for outlier detection to effectively and efficiently detect errors, and subsequently correct them. Our error-resilient algorithms incur significantly lower overhead than traditional hardware and software error resilience techniques. Also, compared to previous approaches to application-based error resilience, our approaches parameterize the robustification process, making it easy to automatically transform large classes of applications into robust applications with the use of parser-based tools and minimal programmer effort. We demonstrate the use of automated error resilience based on outlier detection for structured grid problems, leveraging the flexibility of algorithmic error resilience to achieve improved application robustness and lower overhead compared to previous error resilience approaches. We demonstrate 2×-3× improvement in output quality compared to the original algorithm with only 22% overhead, on average, for non-iterative structured grid problems. Average overhead is as low as 4.5% for error-resilient iterative structured grid algorithms that tolerate error rates up to 10E-3 and achieve the same output quality as their error-free counterparts.
AB - In this paper, we propose automated algorithmic error resilience based on outlier detection. Our approach exploits the characteristic behavior of a class of applications to create metric functions that normally produce metric values according to a designed distribution or behavior and produce outlier values (i.e., values that do not conform to the designed distribution or behavior) when computations are affected by errors. For a robust algorithm that employs such an approach, error detection becomes equivalent to outlier detection. As such, we can make use of well-established, statistically rigorous techniques for outlier detection to effectively and efficiently detect errors, and subsequently correct them. Our error-resilient algorithms incur significantly lower overhead than traditional hardware and software error resilience techniques. Also, compared to previous approaches to application-based error resilience, our approaches parameterize the robustification process, making it easy to automatically transform large classes of applications into robust applications with the use of parser-based tools and minimal programmer effort. We demonstrate the use of automated error resilience based on outlier detection for structured grid problems, leveraging the flexibility of algorithmic error resilience to achieve improved application robustness and lower overhead compared to previous error resilience approaches. We demonstrate 2×-3× improvement in output quality compared to the original algorithm with only 22% overhead, on average, for non-iterative structured grid problems. Average overhead is as low as 4.5% for error-resilient iterative structured grid algorithms that tolerate error rates up to 10E-3 and achieve the same output quality as their error-free counterparts.
KW - Algorithmic error resilience
KW - Application robustification
KW - Outlier detection
KW - Structured grids
UR - http://www.scopus.com/inward/record.url?scp=84900568041&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84900568041&partnerID=8YFLogxK
U2 - 10.1145/2544137.2544140
DO - 10.1145/2544137.2544140
M3 - Conference contribution
AN - SCOPUS:84900568041
SN - 9781450326704
T3 - Proceedings of the 12th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2014
SP - 240
EP - 250
BT - Proceedings of the 12th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2014
PB - Association for Computing Machinery
T2 - 12th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2014
Y2 - 15 February 2014 through 19 February 2014
ER -