TY - GEN
T1 - Extracting the textual and temporal structure of supercomputing logs
AU - Jain, Sourabh
AU - Singh, Inderpreet
AU - Chandra, Abhishek
AU - Zhang, Zhi Li
AU - Bronevetsky, Greg
PY - 2009
Y1 - 2009
N2 - Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable source of information about their operational status and health. However, their massive size, complexity, and lack of a standard format make it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique achieves nearly perfect accuracy for online log classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.
AB - Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable source of information about their operational status and health. However, their massive size, complexity, and lack of a standard format make it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique achieves nearly perfect accuracy for online log classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.
UR - http://www.scopus.com/inward/record.url?scp=77952121182&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952121182&partnerID=8YFLogxK
U2 - 10.1109/HIPC.2009.5433202
DO - 10.1109/HIPC.2009.5433202
M3 - Conference contribution
AN - SCOPUS:77952121182
SN - 9781424449224
T3 - 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings
SP - 254
EP - 263
BT - 16th International Conference on High Performance Computing, HiPC 2009 - Proceedings
T2 - 16th International Conference on High Performance Computing, HiPC 2009
Y2 - 16 December 2009 through 19 December 2009
ER -