TY - GEN

T1 - Accurate statistical approaches for generating representative workload compositions

AU - Eeckhout, Lieven

AU - Sundareswara, Rashmi

AU - Yi, Joshua J.

AU - Lilja, David J

AU - Schrater, Paul R

N1 - Copyright 2011 Elsevier B.V., All rights reserved.

PY - 2005

Y1 - 2005

AB - Composing a representative workload is a crucial step during the design process of a microprocessor. The workload should be composed in such a way that it is representative for the target domain of application and yet, the amount of redundancy in the workload should be minimized as much as possible in order not to overly increase the total simulation time. As a result, there is an important trade-off that needs to be made between workload representativeness and simulation accuracy versus simulation speed. Previous work used statistical data analysis techniques to identify representative benchmarks and corresponding inputs, also called a subset, from a large set of potential benchmarks and inputs. These methodologies measure a number of program characteristics on which Principal Components Analysis (PCA) is applied before identifying distinct program behaviors among the benchmarks using cluster analysis. In this paper we propose Independent Components Analysis (ICA) as a better alternative to PCA as it does not assume that the original data set has a Gaussian distribution, which allows ICA to better find the important axes in the workload space. Our experimental results using SPEC CPU2000 benchmarks show that ICA significantly outperforms PCA in that ICA achieves smaller benchmark subsets that are more accurate than those found by PCA.

UR - http://www.scopus.com/inward/record.url?scp=33749055123&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33749055123&partnerID=8YFLogxK

U2 - 10.1109/IISWC.2005.1526001

DO - 10.1109/IISWC.2005.1526001

M3 - Conference contribution

AN - SCOPUS:33749055123

SN - 0780394615

SN - 9780780394612

T3 - Proceedings of the 2005 IEEE International Symposium on Workload Characterization, IISWC-2005

SP - 56

EP - 66

BT - Proceedings of the 2005 IEEE International Symposium on Workload Characterization, IISWC-2005

T2 - 2005 IEEE International Symposium on Workload Characterization, IISWC-2005

Y2 - 6 October 2005 through 8 October 2005

ER -