TY - GEN
T1 - SCRAP
T2 - 2007 IEEE International Symposium on Workload Characterization, IISWC
AU - Skarie, James
AU - Debnath, Biplob K.
AU - Lilja, David J.
AU - Mokbel, Mohamed F.
N1 - Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2007
Y1 - 2007
N2 - With the tremendous growth in stored data, the role of database systems has become more significant than ever before. Standard query workloads, such as the TPC-C and TPC-H benchmark suites, are used to evaluate and tune the functionality and performance of database systems. Running and configuring benchmarks is a time-consuming task. It requires substantial statistical expertise due to the enormous data size and large number of queries in the workload. Subsetting can be used to reduce the number of queries in a workload. An existing workload subsetting technique selected queries based on similarities of the ranks of the queries for low-level characteristics, such as cache miss rates, or based on the execution time required in different computer systems. However, many low-level characteristics are correlated and produce similar behaviors. Also, raw execution time as a metric is too diffuse to capture important performance bottlenecks. Our goal is to select a subset of queries that can reproduce the same bottlenecks in the system as the original workload. In this paper, we propose a statistical approach for creating a database query workload based on performance bottlenecks (SCRAP). Our methodology takes a query workload and a set of system configuration parameters as inputs and selects a subset of the queries from the workload based on the similarity of performance bottlenecks. Experimental results using the TPC-H benchmark and the PostgreSQL database system show that the reduced workload and the original workload produce similar performance bottlenecks, and that the subset accurately estimates the total execution time.
AB - With the tremendous growth in stored data, the role of database systems has become more significant than ever before. Standard query workloads, such as the TPC-C and TPC-H benchmark suites, are used to evaluate and tune the functionality and performance of database systems. Running and configuring benchmarks is a time-consuming task. It requires substantial statistical expertise due to the enormous data size and large number of queries in the workload. Subsetting can be used to reduce the number of queries in a workload. An existing workload subsetting technique selected queries based on similarities of the ranks of the queries for low-level characteristics, such as cache miss rates, or based on the execution time required in different computer systems. However, many low-level characteristics are correlated and produce similar behaviors. Also, raw execution time as a metric is too diffuse to capture important performance bottlenecks. Our goal is to select a subset of queries that can reproduce the same bottlenecks in the system as the original workload. In this paper, we propose a statistical approach for creating a database query workload based on performance bottlenecks (SCRAP). Our methodology takes a query workload and a set of system configuration parameters as inputs and selects a subset of the queries from the workload based on the similarity of performance bottlenecks. Experimental results using the TPC-H benchmark and the PostgreSQL database system show that the reduced workload and the original workload produce similar performance bottlenecks, and that the subset accurately estimates the total execution time.
UR - http://www.scopus.com/inward/record.url?scp=47349087255&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47349087255&partnerID=8YFLogxK
U2 - 10.1109/IISWC.2007.4362194
DO - 10.1109/IISWC.2007.4362194
M3 - Conference contribution
AN - SCOPUS:47349087255
SN - 1424415616
SN - 9781424415618
T3 - Proceedings of the 2007 IEEE International Symposium on Workload Characterization, IISWC
SP - 183
EP - 192
BT - Proceedings of the 2007 IEEE International Symposium on Workload Characterization, IISWC
Y2 - 27 September 2007 through 29 September 2007
ER -