TY - GEN
T1 - Exploiting speculative thread-level parallelism in data compression applications
AU - Wang, Shengyue
AU - Zhai, Antonia B
AU - Yew, Pen-Chung
PY - 2007
Y1 - 2007
N2 - Although hardware support for Thread-Level Speculation (TLS) can ease the compiler's tasks in creating parallel programs by allowing the compiler to create potentially dependent parallel threads, advanced compiler optimization techniques must be developed and judiciously applied to achieve the desired performance. In this paper, we take a close examination of two data compression benchmarks, GZIP and BZIP2, and propose, implement, and evaluate new compiler optimization techniques to eliminate performance bottlenecks in their parallel execution and improve their performance. The proposed techniques (i) remove the critical forwarding path created by synchronizing memory-resident values; (ii) identify and categorize reduction-like variables whose intermediate results are used within loops, and propose code transformation to remove the inter-thread data dependences caused by these variables; and (iii) transform the program to eliminate stalls caused by variations in thread size. While no previous work has reported significant performance improvement on parallelizing these two benchmarks, we are able to achieve up to 36% performance improvement for GZIP and 37% for BZIP2.
AB - Although hardware support for Thread-Level Speculation (TLS) can ease the compiler's tasks in creating parallel programs by allowing the compiler to create potentially dependent parallel threads, advanced compiler optimization techniques must be developed and judiciously applied to achieve the desired performance. In this paper, we take a close examination of two data compression benchmarks, GZIP and BZIP2, and propose, implement, and evaluate new compiler optimization techniques to eliminate performance bottlenecks in their parallel execution and improve their performance. The proposed techniques (i) remove the critical forwarding path created by synchronizing memory-resident values; (ii) identify and categorize reduction-like variables whose intermediate results are used within loops, and propose code transformation to remove the inter-thread data dependences caused by these variables; and (iii) transform the program to eliminate stalls caused by variations in thread size. While no previous work has reported significant performance improvement on parallelizing these two benchmarks, we are able to achieve up to 36% performance improvement for GZIP and 37% for BZIP2.
UR - http://www.scopus.com/inward/record.url?scp=38149015516&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38149015516&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-72521-3_10
DO - 10.1007/978-3-540-72521-3_10
M3 - Conference contribution
AN - SCOPUS:38149015516
SN - 3540725202
SN - 9783540725206
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 126
EP - 140
BT - Languages and Compilers for Parallel Computing - 19th International Workshop, LCPC 2006, Revised Papers
PB - Springer Verlag
T2 - 19th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2006
Y2 - 2 November 2006 through 4 November 2006
ER -