TY - GEN
T1 - Using incorrect speculation to prefetch data in a concurrent multithreaded processor
AU - Chen, Ying
AU - Sendag, Resit
AU - Lilja, David J
PY - 2003/1/1
Y1 - 2003/1/1
N2 - Concurrent multithreaded architectures exploit both instruction-level and thread-level parallelism through a combination of branch prediction and thread-level control speculation. The resulting speculative issuing of load instructions in these architectures can significantly impact the performance of the memory hierarchy as the system exploits higher degrees of parallelism. In this study, we investigate the effects of executing the mispredicted load instructions on the cache performance of a scalable multithreaded architecture. We show that the execution of loads from the wrongly-predicted branch path within a thread, or from a wrongly forked thread, can result in an indirect prefetching effect for later correctly-executed paths. By continuing to execute the mispredicted load instructions even after the instruction- or thread-level control speculation is known to be incorrect, the cache misses for the correctly predicted paths and threads can be reduced, typically by 42-73%. We introduce the small, fully-associative Wrong Execution Cache (WEC) to eliminate the potential pollution that can be caused by the execution of the mispredicted load instructions. Our simulation results show that the WEC can improve the performance of a concurrent multithreaded architecture up to 18.5% on the benchmark programs tested, with an average improvement of 9.7%, due to the reductions in the number of cache misses.
AB - Concurrent multithreaded architectures exploit both instruction-level and thread-level parallelism through a combination of branch prediction and thread-level control speculation. The resulting speculative issuing of load instructions in these architectures can significantly impact the performance of the memory hierarchy as the system exploits higher degrees of parallelism. In this study, we investigate the effects of executing the mispredicted load instructions on the cache performance of a scalable multithreaded architecture. We show that the execution of loads from the wrongly-predicted branch path within a thread, or from a wrongly forked thread, can result in an indirect prefetching effect for later correctly-executed paths. By continuing to execute the mispredicted load instructions even after the instruction- or thread-level control speculation is known to be incorrect, the cache misses for the correctly predicted paths and threads can be reduced, typically by 42-73%. We introduce the small, fully-associative Wrong Execution Cache (WEC) to eliminate the potential pollution that can be caused by the execution of the mispredicted load instructions. Our simulation results show that the WEC can improve the performance of a concurrent multithreaded architecture up to 18.5% on the benchmark programs tested, with an average improvement of 9.7%, due to the reductions in the number of cache misses.
UR - http://www.scopus.com/inward/record.url?scp=84947279167&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947279167&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2003.1213177
DO - 10.1109/IPDPS.2003.1213177
M3 - Conference contribution
AN - SCOPUS:84947279167
T3 - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003
BT - Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2003
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Parallel and Distributed Processing Symposium, IPDPS 2003
Y2 - 22 April 2003 through 26 April 2003
ER -