TY - GEN
T1 - Energy efficient speculative threads
T2 - 19th International Conference on Parallel Architectures and Compilation Techniques, PACT 2010
AU - Luo, Yangchun
AU - Packirisamy, Venkatesan
AU - Hsu, Wei Chung
AU - Zhai, Antonia
PY - 2010
Y1 - 2010
N2 - Thread-level parallelism at the chip level is critical in overcoming some of the challenges that have been ushered in through the advent of modern multicore processors (CMP). Extracting speculatively parallel threads from sequential applications and executing these threads on multicore processors is a promising technique to speed up these applications on multicore systems. However, the associated potential degradation in energy efficiency is an important factor that hinders the deployment of this technique. For multicore systems that integrate same-ISA heterogeneous cores, it is possible to judiciously allocate speculative threads to achieve energy-efficient performance improvement. In this paper, we examine multicore systems with multiple same-ISA heterogeneous cores, some of which support simultaneous multithreading (SMT). In this environment, we propose thread-allocation mechanisms that dynamically determine how speculative threads are allocated. The proposed mechanisms can potentially allow heterogeneous multicore systems to achieve significant performance improvement with a moderate energy increase. At run time, for each segment of speculative parallel execution and sequential execution, the thread-allocation mechanisms make the following three decisions: (i) whether the speculative parallel threads should be deployed to a single core with SMT support or to multiple cores each supporting a single thread of execution; (ii) whether the parallel/sequential threads should utilize more powerful cores with a high issue width or less powerful cores with a low issue width; (iii) whether the L1 caches should be fully activated or partially activated. Based on these decisions, the proposed thread-allocation mechanisms migrate threads and/or re-size L1 caches to maximize energy efficiency (measured in ED2P).
Throttling mechanisms have been incorporated in the proposed system to suppress thread management operations when the performance/energy benefit of these operations cannot justify the associated overhead. By evaluating speculatively parallelized benchmarks from SPEC CPU 2006 and 2000, we found that the proposed heterogeneous multicore system with dynamic thread management is 13% more energy efficient, in terms of ED2P, than the most energy-efficient homogeneous system. This corresponds to a 4% performance improvement and a 6% reduction in energy consumption. When compared to a four-issue superscalar core that executes the unmodified sequential program with a fixed L1 cache size, the proposed system is 44% more energy efficient, in terms of ED2P. This corresponds to a 38% performance improvement with a 6% increase in energy consumption.
AB - Thread-level parallelism at the chip level is critical in overcoming some of the challenges that have been ushered in through the advent of modern multicore processors (CMP). Extracting speculatively parallel threads from sequential applications and executing these threads on multicore processors is a promising technique to speed up these applications on multicore systems. However, the associated potential degradation in energy efficiency is an important factor that hinders the deployment of this technique. For multicore systems that integrate same-ISA heterogeneous cores, it is possible to judiciously allocate speculative threads to achieve energy-efficient performance improvement. In this paper, we examine multicore systems with multiple same-ISA heterogeneous cores, some of which support simultaneous multithreading (SMT). In this environment, we propose thread-allocation mechanisms that dynamically determine how speculative threads are allocated. The proposed mechanisms can potentially allow heterogeneous multicore systems to achieve significant performance improvement with a moderate energy increase. At run time, for each segment of speculative parallel execution and sequential execution, the thread-allocation mechanisms make the following three decisions: (i) whether the speculative parallel threads should be deployed to a single core with SMT support or to multiple cores each supporting a single thread of execution; (ii) whether the parallel/sequential threads should utilize more powerful cores with a high issue width or less powerful cores with a low issue width; (iii) whether the L1 caches should be fully activated or partially activated. Based on these decisions, the proposed thread-allocation mechanisms migrate threads and/or re-size L1 caches to maximize energy efficiency (measured in ED2P).
Throttling mechanisms have been incorporated in the proposed system to suppress thread management operations when the performance/energy benefit of these operations cannot justify the associated overhead. By evaluating speculatively parallelized benchmarks from SPEC CPU 2006 and 2000, we found that the proposed heterogeneous multicore system with dynamic thread management is 13% more energy efficient, in terms of ED2P, than the most energy-efficient homogeneous system. This corresponds to a 4% performance improvement and a 6% reduction in energy consumption. When compared to a four-issue superscalar core that executes the unmodified sequential program with a fixed L1 cache size, the proposed system is 44% more energy efficient, in terms of ED2P. This corresponds to a 38% performance improvement with a 6% increase in energy consumption.
KW - dynamic resource allocation
KW - energy efficiency
KW - heterogeneous multicore
KW - thread-level speculation
UR - http://www.scopus.com/inward/record.url?scp=78149269571&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78149269571&partnerID=8YFLogxK
U2 - 10.1145/1854273.1854329
DO - 10.1145/1854273.1854329
M3 - Conference contribution
AN - SCOPUS:78149269571
SN - 9781450301787
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 453
EP - 464
BT - PACT'10 - Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 September 2010 through 15 September 2010
ER -