TY - GEN
T1 - Improving dynamic binary optimization through early-exit guided code region formation
AU - Hsu, Chun Chen
AU - Liu, Pangfeng
AU - Wu, Jan Jan
AU - Yew, Pen Chung
AU - Hong, Ding Yong
AU - Hsu, Wei Chung
AU - Wang, Chien Min
PY - 2013
Y1 - 2013
AB - Most dynamic binary translators (DBTs) and optimizers (DBOs) target binary traces, i.e., frequently executed paths, as code regions to be translated and optimized. Code region formation is the most important first step in all DBTs and DBOs. The quality of the dynamically formed code regions determines the extent and the types of optimization opportunities that can be exposed to DBTs and DBOs, and thus determines the ultimate quality of the final optimized code. The Next-Executing-Tail (NET) trace formation method used in HP Dynamo is an early example of such techniques. Many existing trace formation schemes are variants of NET. They work very well for most binary traces, but they also suffer from a major problem: the formed traces may contain a large number of early exits through which execution can leave the trace. If this happens frequently, the program will spend more time in the slow binary interpreter or in unoptimized code regions than in the optimized traces in the code cache. The benefit of trace optimization is thus lost. Traces/regions with frequently taken early exits are called delinquent traces/regions. Our empirical study shows that at least 8 of the 12 SPEC CPU2006 integer benchmarks have delinquent traces. In this paper, we propose a lightweight region formation technique called Early-Exit Guided Region Formation (EEG) to improve the quality of the formed traces/regions. It iteratively identifies delinquent regions and merges them into larger code regions. We have implemented our EEG algorithm in two LLVM-based multithreaded DBTs targeting the ARM and IA32 instruction set architectures (ISAs), respectively. Using the SPEC CPU2006 benchmark suite with reference inputs, our results show that, compared to a NET variant currently used in QEMU, a state-of-the-art retargetable DBT, EEG can achieve a significant performance improvement of up to 72% (27% on average) for IA32 and up to 49% (23% on average) for ARM.
KW - Dynamic binary translation
KW - Hardware-based performance monitoring
KW - Hot region formation
KW - Trace-based JIT compilation
KW - Virtual machine
UR - http://www.scopus.com/inward/record.url?scp=84875828121&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84875828121&partnerID=8YFLogxK
U2 - 10.1145/2451512.2451519
DO - 10.1145/2451512.2451519
M3 - Conference contribution
AN - SCOPUS:84875828121
SN - 9781450312660
T3 - VEE 2013 - Proceedings of the ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
SP - 23
EP - 32
BT - VEE 2013 - Proceedings of the ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
T2 - 9th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, VEE 2013
Y2 - 16 March 2013 through 17 March 2013
ER -