With the advent of multicore systems, the gap between processor speed and memory latency has grown worse because of their complex interconnect. Sophisticated techniques are needed more than ever to improve an application's spatial and temporal locality. This paper describes an optimization that aims to improve heap data layout by structure-splitting. It also provides runtime address checking by piggybacking on the existing page protection mechanism to guarantee the correctness of such optimization that has eluded many previous attempts due to safety concerns. The technique can be applied to both sequential and parallel programs at either compile time or runtime. However, we focus primarily on sequential programs (i.e., single-threaded programs) at runtime in this paper. Experimental results show that some benchmarks in SPEC 2000 and 2006 can achieve a speedup of up to 142.8%.
|Original language||English (US)|
|Journal||Transactions on Architecture and Code Optimization|
|State||Published - Jan 2012|
- Data layout
- Structure splitting