Abstract
Cache coherence schemes that dynamically adapt to memory referencing patterns have been proposed to improve coherence enforcement in shared-memory multiprocessors. By using only run-time information, however, these existing schemes are incapable of looking ahead in the memory referencing stream. We present a combined hardware-software strategy that uses the predictive capability of the compiler to select updating or invalidating for each write reference. To determine the potential performance improvement that can be achieved with this optimization, three different levels of compiler capabilities are examined. Simulations using memory traces show that with an ideal compiler, this optimization can potentially reduce the miss ratio by 0.4% to 15% compared to an invalidating-only scheme, while reducing the generated network traffic by 13% to 94% compared to an updating-only scheme. In addition, this optimization can potentially reduce the miss ratio by up to 13%, while reducing the generated network traffic by up to 92%, compared to a dynamic adaptive scheme. Furthermore, performance can be potentially improved even with a compiler capable of performing only imprecise array subscript analysis and no interprocedural analysis.
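The idea of looking ahead in the reference stream to pick a policy for each write can be illustrated with a small sketch. The selection rule below is an assumption made purely for illustration and is not taken from the paper: a write is marked `update` if another processor reads the same address before that address is written again, and `invalidate` otherwise. The trace format and the name `choose_policies` are likewise hypothetical.

```python
from typing import List, Tuple

# (processor id, operation "R" or "W", address); this trace format is an assumption
Ref = Tuple[int, str, int]


def choose_policies(trace: List[Ref]) -> List[str]:
    """Return 'update' or 'invalidate' for each write, in trace order.

    Hypothetical rule (not the paper's criterion): choose 'update' when a
    different processor reads the address before the next write to it.
    """
    policies = []
    for i, (proc, op, addr) in enumerate(trace):
        if op != "W":
            continue
        choice = "invalidate"
        # Look ahead in the reference stream, as an ideal compiler might.
        for later_proc, later_op, later_addr in trace[i + 1:]:
            if later_addr != addr:
                continue
            if later_op == "W":
                break  # overwritten before any other processor read it
            if later_proc != proc:
                choice = "update"  # another processor consumes this value
                break
        policies.append(choice)
    return policies


trace = [(0, "W", 100), (1, "R", 100), (0, "W", 100), (0, "W", 100), (0, "R", 100)]
print(choose_policies(trace))
```

A purely run-time scheme sees only past references, whereas this sketch inspects future ones, which is the kind of foresight the abstract attributes to the compiler.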
Original language | English (US) |
---|---|
Pages (from-to) | 470-481 |
Number of pages | 12 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 6 |
Issue number | 5 |
State | Published - May 1995 |
Bibliographical note
Funding Information: Manuscript received September 4, 1993; revised February 28, 1994. This work was supported in part by the National Science Foundation under Grant CCR-9209458, by the research funds of the Graduate School of the University of Minnesota, and under a grant by the AT&T Foundation. The authors are with the Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: [email protected]); ([email protected]). IEEE Log Number 9409878.
Keywords
- Compiler optimization
- cache coherence
- directory
- invalidate
- multiprocessor
- shared-memory
- update