Value prediction is a technique that breaks true data dependences by predicting the outcome of an instruction and speculatively executing its data-dependent instructions based on the predicted outcome. As the instruction fetch and issue rates of processors increase, the potential data dependences among instructions issued in the same cycle also increase, and value prediction with speculative execution becomes critical to keeping the issue rate high. Unfortunately, most proposed value prediction schemes have focused only on prediction accuracy; they have yet to consider the bandwidth required to access the value prediction tables. In this paper, we focus on the bandwidth issues of value prediction. We propose augmenting the trace cache (which was proposed to provide the required fetch bandwidth for wide-issue ILP processors) with a copy of the predicted values, and moving the generation of those predicted values (which requires accessing the value prediction tables) from the instruction fetch stage to a later stage, e.g., the writeback stage. This change enables "selective value prediction": only those instructions that require value prediction access the value prediction tables, which significantly reduces the bandwidth requirement of those tables. We also use a dynamic classification scheme to steer predictor updates to behavior-specific tables (such as last-value, stride, and two-level tables). The resulting relatively even split among table accesses further moderates the bandwidth requirement of those tables.
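The dynamic classification idea can be illustrated with a minimal software sketch (not the authors' hardware design; all names such as `classify`, `LastValuePredictor`, and `StridePredictor` are hypothetical). An instruction's recent outcome history determines which behavior-specific table receives its predictor updates:

```python
# Hypothetical sketch of dynamic value-predictor classification.
# A real implementation is hardware tables indexed by PC; dicts stand in here.

class LastValuePredictor:
    """Predicts that an instruction produces the same value as last time."""
    def __init__(self):
        self.table = {}                  # pc -> last value
    def predict(self, pc):
        return self.table.get(pc)
    def update(self, pc, value):
        self.table[pc] = value

class StridePredictor:
    """Predicts last value plus a constant stride."""
    def __init__(self):
        self.table = {}                  # pc -> (last value, stride)
    def predict(self, pc):
        if pc in self.table:
            last, stride = self.table[pc]
            return last + stride
        return None
    def update(self, pc, value):
        last, _ = self.table.get(pc, (value, 0))
        self.table[pc] = (value, value - last)

def classify(history):
    """Steer an instruction to a behavior-specific table based on the
    differences between its recently produced values."""
    if len(history) < 2:
        return "last-value"
    diffs = {b - a for a, b in zip(history, history[1:])}
    if diffs == {0}:
        return "last-value"              # constant value
    if len(diffs) == 1:
        return "stride"                  # constant nonzero stride
    return "two-level"                   # irregular pattern: context-based table
```

Because each instruction updates only one table, accesses spread across the behavior-specific tables instead of all hitting a single monolithic predictor.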
Funding Information:
The authors wish to thank the anonymous reviewers for their detailed reviews and many constructive suggestions which have improved the paper significantly. The work was supported in part by the US National Science Foundation under Grants EIA-9971666 and MIP-9610379, by the Korea Science and Engineering Foundation under Grant R02-2000-00283, and a grant from Intel Corporation.
Copyright 2012 Elsevier B.V., All rights reserved.
- Data dependences
- Dynamic classification
- Instruction Level Parallelism
- Trace cache
- Value prediction