Abstract
Speculative execution has long been used to exploit instruction-level parallelism across basic block boundaries. Most existing speculative execution techniques support speculation along only a single control path and rely heavily on branch prediction to choose the right path. In this paper, we review the existing approaches to speculative execution and propose an extended predicated execution mechanism, called predicate shifting, that supports speculation along multiple control paths. The predicate shifting mechanism maintains a condition/predicate window for each basic block. With this window, instructions can be guarded by predicates related to current or future branch conditions. The mechanism reduces the number of required tag bits by shifting conditions/predicates out of the window as soon as they are no longer in use. To incorporate predicate shifting into a VLIW processor, a new result-buffering structure, called the future buffer, is used to buffer uncommitted results and to evaluate predicates. The FIFO structure of the future buffer not only simplifies exception handling but also allows multiple uncommitted writes to the same register. Experimental results show that the predicate shifting mechanism uses predicate tag bits effectively and achieves a 24% performance improvement over the previous predicating mechanism (H. Ando, C. Nakanishi, T. Hara, M. Nakaya, Unconstrained speculative execution with predicated state buffering, in: Proceedings of the 22nd International Symposium on Computer Architecture, 1995, pp. 126-137) when using a small predicate tag.
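The window-shifting idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's encoding: the names `WINDOW`, `Instr`, `make_tag`, and `resolve_and_shift` are hypothetical, and each tag slot is assumed to hold "must be taken" (`True`), "must be not taken" (`False`), or "don't care" (`None`). When the oldest condition in the window resolves, instructions guarded by the opposite outcome are squashed and every surviving tag shifts by one slot, so the same few tag bits can then describe a later branch.

```python
from dataclasses import dataclass
from typing import List, Optional

WINDOW = 4  # hypothetical tag width: branch conditions visible at once


@dataclass
class Instr:
    name: str
    tag: List[Optional[bool]]  # one slot per window position; None = don't care


def make_tag(*slots):
    """Pad the given slots with don't-cares up to the window width."""
    return list(slots) + [None] * (WINDOW - len(slots))


def resolve_and_shift(pending, outcome):
    """Resolve the oldest condition in the window, then shift all tags by one.

    Returns (committed_names, still_pending); squashed instructions are dropped.
    """
    committed, still_pending = [], []
    for ins in pending:
        need = ins.tag[0]
        if need is not None and need != outcome:
            continue  # guarded by the opposite outcome: squash
        # Slot 0 leaves the window; a fresh don't-care slot enters at the end.
        shifted = ins.tag[1:] + [None]
        if all(slot is None for slot in shifted):
            committed.append(ins.name)  # no unresolved conditions remain
        else:
            still_pending.append(Instr(ins.name, shifted))
    return committed, still_pending


# Three speculated instructions taken from both sides of the first branch.
start = [
    Instr("add", make_tag(True)),          # needs branch 0 taken
    Instr("sub", make_tag(False)),         # needs branch 0 not taken
    Instr("mul", make_tag(True, False)),   # needs branch 0 taken, branch 1 not taken
]
committed_first, pending_after_first = resolve_and_shift(start, True)
committed_second, pending_after_second = resolve_and_shift(pending_after_first, False)
```

In this model, `sub` is squashed when the first branch is taken, `add` commits at once, and `mul` waits one more branch with its remaining condition shifted into slot 0.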
Original language | English (US) |
---|---|
Pages (from-to) | 1075-1095 |
Number of pages | 21 |
Journal | Journal of Systems Architecture |
Volume | 45 |
Issue number | 12-13 |
State | Published - Jun 1999 |
Bibliographical note
Funding Information: This paper presents a control dependency encoding and manipulating mechanism, called predicate shifting, to effectively support both predicated and speculative execution. The predicate shifting mechanism gives the compiler and the processor a cost-effective way to specify and store the control dependencies of an instruction. The key idea is to use a shifting condition/predicate window to specify the scope of a branch. With the support of the predicate shifting mechanism, the compiler can fully predicate basic blocks or speculatively move instructions from multiple control paths above the conditional branches on which they depend. Unlike the previous predicating mechanism [2], this mechanism does not limit the number of conditions in a region and thus can achieve good performance with a small predicate tag. The simulation results show that the predicate shifting model achieves a 16% performance improvement over the predicating model when using a 10-bit predicate tag, and a 24% performance improvement when using a 4-bit predicate tag. The experimental results also show that, with support for multiple-path speculative execution, the compiler can effectively exploit ILP across basic block boundaries without relying on profile information. This is a significant improvement over other schemes [5, 3, 17], which can only speculate along one selected control path.
This paper also presents a structure, called the future buffer, to buffer uncommitted results and to evaluate the predicate state associated with each result. The FIFO nature of the future buffer simplifies exception handling and allows multiple uncommitted writes to the same register. To avoid complex hardware for associative lookup, we introduce an offset reference mechanism to access uncommitted results in the future buffer. The experimental results show that a future buffer of 16 entries provides sufficient buffering space for a 4-issue processor.
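The future buffer behaviour summarized above (FIFO buffering, several uncommitted writes to one register, and access by offset rather than by associative lookup) can be sketched as follows. This is a toy software model under assumptions, not the paper's hardware design: the class and method names are hypothetical, each entry is assumed to be guarded by at most one condition, and the offset is assumed to count back from the newest entry (1 = newest), a convention the text here does not specify.

```python
from collections import deque


class FutureBuffer:
    """Toy FIFO of uncommitted register writes (hypothetical model)."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.fifo = deque()  # oldest entry on the left
        self.regs = {}       # committed (architectural) register state

    def push(self, reg, value, cond=None, want=True):
        """Buffer a result guarded by `cond == want`; cond=None is unconditional."""
        if len(self.fifo) >= self.capacity:
            raise OverflowError("future buffer full; the issue stage would stall")
        # valid: True = commit, False = discard, None = predicate unresolved
        valid = True if cond is None else None
        self.fifo.append(
            {"reg": reg, "value": value, "cond": cond, "want": want, "valid": valid}
        )

    def read(self, offset):
        """Read an uncommitted value by position: offset 1 is the newest entry."""
        return self.fifo[-offset]["value"]

    def resolve(self, cond, outcome):
        """Evaluate the predicate state of every entry guarded by `cond`."""
        for entry in self.fifo:
            if entry["valid"] is None and entry["cond"] == cond:
                entry["valid"] = entry["want"] == outcome

    def retire(self):
        """Drain resolved entries from the head, strictly in program order."""
        while self.fifo and self.fifo[0]["valid"] is not None:
            entry = self.fifo.popleft()
            if entry["valid"]:
                self.regs[entry["reg"]] = entry["value"]


fb = FutureBuffer()
fb.push("r1", 10, cond="c0", want=True)    # write from the taken path
fb.push("r1", 20, cond="c0", want=False)   # write from the not-taken path
newest, older = fb.read(1), fb.read(2)     # two pending writes to r1 coexist
fb.resolve("c0", False)                    # branch resolves as not taken
fb.retire()
```

Because entries retire only from the head and only once their predicate is known, committed state is always updated in program order, which is the property that keeps exception handling simple in this model.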
One possible drawback of the predicate shifting mechanism is the code expansion caused by tail duplication. We address this problem by limiting the level of conditional branches in a region and by using full predication to merge control paths. With these methods, we can reduce the code expansion rate to a reasonable factor without harming performance.
Jenn-Yuan Tsai is a software engineer at the Performance Delivery Laboratory of Hewlett-Packard Company in Cupertino, California, USA. He received a BS degree in computer engineering from National Chiao Tung University in 1987, an MS degree in electrical engineering from National Taiwan University in 1989, and a PhD degree in computer science from the University of Illinois at Urbana-Champaign in 1998. His research interests include microprocessor architecture, parallel systems, and optimizing compilers.
Pen-Chung Yew received his PhD in computer science from the University of Illinois at Urbana-Champaign in 1981. He has been a full professor in the Dept. of Computer Science, University of Minnesota since 1994. Previously, he was an associate director of the Center for Supercomputing Research and Development at the University of Illinois. From 1991 to 1992, he served as the program director of the Microelectronic Systems Architecture Program in the Division of Microelectronic Information Processing Systems at the National Science Foundation, Washington, D.C. Pen-Chung Yew is an IEEE Fellow and has served on the program committees of various conferences. He also served as a co-chairman of the 1990 International Conference on Parallel Processing, a general co-chairman of the 1994 International Symposium on Computer Architecture, and the program chair of the 1996 International Conference on Supercomputing. He served on the editorial boards of the IEEE Transactions on Parallel and Distributed Systems from 1992 to 1996 and the Journal of Parallel and Distributed Computing from 1989 to 1995. He was a distinguished visitor of the IEEE Computer Society from 1990 to 1993. His research interests include high-performance multiprocessor system design, parallelizing compilers, computer architecture, and performance evaluation.
Keywords
- Instruction level parallelism
- Predicated execution
- Speculative execution
- VLIW processor architecture