TY - JOUR
T1 - A Scalable Approach to Thread-Level Speculation
AU - Steffan, J. Gregory
AU - Colohan, Christopher B.
AU - Zhai, Antonia
AU - Mowry, Todd C.
PY - 2000
Y1 - 2000
N2 - While architects understand how to build cost-effective parallel machines across a wide spectrum of machine sizes (ranging from within a single chip to large-scale servers), the real challenge is how to easily create parallel software to effectively exploit all of this raw performance potential. One promising technique for overcoming this problem is Thread-Level Speculation (TLS), which enables the compiler to optimistically create parallel threads despite uncertainty as to whether those threads are actually independent. In this paper, we propose and evaluate a design for supporting TLS that seamlessly scales to any machine size because it is a straightforward extension of writeback invalidation-based cache coherence (which itself scales both up and down). Our experimental results demonstrate that our scheme performs well both on single-chip multiprocessors and on larger-scale machines where communication latencies are twenty times larger.
UR - http://www.scopus.com/inward/record.url?scp=0033703889&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0033703889&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:0033703889
SN - 0884-7495
SP - 1
EP - 12
JO - Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA
JF - Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA
T2 - ISCA-27: The 27th Annual International Symposium on Computer Architecture
Y2 - 10 June 2000 through 14 June 2000
ER -