Low-overhead, high-speed multi-core barrier synchronization

John Sartori, Rakesh Kumar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Citations (Scopus)

Abstract

Whereas efficient barrier implementations were once a concern only in high-performance computing, recent trends in core integration make the topic relevant even for general-purpose CMPs. While the nature of CMP applications requires low-latency, the cost of low-latency barrier implementations using hardware-based techniques can be prohibitive for CMPs, where die area represents opportunities for throughput and yield. Similarly, whereas traditional multiprocessor barrier implementations were developed primarily for dedicated environments, scheduling and multi-programming on CMPs require more adaptable barrier implementations. In this paper, we present and evaluate three barrier implementations that are hybrids of software and dedicated hardware barriers and are specifically tailored for CMPs. The implementations leverage the unique characteristics of CMPs and provide low latency comparable to that of dedicated hardware networks at a fraction of the cost. The implementations also support adaptability, enabling efficient multi-programming and dynamic remapping of the barrier network.

Original language: English (US)
Title of host publication: High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings
Pages: 18-34
Number of pages: 17
DOIs: https://doi.org/10.1007/978-3-642-11515-8_4
State: Published - Mar 25 2010
Event: 5th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC 2010 - Pisa, Italy
Duration: Jan 25 2010 → Jan 27 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5952 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC 2010
Country: Italy
City: Pisa
Period: 1/25/10 → 1/27/10

Fingerprint

Synchronization
High Speed
Hardware
Latency
Costs
Scheduling
Throughput
Programming
Hardware Implementation
Multiprocessor
Adaptability
Leverage
Die
High Performance
Software
Computing
Evaluate

Cite this

Sartori, J., & Kumar, R. (2010). Low-overhead, high-speed multi-core barrier synchronization. In High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings (pp. 18-34). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5952 LNCS). https://doi.org/10.1007/978-3-642-11515-8_4

Low-overhead, high-speed multi-core barrier synchronization. / Sartori, John; Kumar, Rakesh.

High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings. 2010. p. 18-34 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5952 LNCS).

Sartori, J & Kumar, R 2010, Low-overhead, high-speed multi-core barrier synchronization. in High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5952 LNCS, pp. 18-34, 5th International Conference on High Performance Embedded Architectures and Compilers, HiPEAC 2010, Pisa, Italy, 1/25/10. https://doi.org/10.1007/978-3-642-11515-8_4
Sartori J, Kumar R. Low-overhead, high-speed multi-core barrier synchronization. In High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings. 2010. p. 18-34. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-11515-8_4
Sartori, John ; Kumar, Rakesh. / Low-overhead, high-speed multi-core barrier synchronization. High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings. 2010. pp. 18-34 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{159d155a28cb4631abb21217bc308ef3,
title = "Low-overhead, high-speed multi-core barrier synchronization",
abstract = "Whereas efficient barrier implementations were once a concern only in high-performance computing, recent trends in core integration make the topic relevant even for general-purpose CMPs. While the nature of CMP applications requires low-latency, the cost of low-latency barrier implementations using hardware-based techniques can be prohibitive for CMPs, where die area represents opportunities for throughput and yield. Similarly, whereas traditional multiprocessor barrier implementations were developed primarily for dedicated environments, scheduling and multi-programming on CMPs require more adaptable barrier implementations. In this paper, we present and evaluate three barrier implementations that are hybrids of software and dedicated hardware barriers and are specifically tailored for CMPs. The implementations leverage the unique characteristics of CMPs and provide low latency comparable to that of dedicated hardware networks at a fraction of the cost. The implementations also support adaptability, enabling efficient multi-programming and dynamic remapping of the barrier network.",
author = "John Sartori and Rakesh Kumar",
year = "2010",
month = "3",
day = "25",
doi = "10.1007/978-3-642-11515-8_4",
language = "English (US)",
isbn = "3642115144",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume = "5952",
pages = "18--34",
booktitle = "High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings",

}

TY - CONF

T1 - Low-overhead, high-speed multi-core barrier synchronization

AU - Sartori, John

AU - Kumar, Rakesh

PY - 2010/3/25

Y1 - 2010/3/25

N2 - Whereas efficient barrier implementations were once a concern only in high-performance computing, recent trends in core integration make the topic relevant even for general-purpose CMPs. While the nature of CMP applications requires low-latency, the cost of low-latency barrier implementations using hardware-based techniques can be prohibitive for CMPs, where die area represents opportunities for throughput and yield. Similarly, whereas traditional multiprocessor barrier implementations were developed primarily for dedicated environments, scheduling and multi-programming on CMPs require more adaptable barrier implementations. In this paper, we present and evaluate three barrier implementations that are hybrids of software and dedicated hardware barriers and are specifically tailored for CMPs. The implementations leverage the unique characteristics of CMPs and provide low latency comparable to that of dedicated hardware networks at a fraction of the cost. The implementations also support adaptability, enabling efficient multi-programming and dynamic remapping of the barrier network.

UR - http://www.scopus.com/inward/record.url?scp=77949600101&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77949600101&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-11515-8_4

DO - 10.1007/978-3-642-11515-8_4

M3 - Conference contribution

SN - 3642115144

SN - 9783642115141

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 18

EP - 34

BT - High Performance Embedded Architectures and Compilers - 5th International Conference, HiPEAC 2010, Proceedings

ER -