The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor

David J. Lilja

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Trace-driven simulations of numerical Fortran programs are used to study the impact of the parallel loop scheduling strategy on data prefetching in a shared memory multiprocessor with private data caches. The simulations indicate that to maximize memory performance, it is important to schedule blocks of consecutive iterations to execute on each processor, and then to adaptively prefetch single-word cache blocks to match the number of iterations scheduled. Prefetching multiple single-word cache blocks on a miss reduces the miss ratio by approximately 5% to 30% compared to a system with no prefetching. In addition, the proposed adaptive prefetching scheme further reduces the miss ratio while significantly reducing the false sharing among cache blocks compared to nonadaptive prefetching strategies. Reducing the false sharing causes fewer coherence invalidations to be generated, and thereby reduces the total network traffic. The impact of the prefetching and scheduling strategies on the temporal distribution of coherence invalidations also is examined. It is found that invalidations tend to be evenly distributed throughout the execution of parallel loops, but tend to be clustered when executing sequential program sections. The distribution of invalidations in both types of program sections is relatively insensitive to the prefetching and scheduling strategy.
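The interaction described in the abstract can be illustrated with a small sketch. It is not the paper's trace-driven simulator. It assumes a unit-stride loop in which iteration i touches word i, single-word cache blocks, and an empty cache at the start of each chunk. Chunks of consecutive iterations are produced with the standard guided self-scheduling rule (each request takes ceil(remaining / processors) iterations), chosen here only because the term appears in the keywords. The names `guided_chunks` and `simulate` are hypothetical.

```python
from math import ceil


def guided_chunks(n_iters, n_procs):
    """Split iterations 0..n_iters-1 into chunks of consecutive iterations.

    Guided self-scheduling: each request takes ceil(remaining / n_procs).
    Returns a list of (start, size) pairs.
    """
    chunks, start = [], 0
    while start < n_iters:
        size = ceil((n_iters - start) / n_procs)
        chunks.append((start, size))
        start += size
    return chunks


def simulate(chunks, degree=None):
    """Count misses and words fetched beyond the chunk that triggered them.

    degree=None : adaptive, a miss fetches the rest of the current chunk.
    degree=d    : fixed, a miss fetches the missed word plus the next d words.
    Assumption: word i is touched only by iteration i (unit stride).
    """
    misses = extra = 0
    for start, size in chunks:
        end = start + size
        cached = set()  # assumed empty when a processor starts a chunk
        for i in range(start, end):
            if i in cached:
                continue
            misses += 1
            n = (end - i) if degree is None else degree + 1
            fetched = range(i, i + n)
            cached.update(fetched)
            # Words past the chunk boundary belong to other iterations:
            # candidates for cache pollution and coherence invalidations.
            extra += sum(1 for w in fetched if w >= end)
    return misses, extra


if __name__ == "__main__":
    chunks = guided_chunks(100, 4)
    print("chunks:", chunks)
    print("adaptive (misses, overfetched words):", simulate(chunks))
    print("fixed degree 3 (misses, overfetched words):", simulate(chunks, degree=3))
```

In this toy model the adaptive policy takes one miss per chunk and never fetches past the chunk boundary, while a fixed prefetch degree either takes more misses or pulls in words that another processor will use. That is only the qualitative effect the abstract describes; the reported 5% to 30% miss-ratio reduction comes from the paper's own simulations and is not reproduced here.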

Original language: English (US)
Pages (from-to): 573-584
Number of pages: 12
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 5
Issue number: 6
State: Published - Jun 1994

Bibliographical note

Funding Information:
Manuscript received May 20, 1992; revised December 8, 1992, and May 5, 1993. This work was supported in part by the National Science Foundation under Grants CCR-9209458 and MIP-9221900. A preliminary version of this work [16] was presented at the First Midwest Electrotechnology Conference in April 1992.

Keywords

  • Cache coherence
  • cache pollution
  • false sharing
  • guided self-scheduling
  • multiprocessor
  • prefetching
  • scheduling
  • shared memory
