Enforcing cache coherence and reducing or hiding memory latency are important and challenging problems in the design of large-scale distributed shared-memory (DSM) multiprocessors. We propose an integrated approach to these problems: a compiler-directed cache coherence scheme called the Cache Coherence with Data Prefetching (CCDP) scheme. The CCDP scheme enforces cache coherence by prefetching the potentially stale references in a parallel program. It also prefetches the non-stale references to hide their memory latencies. To optimize the performance of the CCDP scheme, some prefetch hardware support is provided to handle these two forms of data prefetching efficiently. We also developed the compiler techniques used by the CCDP scheme for stale reference detection, prefetch target analysis, and prefetch scheduling. We evaluated the performance of the CCDP scheme via execution-driven simulations of several applications from the SPEC CFP95 and CFP92 benchmark suites. The simulation results show that the CCDP scheme provides significant performance improvements for the benchmark programs studied.