The problem of simulating the motion of a set of bodies arises in a variety of domains such as astrophysics, molecular dynamics, fluid dynamics, and high energy physics. The all-to-all nature of interaction between various bodies renders this problem extremely computation-intensive. Techniques based on hierarchical approximations have effectively reduced the complexity of this problem. Coupled with parallel processing, these techniques hold the promise of large scale n-body simulations. In this paper, we present a spectrum of parallel formulations that are suited for different particle distributions. We first present a parallel formulation that uses a static partitioning of the domain and assignment of subdomains to processors. We demonstrate that this scheme delivers acceptable load balance, and coupled with two collective communication operations, it yields good performance. We present a second parallel formulation that combines static decomposition of the domain with an assignment of subdomains to processors based on Morton ordering. This alleviates the load imbalance inherent in the first scheme. We generalize these schemes to dynamic domain decomposition coupled with a subtree assignment that tries to optimize locality of processor subdomains. Unlike existing schemes that are based on shipping data to processors needing them, our schemes are based on shipping computation to processors where data reside. We present an experimental evaluation of our schemes on a 256 processor nCUBE2 and a 256 processor CM5. The evaluation is based on an astrophysical simulation of a variety of Gaussian and Plummer distributions of varying irregularity. We study the impact of a variety of parameters such as the impact of multipole degree and the α-criterion on accuracy and parallel performance. We demonstrate that our parallel formulations yield excellent performance and scale up to a large number of processors, making it possible to run realistic simulations with millions of particles. Furthermore, we show that as the accuracy of simulations is increased by increasing multipole degree, our formulations yield improved efficiencies.
Bibliographical noteFunding Information:
This work is sponsored by the by Army Research Office contract DA/DAAH04-95-1-0538 and by Army High Performance Computing Research Center under the auspices of the Department of the Army, Army Research Laboratory cooperative agreement no. DAAH04-95-2-0003/contract no. DAAH04-95-C-0008, the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. This work is also sponsored in part by MSI. Access to computing facilities was provided by Cray Research and by the Pittsburgh Supercomputing Center. Related papers are available via WWW at URL: http://www.cs.umn.edu/users/kumar/papers.html. The authors wish to acknowledge anonymous reviewers whose comments served to improve the quality of this manuscript.
Copyright 2018 Elsevier B.V., All rights reserved.
- Experimental results
- Hierarchical tree code
- Load balance
- n-Body simulation