### Abstract

Triangle counting is a fundamental graph analytic operation that is used extensively in network science and graph mining. As the graphs that need to be analyzed continue to grow in size, there is a need for scalable algorithms for distributed-memory parallel systems. To this end, we present a distributed-memory triangle counting algorithm that uses a 2D cyclic decomposition to balance the computations and reduce communication overheads. The algorithm structures its communication and computational steps to reduce its memory overhead, and it includes key optimizations that leverage the sparsity of the graph and the way the computations are structured. Experiments on synthetic and real-world graphs show that, across the datasets, our algorithm obtains average relative speedups between 3.24 and 7.22 when using 169 MPI ranks over the performance achieved with 16 MPI ranks, out of an ideal speedup of 10.56 (169/16). Moreover, we obtain an average speedup of 10.2 times in comparison with previously developed distributed-memory parallel algorithms.
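
To make the idea of a 2D cyclic decomposition concrete, the following is a minimal single-process sketch in Python. It is not the authors' MPI implementation and omits the communication scheme and optimizations the abstract refers to. The function name `count_triangles_2d_cyclic`, the grid size `q`, and the choice to assign adjacency entry (u, v) to grid cell (u mod q, v mod q) are illustrative assumptions. Each cell counts triangles for the edges it owns by intersecting neighbor lists one column class at a time, which mimics the block-by-block steps a distributed version would perform.

```python
from collections import defaultdict


def count_triangles_2d_cyclic(edges, q):
    """Count triangles while simulating a q x q grid of processes (assumed layout)."""
    # Keep each undirected edge once as (u, v) with u < v (upper triangle).
    upper = {(min(u, v), max(u, v)) for u, v in edges if u != v}

    # Assumed 2D cyclic ownership: entry (u, v) lives on grid cell (u % q, v % q).
    # blocks[(r, c)][u] is the set of v such that (u, v) is stored on cell (r, c).
    blocks = defaultdict(lambda: defaultdict(set))
    for u, v in upper:
        blocks[(u % q, v % q)][u].add(v)

    per_cell = {}
    for r in range(q):
        for c in range(q):
            local = 0
            # Tasks of cell (r, c): edges (u, v) with u % q == r and v % q == c.
            for u, vs in blocks[(r, c)].items():
                for v in vs:
                    # Step s uses block (r, s) for row u and block (c, s) for row v.
                    for s in range(q):
                        row_u = blocks[(r, s)].get(u, set())
                        row_v = blocks[(c, s)].get(v, set())
                        # Common higher-numbered neighbors close a triangle.
                        local += len(row_u & row_v)
            per_cell[(r, c)] = local

    # Each triangle a < b < c is counted exactly once, at its edge (a, b).
    return sum(per_cell.values()), per_cell


if __name__ == "__main__":
    # Complete graph on 4 vertices, simulated on a 2 x 2 grid.
    k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    total, per_cell = count_triangles_2d_cyclic(k4, 2)
    print(total, per_cell)
```

The per-cell counts returned alongside the total give a rough view of how evenly the cyclic assignment spreads the work; in a real distributed run, the blocks read in each step would have to be communicated between processes rather than looked up in shared memory.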

Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 48th International Conference on Parallel Processing, ICPP 2019 |
Publisher | Association for Computing Machinery |
ISBN (Electronic) | 9781450362955 |
DOIs | https://doi.org/10.1145/3337821.3337853 |
State | Published - Aug 5 2019 |
Event | 48th International Conference on Parallel Processing, ICPP 2019 - Kyoto, Japan. Duration: Aug 5 2019 → Aug 8 2019 |

### Publication series

Name | ACM International Conference Proceeding Series |
---|---|

### Conference

Conference | 48th International Conference on Parallel Processing, ICPP 2019 |
---|---|
Country | Japan |
City | Kyoto |
Period | 8/5/19 → 8/8/19 |

### Keywords

- Distributed-memory
- Graph analytics
- Triangle counting

### Cite this

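Tom, A. S., & Karypis, G. (2019). A 2D parallel triangle counting algorithm for distributed-memory architectures. In *Proceedings of the 48th International Conference on Parallel Processing, ICPP 2019* [a45] (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3337821.3337853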

**A 2D parallel triangle counting algorithm for distributed-memory architectures.** / Tom, Ancy Sarah; Karypis, George.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Proceedings of the 48th International Conference on Parallel Processing, ICPP 2019*, a45, ACM International Conference Proceeding Series, Association for Computing Machinery, 48th International Conference on Parallel Processing, ICPP 2019, Kyoto, Japan, 8/5/19. https://doi.org/10.1145/3337821.3337853

TY - GEN

T1 - A 2D parallel triangle counting algorithm for distributed-memory architectures

AU - Tom, Ancy Sarah

AU - Karypis, George

PY - 2019/8/5

Y1 - 2019/8/5

KW - Distributed-memory

KW - Graph analytics

KW - Triangle counting

UR - http://www.scopus.com/inward/record.url?scp=85071096572&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071096572&partnerID=8YFLogxK

U2 - 10.1145/3337821.3337853

DO - 10.1145/3337821.3337853

M3 - Conference contribution

T3 - ACM International Conference Proceeding Series

BT - Proceedings of the 48th International Conference on Parallel Processing, ICPP 2019

PB - Association for Computing Machinery

ER -