Abstract
Modeling multi-way data can be accomplished using tensors, which are data structures indexed along three or more dimensions. Tensors are increasingly used to analyze extremely large and sparse multi-way datasets in life sciences, engineering, and business. The canonical polyadic decomposition (CPD) is a popular tensor factorization for discovering latent features and is most commonly found via the method of alternating least squares (CPD-ALS). The computational time and memory required to compute the CPD limit the size and dimensionality of the tensors that can be factored on a typical workstation, making distributed solution approaches the only viable option. Most methods for distributed-memory systems have focused on distributing the tensor in a coarse-grained, one-dimensional fashion that prohibitively requires the dense matrix factors to be fully replicated on each node. Recent work overcomes this limitation by using a fine-grained decomposition of the tensor nonzeros, at the cost of computationally expensive hypergraph partitioning. To that end, we present a medium-grained decomposition that avoids complete factor replication and communication, while eliminating the need for expensive pre-processing steps. We use a hybrid MPI+OpenMP implementation that exploits multi-core architectures with a low memory footprint. We theoretically analyze the scalability of the coarse-, medium-, and fine-grained decompositions and experimentally compare them across a variety of datasets. Experiments show that the medium-grained decomposition reduces communication volume by 36-90% compared to the coarse-grained decomposition, is 41-76x faster than a state-of-the-art MPI code, and is 1.5-5.0x faster than the fine-grained decomposition with 1024 cores.
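For readers unfamiliar with CPD-ALS, the following is a minimal single-node sketch in NumPy of the alternating least squares loop on a sparse tensor stored in coordinate (COO) form. It is not the distributed MPI+OpenMP implementation described in the abstract; the function names `mttkrp` and `cpd_als`, the COO layout (`inds`, `vals`), and the random initialization are assumptions made for illustration only.

```python
import numpy as np

def mttkrp(inds, vals, factors, mode, shape):
    """Matricized tensor times Khatri-Rao product for a COO sparse tensor.

    inds: (nnz, nmodes) integer array of nonzero coordinates (assumed layout).
    vals: (nnz,) array of nonzero values.
    """
    rank = factors[0].shape[1]
    # One row per nonzero, initialized with that nonzero's value.
    prod = np.repeat(vals[:, None].astype(float), rank, axis=1)
    for m, A in enumerate(factors):
        if m != mode:
            # Scale by the factor rows that this nonzero touches.
            prod *= A[inds[:, m]]
    out = np.zeros((shape[mode], rank))
    # Accumulate rows that share the same index in the target mode.
    np.add.at(out, inds[:, mode], prod)
    return out

def cpd_als(inds, vals, shape, rank, iters=50, seed=0):
    """Illustrative CPD-ALS: update one factor at a time, others held fixed."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in shape]
    for _ in range(iters):
        for mode in range(len(shape)):
            # Hadamard product of the Gram matrices of the other factors.
            gram = np.ones((rank, rank))
            for m, A in enumerate(factors):
                if m != mode:
                    gram *= A.T @ A
            M = mttkrp(inds, vals, factors, mode, shape)
            # Least-squares update for this mode.
            factors[mode] = M @ np.linalg.pinv(gram)
    return factors
```

Each factor update needs the rows of the other factor matrices indexed by the local nonzeros, which is why the way nonzeros are distributed across nodes determines how much of each factor must be stored or communicated.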
| Original language | English (US) |
|---|---|
| Title of host publication | Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 902-911 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781509021406 |
| State | Published - Jul 18 2016 |
| Event | 30th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016 - Chicago, United States. Duration: May 23 2016 → May 27 2016 |
Publication series
| Name | Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016 |
|---|---|
Other
| Other | 30th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016 |
|---|---|
| Country/Territory | United States |
| City | Chicago |
| Period | 5/23/16 → 5/27/16 |
Bibliographical note
Publisher Copyright: © 2016 IEEE.
Keywords
- CPD
- Distributed
- Medium-grained
- PARAFAC
- Parallel
- Sparse tensor