TY - JOUR
T1 - Parallel randomly compressed cubes
T2 - A scalable distributed architecture for big tensor decomposition
AU - Sidiropoulos, Nicholas D.
AU - Papalexakis, Evangelos E.
AU - Faloutsos, Christos
PY - 2014
Y1 - 2014
N2 - This article combines a tutorial on state-of-the-art tensor decomposition as it relates to big data analytics, with original research on parallel and distributed computation of low-rank decomposition for big tensors, and a concise primer on Hadoop/MapReduce. A novel architecture for parallel and distributed computation of low-rank tensor decomposition that is especially well suited for big tensors is proposed. The new architecture is based on parallel processing of a set of randomly compressed, reduced-size replicas of the big tensor. Each replica is independently decomposed, and the results are joined via a master linear equation per tensor mode. The approach enables massive parallelism with guaranteed identifiability properties: if the big tensor is of low rank and the system parameters are appropriately chosen, then the rank-one factors of the big tensor will indeed be recovered from the analysis of the reduced-size replicas. Furthermore, the architecture affords memory/storage and complexity gains of order IJ/F for a big tensor of size I × J × K of rank F with F ≤ I ≤ J ≤ K. No sparsity is required in the tensor or the underlying latent factors, although such sparsity can be exploited to improve memory, storage, and computational savings.
AB - This article combines a tutorial on state-of-the-art tensor decomposition as it relates to big data analytics, with original research on parallel and distributed computation of low-rank decomposition for big tensors, and a concise primer on Hadoop/MapReduce. A novel architecture for parallel and distributed computation of low-rank tensor decomposition that is especially well suited for big tensors is proposed. The new architecture is based on parallel processing of a set of randomly compressed, reduced-size replicas of the big tensor. Each replica is independently decomposed, and the results are joined via a master linear equation per tensor mode. The approach enables massive parallelism with guaranteed identifiability properties: if the big tensor is of low rank and the system parameters are appropriately chosen, then the rank-one factors of the big tensor will indeed be recovered from the analysis of the reduced-size replicas. Furthermore, the architecture affords memory/storage and complexity gains of order IJ/F for a big tensor of size I × J × K of rank F with F ≤ I ≤ J ≤ K. No sparsity is required in the tensor or the underlying latent factors, although such sparsity can be exploited to improve memory, storage, and computational savings.
UR - http://www.scopus.com/inward/record.url?scp=84906539154&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906539154&partnerID=8YFLogxK
U2 - 10.1109/MSP.2014.2329196
DO - 10.1109/MSP.2014.2329196
M3 - Article
SN - 1053-5888
VL - 31
SP - 57
EP - 70
JO - IEEE Signal Processing Magazine
JF - IEEE Signal Processing Magazine
IS - 5
M1 - 6879586
ER -