This article combines a tutorial on state-of-the-art tensor decomposition as it relates to big data analytics, original research on parallel and distributed computation of low-rank decomposition for big tensors, and a concise primer on Hadoop/MapReduce. A novel architecture for parallel and distributed computation of low-rank tensor decomposition that is especially well suited for big tensors is proposed. The new architecture is based on parallel processing of a set of randomly compressed, reduced-size replicas of the big tensor. Each replica is independently decomposed, and the results are joined via a master linear equation per tensor mode. The approach enables massive parallelism with guaranteed identifiability properties: if the big tensor is of low rank and the system parameters are appropriately chosen, then the rank-one factors of the big tensor will indeed be recovered from the analysis of the reduced-size replicas. Furthermore, the architecture affords memory/storage and complexity gains of order up to IJ/F for a big tensor of size I × J × K of rank F with F ≤ I ≤ J ≤ K. No sparsity is required in the tensor or the underlying latent factors, although such sparsity can be exploited to improve memory, storage, and computational savings.
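The core step of the proposed architecture, randomly compressing the big tensor into reduced-size replicas that preserve its low-rank structure, can be illustrated with a short sketch. The following is a minimal illustration, not the paper's implementation; all dimensions and variable names (I, J, K, F, L, M, N, U, V, W) are assumptions chosen for demonstration. It shows that a mode-wise randomly compressed replica of a rank-F tensor is itself rank F, with compressed versions of the same latent factors, which is what allows each replica to be decomposed independently and the big-tensor factors to be recovered afterwards from a linear equation per mode.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) sizes: big tensor I x J x K of low rank F,
# compressed replica L x M x N with L, M, N >= F.
I, J, K, F = 30, 40, 50, 3
L, M, N = 10, 12, 14

# A synthetic low-rank tensor T = sum_f a_f (outer) b_f (outer) c_f
A = rng.standard_normal((I, F))
B = rng.standard_normal((J, F))
C = rng.standard_normal((K, F))
T = np.einsum('if,jf,kf->ijk', A, B, C)

# One random compression matrix per tensor mode
U = rng.standard_normal((L, I))
V = rng.standard_normal((M, J))
W = rng.standard_normal((N, K))

# Reduced-size replica: multiply T by U, V, W along modes 1, 2, 3
Tc = np.einsum('li,mj,nk,ijk->lmn', U, V, W, T)

# Key identifiability ingredient: the replica equals the rank-F tensor
# built from the compressed factors (U A, V B, W C), so decomposing the
# small replica reveals compressed copies of the big tensor's factors.
Tc_check = np.einsum('if,jf,kf->ijk', U @ A, V @ B, W @ C)
print(np.allclose(Tc, Tc_check))  # True
```

In the full architecture, many such replicas (with different random U, V, W) would be decomposed in parallel, and the per-mode compressed factor estimates combined by solving the master linear system mentioned above.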