## Abstract

The Canonical Polyadic Decomposition (CPD) of tensors is a powerful tool for analyzing multi-way data and is used extensively to analyze very large and extremely sparse datasets. The bottleneck of computing the CPD is multiplying a sparse tensor by several dense matrices. Algorithms for tensormatrix products fall into two classes. The first class saves oating point operations by storing a compressed tensor for each dimension of the data. These methods are fast but super high memory costs. The second class uses a single uncompressed tensor at the cost of additional oating point operations. In this work, we bridge the gap between the two approaches and introduce the compressed sparse fiber (CSF) a data structure for sparse tensors along with a novel parallel algorithm for tensor-matrix multiplication. CSF offers similar operation reductions as existing compressed methods while using only a single tensor structure. We validate our contributions with experiments comparing against state-ofthe- art methods on a diverse set of datasets. Our work uses 58% less memory than the state-of-the-art while achieving 81% of the parallel performance on 16 threads.

Original language | English (US) |
---|---|

Title of host publication | Proceedings of the 5th Workshop on Irregular Applications |

Subtitle of host publication | Architectures and Algorithms, IA3 2015 |

Publisher | Association for Computing Machinery, Inc |

ISBN (Electronic) | 9781450340014 |

DOIs | |

State | Published - Nov 15 2015 |

Event | 5th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2015 - Austin, United States Duration: Nov 15 2015 → … |

### Publication series

Name | Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2015 |
---|

### Other

Other | 5th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2015 |
---|---|

Country/Territory | United States |

City | Austin |

Period | 11/15/15 → … |

### Bibliographical note

Funding Information:This work was supported in part by NSF (IIS-0905220, OCI- 1048018, CNS-1162405, IIS-1247632, IIP-1414153, IIS-1447788), Army Research Office (W911NF-14-1-0316), Intel Software and Services Group, and the Digital Technology Center at the University of Minnesota. Access to research and computing facilities was provided by the Digital Technology Center and the Minnesota Supercomputing Institute.