Streaming tensor factorization for infinite data sources

Shaden Smith, Kejun Huang, Nicholas D. Sidiropoulos, George Karypis

Research output: Contribution to conference › Paper

3 Scopus citations

Abstract

Sparse tensor factorization is a popular tool in multi-way data analysis and is used in applications such as cybersecurity, recommender systems, and social network analysis. In many of these applications, the tensor is not known a priori and instead arrives in a streaming fashion for a potentially unbounded amount of time. Existing approaches for streaming sparse tensors are not practical for unbounded streaming because they rely on maintaining the full factorization of the data, which grows linearly with time. In this work, we present CP-stream, an algorithm for streaming factorization in the model of the canonical polyadic decomposition whose time and space requirements do not grow linearly with the length of the stream, and which is thus practical for long-term streaming. Additionally, CP-stream incorporates user-specified constraints, such as non-negativity, which aid in the stability and interpretability of the factorization. An evaluation of CP-stream demonstrates that it converges faster than state-of-the-art streaming algorithms while achieving an order of magnitude lower reconstruction error. We also evaluate it on real-world sparse datasets and demonstrate its usability in both network traffic analysis and discussion tracking. Our evaluation uses exclusively public datasets, and our source code is released to the public as part of SPLATT, an open source high-performance tensor factorization toolkit.
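
To make the idea of bounded-memory streaming concrete, the following is a minimal sketch of a generic online factorization in the canonical polyadic model. It is not CP-stream and does not use the SPLATT API. It assumes dense I x J slices X_t modeled as A diag(s_t) B^T, summarizes the past in fixed-size accumulators with an exponential forgetting factor, and enforces non-negativity by simple clipping. The class name `StreamingCPSketch`, the parameters `forget` and `ridge`, and the accumulator names are all hypothetical and chosen only for illustration.

```python
import numpy as np


class StreamingCPSketch:
    """Illustrative rank-R model X_t ~ A @ diag(s_t) @ B.T for a slice stream.

    Assumption: this is a generic online least-squares scheme, not the
    CP-stream algorithm. Memory use depends on I, J, and R only, never on
    the number of slices seen so far.
    """

    def __init__(self, A0, B0, forget=0.95, ridge=1e-8):
        self.A = np.array(A0, dtype=float)   # I x R factor
        self.B = np.array(B0, dtype=float)   # J x R factor
        rank = self.A.shape[1]
        self.forget = forget                 # weight applied to older slices
        self.ridge = ridge                   # small regularizer for stability
        # Fixed-size summaries of all past slices.
        self.P_A = np.zeros_like(self.A)
        self.P_B = np.zeros_like(self.B)
        self.Q_A = np.zeros((rank, rank))
        self.Q_B = np.zeros((rank, rank))

    def _solve_nonneg(self, P, Q):
        # Solve F @ Q = P for F (Q is symmetric), then clip negatives.
        # Clipping is a heuristic projection, not an exact constrained solve.
        rank = Q.shape[0]
        F = np.linalg.solve(Q + self.ridge * np.eye(rank), P.T).T
        return np.maximum(F, 0.0)

    def update(self, X):
        """Consume one I x J slice and return its temporal weights s_t."""
        A, B = self.A, self.B
        rank = A.shape[1]

        # 1. Temporal weights: normal equations of min_s ||X - A diag(s) B^T||.
        gram = (A.T @ A) * (B.T @ B)
        rhs = np.einsum("ir,ij,jr->r", A, X, B)   # rhs[r] = a_r^T X b_r
        s = np.linalg.solve(gram + self.ridge * np.eye(rank), rhs)
        s = np.maximum(s, 0.0)

        # 2. Update A from decayed accumulators.
        self.P_A = self.forget * self.P_A + (X @ B) * s
        self.Q_A = self.forget * self.Q_A + np.outer(s, s) * (B.T @ B)
        self.A = self._solve_nonneg(self.P_A, self.Q_A)

        # 3. Update B the same way, using the refreshed A.
        self.P_B = self.forget * self.P_B + (X.T @ self.A) * s
        self.Q_B = self.forget * self.Q_B + np.outer(s, s) * (self.A.T @ self.A)
        self.B = self._solve_nonneg(self.P_B, self.Q_B)
        return s
```

Calling `update` once per incoming slice keeps only the two factor matrices and four accumulators in memory, so storage stays constant however long the stream runs. A sparse implementation would replace the dense products `X @ B` and `X.T @ A` with sparse kernels.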

Original language: English (US)
Pages: 81-89
Number of pages: 9
DOI: https://doi.org/10.1137/1.9781611975321.10
State: Published - Jan 1 2018
Event: 2018 SIAM International Conference on Data Mining, SDM 2018 - San Diego, United States
Duration: May 3 2018 → May 5 2018

Other

Other: 2018 SIAM International Conference on Data Mining, SDM 2018
Country: United States
City: San Diego
Period: 5/3/18 → 5/5/18

  • Cite this

    Smith, S., Huang, K., Sidiropoulos, N. D., & Karypis, G. (2018). Streaming tensor factorization for infinite data sources. 81-89. Paper presented at 2018 SIAM International Conference on Data Mining, SDM 2018, San Diego, United States. https://doi.org/10.1137/1.9781611975321.10