Online sketching of big categorical data with absent features

Yanning Shen, Morteza Mardani, Georgios B. Giannakis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

With the scale of data growing every day, reducing the dimensionality (a.k.a. sketching) of high-dimensional vectors has emerged as a task of increasing importance. Relevant issues to address in this context include the sheer volume of data vectors that may consist of categorical (meaning finite-alphabet) features, the typically streaming format of data acquisition, and possibly absent features. To cope with these challenges, the present paper brings forth a novel rank-regularized maximum likelihood approach that models categorical data as quantized values of analog-amplitude features with low intrinsic dimensionality. This model, together with recent advances in online rank regularization, is leveraged to sketch high-dimensional categorical data 'on the fly.' Simulated tests with synthetic as well as real-world datasets corroborate the merits of the novel scheme relative to state-of-the-art alternatives.
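
The data model described in the abstract (categorical entries viewed as quantized low-rank analog features, some of them absent) can be illustrated with a minimal NumPy sketch. This is only an illustration under stated assumptions, not the authors' algorithm: the function name `simulate_categorical_stream`, the Gaussian factors, the quantile-based thresholds, and the use of `-1` to mark absent entries are all hypothetical choices made for this example.

```python
import numpy as np

def simulate_categorical_stream(n_features=50, n_vectors=200, rank=3,
                                n_levels=4, observe_prob=0.7, seed=0):
    """Generate categorical data as quantized low-rank features with misses.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Analog-amplitude features with low intrinsic dimensionality:
    # X = L @ Q has rank at most `rank`.
    L = rng.standard_normal((n_features, rank))
    Q = rng.standard_normal((rank, n_vectors))
    X = L @ Q

    # Quantize to a finite alphabet {0, ..., n_levels - 1}.
    # Thresholds are placed at empirical quantiles (an assumption).
    thresholds = np.quantile(X, np.linspace(0, 1, n_levels + 1)[1:-1])
    Y = np.digitize(X, thresholds)

    # Absent features: each entry is observed with probability observe_prob;
    # missing entries are marked with -1.
    mask = rng.random(X.shape) < observe_prob
    Y_obs = np.where(mask, Y, -1)
    return X, Y_obs, mask

X, Y_obs, mask = simulate_categorical_stream()
```

Each column of `Y_obs` plays the role of one streaming data vector; a sketching method in the spirit of the abstract would try to recover the low-dimensional factor behind `X` from the observed categorical entries alone.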

Original language: English (US)
Title of host publication: 2015 49th Annual Conference on Information Sciences and Systems, CISS 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479984282
State: Published - Apr 15 2015
Event: 2015 49th Annual Conference on Information Sciences and Systems, CISS 2015 - Baltimore, United States
Duration: Mar 18 2015 → Mar 20 2015

Publication series

Name: 2015 49th Annual Conference on Information Sciences and Systems, CISS 2015

Other

Other: 2015 49th Annual Conference on Information Sciences and Systems, CISS 2015
Country: United States
City: Baltimore
Period: 3/18/15 → 3/20/15

Keywords

  • Rank regularization
  • categorical data
  • online sketching
