With the scale of data growing every day, reducing the dimensionality (a.k.a. sketching) of high-dimensional vectors has emerged as a task of increasing importance. Relevant issues in this context include the sheer volume of data vectors, which may consist of categorical (i.e., finite-alphabet) features; the typically streaming mode of data acquisition; and possibly missing features. To cope with these challenges, the present paper brings forth a novel rank-regularized maximum likelihood approach that models categorical data as quantized values of analog-amplitude features with low intrinsic dimensionality. This model, along with recent advances in online rank regularization, is leveraged to sketch high-dimensional categorical data 'on the fly.' Numerical tests on synthetic as well as real-world datasets corroborate the merits of the novel scheme relative to state-of-the-art alternatives.
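To make the data model in the abstract concrete, here is a minimal sketch, not the paper's algorithm, of how low-rank analog features can be quantized into categories and then sketched online from partially observed vectors. Several pieces are assumptions swapped in for illustration: a cumulative-logit (ordinal logistic) likelihood with known, fixed thresholds; the common factorization surrogate for rank regularization, where the nuclear norm of X = L Qᵀ is replaced by (‖L‖²_F + ‖Q‖²_F)/2; and plain alternating gradient steps. The function names `quantize`, `nll_grad`, and `online_sketch` are hypothetical.

```python
import numpy as np


def quantize(x, thresholds):
    # Category = number of (sorted) thresholds strictly below each analog value.
    return np.searchsorted(thresholds, x)


def sigmoid(a):
    # Logistic CDF; clipping avoids overflow warnings and maps +/-inf to +/-50.
    return 1.0 / (1.0 + np.exp(-np.clip(a, -50.0, 50.0)))


def nll_grad(z, y, edges):
    """Negative log-likelihood of ordinal labels y given analog scores z, and its
    gradient in z. Assumed cumulative-logit link:
    P(y = k | z) = s(edges[k+1] - z) - s(edges[k] - z)."""
    su = sigmoid(edges[y + 1] - z)
    sl = sigmoid(edges[y] - z)
    p = np.maximum(su - sl, 1e-12)
    # d(-log p)/dz = (s'(upper - z) - s'(lower - z)) / p, with s' = s (1 - s)
    grad = (su * (1 - su) - sl * (1 - sl)) / p
    return -np.log(p).sum(), grad


def online_sketch(Y, mask, thresholds, rank, lam=0.1, step=0.05, inner=20, seed=0):
    """Hypothetical online sketcher. Y: (T, D) integer categories; mask: (T, D)
    bool, True where observed. Processes one vector at a time and returns the
    (T, rank) sketches Q and the learned (D, rank) basis L."""
    rng = np.random.default_rng(seed)
    T, D = Y.shape
    edges = np.concatenate(([-np.inf], np.sort(thresholds), [np.inf]))
    L = 0.1 * rng.standard_normal((D, rank))
    Q = np.zeros((T, rank))
    for t in range(T):
        obs = mask[t]                      # only observed features enter the loss
        Lo, yo = L[obs], Y[t, obs]
        q = np.zeros(rank)
        # Sketch step: fit the low-dimensional coefficients q_t with L held fixed.
        for _ in range(inner):
            _, g = nll_grad(Lo @ q, yo, edges)
            q -= step * (Lo.T @ g + lam * q)
        # Basis step: one stochastic gradient step on the observed rows of L.
        _, g = nll_grad(Lo @ q, yo, edges)
        L[obs] -= step * (np.outer(g, q) + lam * Lo)
        Q[t] = q
    return Q, L


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T, D, r = 500, 30, 3
    X = rng.standard_normal((T, r)) @ rng.standard_normal((r, D))  # low-rank analog features
    thresholds = np.array([-1.0, 0.0, 1.0])                        # assumed quantizer: 4 categories
    Y = quantize(X + 0.1 * rng.standard_normal((T, D)), thresholds)
    mask = rng.random((T, D)) < 0.8                                # roughly 20% of entries missing
    Q, L = online_sketch(Y, mask, thresholds, rank=r)
    print(Q.shape, L.shape)  # (500, 3) (30, 3)
```

Each incoming D-dimensional categorical vector is compressed to a `rank`-dimensional sketch as it arrives, and missing entries are simply left out of the likelihood. The Frobenius penalties on `q` and `L` stand in for rank regularization; the actual estimator, step sizes, and convergence behavior in the paper may differ.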
|Original language||English (US)|
|Title of host publication||2015 49th Annual Conference on Information Sciences and Systems, CISS 2015|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|State||Published - Apr 15 2015|
|Event||2015 49th Annual Conference on Information Sciences and Systems, CISS 2015 - Baltimore, United States|
|Duration||Mar 18 2015 → Mar 20 2015|
Bibliographical note: Publisher Copyright: © 2015 IEEE.
- Rank regularization
- Categorical data
- Online sketching