TY - GEN
T1 - Online sketching of big categorical data with absent features
AU - Shen, Yanning
AU - Mardani, Morteza
AU - Giannakis, Georgios B.
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/4/15
Y1 - 2015/4/15
AB - With the scale of data growing every day, reducing the dimensionality (a.k.a. sketching) of high-dimensional vectors has emerged as a task of increasing importance. Relevant issues to address in this context include the sheer volume of data vectors that may consist of categorical (meaning finite-alphabet) features, the typically streaming format of data acquisition, and possibly absent features. To cope with these challenges, the present paper brings forth a novel rank-regularized maximum likelihood approach that models categorical data as quantized values of analog-amplitude features with low intrinsic dimensionality. This model, along with recent advances in online rank regularization, is leveraged to sketch high-dimensional categorical data 'on the fly.' Simulated tests with synthetic as well as real-world datasets corroborate the merits of the novel scheme relative to state-of-the-art alternatives.
KW - Rank regularization
KW - categorical data
KW - online sketching
UR - http://www.scopus.com/inward/record.url?scp=84929190843&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84929190843&partnerID=8YFLogxK
U2 - 10.1109/CISS.2015.7086875
DO - 10.1109/CISS.2015.7086875
M3 - Conference contribution
AN - SCOPUS:84929190843
T3 - 2015 49th Annual Conference on Information Sciences and Systems, CISS 2015
BT - 2015 49th Annual Conference on Information Sciences and Systems, CISS 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 49th Annual Conference on Information Sciences and Systems, CISS 2015
Y2 - 18 March 2015 through 20 March 2015
ER -