From categorical to numerical: Multiple transitive distance learning and embedding

Kai Zhang, Qiaojun Wang, Zhengzhang Chen, Ivan Marsic, Vipin Kumar, Guofei Jiang, Jie Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

34 Scopus citations

Abstract

Categorical data are ubiquitous in real-world databases. However, because such data lack an intrinsic proximity measure, many powerful algorithms for numerical data analysis may not work well on their categorical counterparts, which creates a bottleneck in practical applications. In this paper, we propose a novel method to transform categorical data into numerical representations, so that the abundant numerical learning methods can be exploited in categorical data mining. Our key idea is to learn a pairwise dissimilarity among categorical symbols, and hence a continuous embedding, which can then be used for subsequent numerical treatment. There are two important criteria for learning the dissimilarities. First, they should capture the important "transitivity" property, which has been shown to be particularly useful in measuring proximity relations in categorical data. Second, the pairwise sample geometry arising from the learned symbol distances should be maximally consistent with prior knowledge (e.g., class labels) to obtain good generalization performance. We achieve both through multiple transitive distance learning and embedding. Encouraging results are observed on a number of benchmark classification tasks against state-of-the-art methods.
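The pipeline the abstract outlines (symbol dissimilarities, a transitivity step, then a continuous embedding) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no formulas, so every choice below is an assumption made only for illustration. The base dissimilarity is taken to be the total variation distance between the class distributions of two symbols, "transitivity" is approximated by a minimax-path closure, and the embedding uses classical multidimensional scaling. All function names are hypothetical.

```python
import numpy as np


def symbol_dissimilarity(column, labels):
    """Assumed base dissimilarity: total variation distance between the
    class distributions of each pair of symbols (not from the paper)."""
    symbols = sorted(set(column))
    classes = sorted(set(labels))
    counts = np.zeros((len(symbols), len(classes)))
    for s, y in zip(column, labels):
        counts[symbols.index(s), classes.index(y)] += 1
    probs = counts / counts.sum(axis=1, keepdims=True)
    diff = np.abs(probs[:, None, :] - probs[None, :, :]).sum(axis=2) / 2
    return symbols, diff


def minimax_closure(d):
    """Assumed stand-in for transitivity: replace each distance by the
    smallest achievable 'largest hop' over all paths between two symbols
    (a Floyd-Warshall style update)."""
    d = d.copy()
    for k in range(len(d)):
        via_k = np.maximum(d[:, k:k + 1], d[k:k + 1, :])
        d = np.minimum(d, via_k)
    return d


def classical_mds(d, dim=2):
    """Embed symbols in `dim` dimensions with classical MDS."""
    n = len(d)
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))


# Toy data: one categorical attribute and binary class labels.
column = ["a", "a", "b", "b", "c", "c"]
labels = [0, 0, 0, 1, 1, 1]

symbols, base = symbol_dissimilarity(column, labels)
trans = minimax_closure(base)
emb = classical_mds(trans, dim=2)
```

In this toy example the direct dissimilarity between "a" and "c" is 1.0, but the path through "b" has hops of only 0.5, so the closure shortens it to 0.5. Each row of `emb` is then a numerical vector for one symbol that a standard numerical learner could consume.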

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining 2015, SDM 2015
Editors: Suresh Venkatasubramanian, Jieping Ye
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 46-54
Number of pages: 9
ISBN (Electronic): 9781510811522
State: Published - 2015
Event: SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada
Duration: Apr 30 2015 → May 2 2015

Publication series

Name: SIAM International Conference on Data Mining 2015, SDM 2015

Other

Other: SIAM International Conference on Data Mining 2015, SDM 2015
Country/Territory: Canada
City: Vancouver
Period: 4/30/15 → 5/2/15

Bibliographical note

Publisher Copyright:
Copyright © SIAM.
