A framework for exploring categorical data

Varun Chandola, Shyam Boriah, Vipin Kumar

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

14 Scopus citations

Abstract

In this paper, we present a framework for categorical data analysis that allows such data sets to be explored using a rich set of techniques otherwise applicable only to continuous data sets. We introduce the concept of separability statistics in the context of exploratory categorical data analysis. We show how these statistics can be used to map categorical data to a continuous space, given a labeled reference data set. This mapping enables categorical data to be visualized with techniques designed for continuous data. We show that, in the transformed continuous space, the standard k-NN based outlier detection technique performs comparably to k-NN based outlier detection using the best of the similarity measures designed for categorical data. The proposed framework can also be used to devise similarity measures best suited to a particular type of data set.
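The abstract does not define the separability statistics themselves, so the following is only a minimal sketch of the overall pipeline under stated assumptions. It assumes a binary-labeled reference set (label 1 = anomalous) and uses a hypothetical stand-in statistic: for each (attribute, value) pair, the fraction of reference records with that value that are labeled 1. Each categorical record then becomes a vector of these fractions, and outliers are scored with a standard k-NN technique (Euclidean distance to the k-th nearest neighbor). The function names `fit_value_stats`, `to_continuous`, and `knn_outlier_scores`, the 0.5 default for unseen values, and the toy data are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter, defaultdict


def fit_value_stats(reference, labels):
    """Hypothetical stand-in for a separability statistic: for each
    (attribute index, value) pair, the fraction of reference records
    having that value whose label is 1."""
    counts = defaultdict(Counter)  # (attr, value) -> Counter of labels
    for record, label in zip(reference, labels):
        for j, value in enumerate(record):
            counts[(j, value)][label] += 1
    return {key: c[1] / sum(c.values()) for key, c in counts.items()}


def to_continuous(records, stats, default=0.5):
    """Map each categorical record to a numeric vector; values never seen
    in the reference set get an assumed neutral default."""
    return [[stats.get((j, v), default) for j, v in enumerate(r)] for r in records]


def knn_outlier_scores(points, k):
    """Standard k-NN outlier score: Euclidean distance from each point to
    its k-th nearest neighbor (brute force, O(n^2))."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Toy data (hypothetical): two categorical attributes, color and size.
reference = [("red", "small"), ("red", "small"), ("blue", "large"), ("red", "small")]
labels = [0, 0, 1, 0]
test = [("red", "small"), ("red", "small"), ("red", "small"), ("blue", "large")]

stats = fit_value_stats(reference, labels)
points = to_continuous(test, stats)
scores = knn_outlier_scores(points, k=1)
print(scores)  # the fourth record receives the largest score
```

Once records are numeric vectors, any continuous-space technique (Euclidean distance, standard visualization methods) applies unchanged, which is the property the abstract relies on. The brute-force neighbor search keeps the sketch short but is only suitable for small examples.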

Original language: English (US)
Title of host publication: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133
Pages: 184-195
Number of pages: 12
State: Published - Dec 1 2009
Event: 9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States
Duration: Apr 30 2009 to May 2 2009

Publication series

Name: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
Volume: 1

Other

Other: 9th SIAM International Conference on Data Mining 2009, SDM 2009
Country/Territory: United States
City: Sparks, NV
Period: 4/30/09 to 5/2/09
