Mining strong affinity association patterns in data sets with skewed support distribution

Hui Xiong, Pang Ning Tan, Vipin Kumar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

127 Scopus citations

Abstract

Existing association-rule mining algorithms often rely on support-based pruning to reduce their combinatorial search space. This strategy is not very effective for data sets with skewed support distributions, because it tends either to generate many spurious patterns involving items from different support levels or to miss potentially interesting low-support patterns. To overcome these problems, we propose the concept of the hyperclique pattern, which uses an objective measure called h-confidence to identify strong affinity patterns. We also introduce the novel concept of the cross-support property for eliminating patterns involving items with substantially different support levels. Our experimental results demonstrate the effectiveness of this method for finding patterns in dense data sets even at very low support thresholds, where most existing algorithms would break down. Finally, hyperclique patterns also show great promise for clustering items in high-dimensional space.
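
The abstract names h-confidence and the cross-support property without defining them. The sketch below may help make the idea concrete. It assumes the commonly cited definition of h-confidence as the support of an itemset divided by the largest support of any single item in it, and it treats the ratio of smallest to largest item support as an upper bound on h-confidence that can be used to discard cross-support candidates early. The function names, the brute-force enumeration, and the toy transactions are all hypothetical illustrations, not the authors' algorithm or data.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def h_confidence(itemset, transactions):
    """Assumed definition: supp(itemset) / max single-item support."""
    max_item_support = max(support([i], transactions) for i in itemset)
    if max_item_support == 0:
        return 0.0
    return support(itemset, transactions) / max_item_support


def support_ratio(itemset, transactions):
    """min item support / max item support: an upper bound on h-confidence."""
    supports = [support([i], transactions) for i in itemset]
    return min(supports) / max(supports) if max(supports) > 0 else 0.0


def hyperclique_patterns(transactions, min_hconf, min_support=0.0, max_size=3):
    """Brute-force enumeration for illustration only (not the paper's miner)."""
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, max_size + 1):
        for candidate in combinations(items, size):
            # Cross-support check: items with very different support levels
            # cannot reach the h-confidence threshold, so skip them early.
            if support_ratio(candidate, transactions) < min_hconf:
                continue
            if (support(candidate, transactions) >= min_support
                    and h_confidence(candidate, transactions) >= min_hconf):
                found.append(candidate)
    return found


# Invented toy data: "milk" is very frequent, "caviar" and "vodka" are rare
# but always appear together.
TRANSACTIONS = [
    {"milk", "bread"}, {"milk", "bread"}, {"milk", "bread"},
    {"milk", "bread"}, {"milk"}, {"milk", "bread"}, {"milk"},
    {"milk", "bread"}, {"milk", "caviar", "vodka"}, {"caviar", "vodka"},
]

PATTERNS = hyperclique_patterns(TRANSACTIONS, min_hconf=0.6)
```

On this toy data, `PATTERNS` contains `("bread", "milk")` and `("caviar", "vodka")`. The rare pair is kept even though its support is only 0.2, while `{"milk", "caviar"}` is discarded by the cross-support check because its item supports (0.9 and 0.2) differ too much. A minimum-support threshold of 0.5 alone would have missed the rare pair.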

Original language: English (US)
Title of host publication: Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003
Pages: 387-394
Number of pages: 8
State: Published - 2003
Event: 3rd IEEE International Conference on Data Mining, ICDM '03 - Melbourne, FL, United States
Duration: Nov 19 2003 → Nov 22 2003

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 3rd IEEE International Conference on Data Mining, ICDM '03
Country/Territory: United States
City: Melbourne, FL
Period: 11/19/03 → 11/22/03
