TY - GEN
T1 - Mining strong affinity association patterns in data sets with skewed support distribution
AU - Xiong, Hui
AU - Tan, Pang Ning
AU - Kumar, Vipin
PY - 2003
Y1 - 2003
N2 - Existing association-rule mining algorithms often rely on a support-based pruning strategy to prune their combinatorial search space. This strategy is not very effective for data sets with skewed support distributions because it tends to generate many spurious patterns involving items from different support levels or to miss potentially interesting low-support patterns. To overcome these problems, we propose the concept of a hyperclique pattern, which uses an objective measure called h-confidence to identify strong affinity patterns. We also introduce the novel concept of a cross-support property for eliminating patterns involving items with substantially different support levels. Our experimental results demonstrate the effectiveness of this method for finding patterns in dense data sets even at very low support thresholds, where most existing algorithms would break down. Finally, hyperclique patterns also show great promise for clustering items in high-dimensional space.
AB - Existing association-rule mining algorithms often rely on a support-based pruning strategy to prune their combinatorial search space. This strategy is not very effective for data sets with skewed support distributions because it tends to generate many spurious patterns involving items from different support levels or to miss potentially interesting low-support patterns. To overcome these problems, we propose the concept of a hyperclique pattern, which uses an objective measure called h-confidence to identify strong affinity patterns. We also introduce the novel concept of a cross-support property for eliminating patterns involving items with substantially different support levels. Our experimental results demonstrate the effectiveness of this method for finding patterns in dense data sets even at very low support thresholds, where most existing algorithms would break down. Finally, hyperclique patterns also show great promise for clustering items in high-dimensional space.
UR - http://www.scopus.com/inward/record.url?scp=12244259488&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=12244259488&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:12244259488
SN - 0769519784
SN - 9780769519784
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 387
EP - 394
BT - Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003
T2 - 3rd IEEE International Conference on Data Mining, ICDM '03
Y2 - 19 November 2003 through 22 November 2003
ER -