Abstract
Existing algorithms for mining association patterns often rely on the support-based pruning strategy to prune a combinatorial search space. However, this strategy is not effective for discovering potentially interesting patterns at low levels of support. Also, it tends to generate too many spurious patterns involving items which are from different support levels and are poorly correlated. In this paper, we present a framework for mining highly-correlated association patterns called hyperclique patterns. In this framework, an objective measure called h-confidence is applied to discover hyperclique patterns. We prove that the items in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another as measured by the cosine similarity (uncentered Pearson's correlation coefficient). Also, we show that the h-confidence measure satisfies a cross-support property which can help efficiently eliminate spurious patterns involving items with substantially different support levels. Indeed, this cross-support property is not limited to h-confidence and can be generalized to some other association measures. In addition, an algorithm called hyperclique miner is proposed to exploit both cross-support and anti-monotone properties of the h-confidence measure for the efficient discovery of hyperclique patterns. Finally, our experimental results show that hyperclique miner can efficiently identify hyperclique patterns, even at extremely low levels of support.
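The abstract names the h-confidence measure without defining it. The sketch below assumes the definition commonly given for it: the support of an itemset divided by the largest support of any single item in that itemset. On a small invented set of transactions, it checks the two properties the abstract mentions: every pair of items in a pattern has cosine similarity at least equal to the pattern's h-confidence, and the ratio of the smallest to the largest item support bounds h-confidence from above (the cross-support property). The function names and the toy data are illustrative assumptions, not taken from the paper, and this is not the hyperclique miner algorithm itself.

```python
from itertools import combinations
from math import sqrt

# Toy transactions, invented purely for illustration.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"c", "d"},
    {"d"},
    {"d"},
    {"d"},
    {"d"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def h_confidence(itemset, transactions):
    """Assumed definition: supp(P) divided by the largest single-item support in P."""
    max_item_support = max(support({i}, transactions) for i in itemset)
    return support(itemset, transactions) / max_item_support


def cosine(x, y, transactions):
    """Cosine similarity of the binary occurrence vectors of items x and y."""
    return support({x, y}, transactions) / sqrt(
        support({x}, transactions) * support({y}, transactions)
    )


def support_ratio(itemset, transactions):
    """Smallest over largest single-item support; an upper bound on h-confidence."""
    supports = [support({i}, transactions) for i in itemset]
    return min(supports) / max(supports)


pattern = {"a", "b", "c"}
hc = h_confidence(pattern, transactions)
print("h-confidence:", hc)
for x, y in combinations(sorted(pattern), 2):
    # Each pairwise cosine should be no smaller than the pattern's h-confidence.
    print(x, y, "cosine:", cosine(x, y, transactions))

# Cross-support check: the support ratio caps the h-confidence of any pattern.
mixed = {"c", "d"}
print("h-confidence:", h_confidence(mixed, transactions))
print("support ratio:", support_ratio(mixed, transactions))
```

Because the support ratio depends only on single-item supports, a miner could in principle discard any candidate whose ratio falls below the h-confidence threshold without counting the candidate's own support.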
Original language | English (US) |
---|---|
Pages (from-to) | 219-242 |
Number of pages | 24 |
Journal | Data Mining and Knowledge Discovery |
Volume | 13 |
Issue number | 2 |
State | Published - Sep 2006 |
Bibliographical note
Funding Information: Acknowledgments. This work was partially supported by NSF grant #IIS-0308264, DOE/LLNL W-7045-ENG-48, and by the Army High Performance Computing Research Center under the auspices of the Department of the Army, Army Research Laboratory cooperative agreement number DAAD19-01-2-0014. The content of this work does not necessarily reflect the position or policy of the government, and no official endorsement should be inferred. Access to computing facilities was provided by the AHPCRC and the Minnesota Supercomputing Institute. Finally, we would like to thank Dr. Mohammed J. Zaki for providing us with the CHARM code, and Dr. Shashi Shekhar, Dr. Ke Wang, and Michael Steinbach for their valuable comments.
Keywords
- Association analysis
- Hyperclique patterns
- Pattern mining