TY - GEN
T1 - An association analysis approach to biclustering
AU - Pandey, Gaurav
AU - Atluri, Gowtham
AU - Steinbach, Michael
AU - Myers, Chad L.
AU - Kumar, Vipin
PY - 2009
Y1 - 2009
N2 - The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis performed on real-valued data sets in various domains, such as biology. Several algorithms have been proposed to find different types of biclusters in such data sets. However, these algorithms are unable to search the space of all possible biclusters exhaustively. Pattern mining algorithms in association analysis also essentially produce biclusters as their result, since the patterns consist of items that are supported by a subset of all the transactions. However, a major limitation of the numerous techniques developed in association analysis is that they are only able to analyze data sets with binary and/or categorical variables, and their application to real-valued data sets often involves some lossy transformation such as discretization or binarization of the attributes. In this paper, we propose a novel association analysis framework for exhaustively and efficiently mining "range support" patterns from such a data set. On one hand, this framework reduces the loss of information incurred by the binarization- and discretization-based approaches, and on the other, it enables the exhaustive discovery of coherent biclusters. We compared the performance of our framework with two standard biclustering algorithms through the evaluation of the similarity of the cellular functions of the genes constituting the patterns/biclusters derived by these algorithms from microarray data. These experiments show that the real-valued patterns discovered by our framework are better enriched by small biologically interesting functional classes. Also, through specific examples, we demonstrate the ability of the RAP framework to discover functionally enriched patterns that are not found by the commonly used biclustering algorithm ISA. The source code and data sets used in this paper, as well as the supplementary material, are available at http://www.cs.umn.edu/vk/ gaurav/rap.
AB - The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis performed on real-valued data sets in various domains, such as biology. Several algorithms have been proposed to find different types of biclusters in such data sets. However, these algorithms are unable to search the space of all possible biclusters exhaustively. Pattern mining algorithms in association analysis also essentially produce biclusters as their result, since the patterns consist of items that are supported by a subset of all the transactions. However, a major limitation of the numerous techniques developed in association analysis is that they are only able to analyze data sets with binary and/or categorical variables, and their application to real-valued data sets often involves some lossy transformation such as discretization or binarization of the attributes. In this paper, we propose a novel association analysis framework for exhaustively and efficiently mining "range support" patterns from such a data set. On one hand, this framework reduces the loss of information incurred by the binarization- and discretization-based approaches, and on the other, it enables the exhaustive discovery of coherent biclusters. We compared the performance of our framework with two standard biclustering algorithms through the evaluation of the similarity of the cellular functions of the genes constituting the patterns/biclusters derived by these algorithms from microarray data. These experiments show that the real-valued patterns discovered by our framework are better enriched by small biologically interesting functional classes. Also, through specific examples, we demonstrate the ability of the RAP framework to discover functionally enriched patterns that are not found by the commonly used biclustering algorithm ISA. The source code and data sets used in this paper, as well as the supplementary material, are available at http://www.cs.umn.edu/vk/ gaurav/rap.
KW - Association analysis
KW - Biclustering
KW - Functional modules
KW - Microarray data
KW - Range support
KW - Real-valued data
UR - http://www.scopus.com/inward/record.url?scp=71049189327&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=71049189327&partnerID=8YFLogxK
U2 - 10.1145/1557019.1557095
DO - 10.1145/1557019.1557095
M3 - Conference contribution
AN - SCOPUS:71049189327
SN - 9781605584959
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 677
EP - 685
BT - KDD '09
T2 - 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09
Y2 - 28 June 2009 through 1 July 2009
ER -