Discovery of error-tolerant biclusters from noisy gene expression data

Rohit Gupta, Navneet Rao, Vipin Kumar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

An important analysis performed on microarray gene-expression data is the discovery of biclusters, which are groups of genes that are coherently expressed across a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters in these real-valued gene-expression data sets. However, these algorithms suffer from several limitations: an inability to explicitly handle errors/noise in the data; difficulty in discovering small biclusters due to their top-down approach; and, for some of the approaches, an inability to find overlapping biclusters, which is crucial because many genes participate in multiple biological processes. Association pattern mining also produces biclusters as its result and can naturally address some of these limitations. However, traditional association mining finds only exact biclusters, which limits its applicability to real-life data sets where biclusters may be fragmented by random noise/errors. Moreover, because these methods work only with binary or boolean attributes, applying them to gene-expression data requires transforming real-valued attributes into binary ones, which often results in a loss of information. Many past approaches have addressed noise and real-valued attributes independently, but there is no systematic approach that addresses both issues together. In this paper, we first propose a novel error-tolerant biclustering model, 'ET-bicluster', and then propose a bottom-up heuristic-based mining algorithm to sequentially discover error-tolerant biclusters directly from real-valued gene-expression data. The efficacy of our proposed approach is illustrated in the context of two biological problems: discovery of functional modules and discovery of biomarkers.
For the first problem, we used two real-valued S. cerevisiae microarray gene-expression data sets and evaluated the resulting biclusters in terms of their functional coherence, measured with GO-based functional enrichment analysis. The enrichment results, together with two randomization tests used to estimate statistical significance, show that the discovered error-tolerant biclusters are indeed biologically meaningful and statistically significant. For the second problem, biomarker discovery, we used four real-valued breast cancer microarray gene-expression data sets and evaluated the resulting biomarkers using MSigDB gene sets. We compare the results for both problems with those of a recent approach, RAP, and clearly demonstrate the importance of accounting for noise/errors when discovering coherent groups of genes from gene-expression data.
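The abstract notes that exact association patterns break apart when noise flips even a single entry. The sketch below illustrates that point on a small binarized matrix. It is not the paper's ET-bicluster model, which operates directly on real-valued data; it uses a generic, hypothetical tolerance rule (each row and column of the candidate submatrix may contain at most a fraction `epsilon` of zeros), and the function names `is_exact_bicluster` and `is_error_tolerant_bicluster` are illustrative assumptions.

```python
from typing import List, Sequence


def is_exact_bicluster(matrix: List[List[int]], rows: Sequence[int], cols: Sequence[int]) -> bool:
    # Exact (association-style) pattern: every cell of the submatrix must be 1.
    return all(matrix[r][c] == 1 for r in rows for c in cols)


def is_error_tolerant_bicluster(
    matrix: List[List[int]], rows: Sequence[int], cols: Sequence[int], epsilon: float
) -> bool:
    # Hypothetical tolerance rule (not the paper's definition): each row and
    # each column of the submatrix may contain at most epsilon * size zeros.
    for r in rows:
        zeros = sum(1 for c in cols if matrix[r][c] == 0)
        if zeros > epsilon * len(cols):
            return False
    for c in cols:
        zeros = sum(1 for r in rows if matrix[r][c] == 0)
        if zeros > epsilon * len(rows):
            return False
    return True


# Toy binarized data: 4 genes x 5 conditions. The 4x4 block in columns 0-3
# would be all ones except for one "noisy" zero at gene 2, condition 3.
data = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
]
rows, cols = [0, 1, 2, 3], [0, 1, 2, 3]

print(is_exact_bicluster(data, rows, cols))                          # False
print(is_error_tolerant_bicluster(data, rows, cols, epsilon=0.25))   # True
```

The single flipped entry is enough to reject the block under the exact criterion, while the tolerant check still recovers it; with `epsilon=0.0` the tolerant check reduces to the exact one.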

Original language: English (US)
Title of host publication: BMC Bioinformatics
Publisher: Association for Computing Machinery
Pages: 5-14
Number of pages: 10
ISBN (Electronic): 9781605583020
State: Published - Jan 1 2010
Event: 9th International Workshop on Data Mining in Bioinformatics, BIOKDD 2010, Held in Conjunction with 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining - Washington, United States
Duration: Jul 25 2010 - Jul 28 2010

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining