Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data

Desheng Huang, Wei Pan

Research output: Contribution to journal › Article › peer-review

96 Scopus citations


Motivation: Because co-expressed genes are likely to share the same biological function, cluster analysis of gene expression profiles has been applied to gene function discovery. Most existing clustering methods ignore known gene functions in the process of clustering.

Results: To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions into a new distance metric, which shrinks a gene expression-based distance towards 0 if and only if the two genes share a common gene function. A two-step procedure is used. First, the shrinkage distance metric is used in any distance-based clustering method, e.g. K-medoids or hierarchical clustering, to cluster the genes with known functions. Second, while keeping the clustering results from the first step for the genes with known functions, the expression-based distance metric is used to cluster the remaining genes of unknown function, assigning each of them either to one of the clusters obtained in the first step or to a new cluster. A simulation study and an application to gene function prediction in yeast demonstrate the advantage of our proposal over the standard method.
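The abstract does not state the shrinkage formula or the assignment rule, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes Euclidean distance between expression profiles, a multiplicative shrinkage factor `r` in [0, 1], and a distance threshold for opening a new cluster in the second step. The names `expression_distance`, `shrinkage_distance`, `assign_unannotated`, `r`, and `new_cluster_threshold` are hypothetical and are not taken from the paper.

```python
import math


def expression_distance(x, y):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def shrinkage_distance(x, y, funcs_x, funcs_y, r=0.5):
    """Shrink the expression distance towards 0 only when the two genes
    share at least one annotated function.

    r is a hypothetical shrinkage parameter: r = 0 leaves the distance
    unchanged, r = 1 shrinks it all the way to 0.
    """
    d = expression_distance(x, y)
    if set(funcs_x) & set(funcs_y):
        return (1.0 - r) * d
    return d


def assign_unannotated(profile, medoids, new_cluster_threshold):
    """Second-step sketch: place a gene of unknown function in the nearest
    first-step cluster using the plain expression distance, or return None
    to signal that a new cluster should be opened (assumed rule)."""
    dists = [expression_distance(profile, m) for m in medoids]
    best = min(range(len(medoids)), key=dists.__getitem__)
    return best if dists[best] <= new_cluster_threshold else None


if __name__ == "__main__":
    g1, g2 = [0.0, 0.0], [3.0, 4.0]
    # Genes sharing a function get a shrunken distance.
    print(shrinkage_distance(g1, g2, {"transport"}, {"transport", "repair"}))
    # Genes with no common function keep the expression-based distance.
    print(shrinkage_distance(g1, g2, {"transport"}, {"repair"}))
    # Fixed first-step medoids; the gene joins the nearest one or a new cluster.
    print(assign_unannotated([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0]], 2.0))
```

A pairwise matrix built from `shrinkage_distance` could then be passed to any distance-based method for the first step, which is the property the abstract emphasizes; the threshold rule in the second step is one simple way to allow new clusters and may differ from the authors' procedure.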

Original language: English (US)
Pages (from-to): 1259-1268
Number of pages: 10
Issue number: 10
State: Published - May 15 2006

Bibliographical note

Funding Information:
The authors are grateful to the reviewers for stimulating and constructive comments. D.H. was supported by National Natural Science Foundation of China grant No. 70503028 and a CMU grant; W.P. was supported by NIH grant HL65462 and a UM AHC Development grant.

