Cluster analysis using multivariate normal mixture models to detect differential gene expression with microarray data

Research output: Contribution to journal › Article › peer-review

28 Scopus citations


DNA microarrays make it possible to study the expression of thousands of genes in a biological sample simultaneously. Univariate clustering techniques have been used to discover target genes that are differentially expressed between two experimental conditions. Because univariate summary statistics may lose information, multivariate statistics may be more effective. We present clustering analyses based on multivariate normal mixture models to detect differential gene expression between two conditions. Departing from the general mixture model and model-based clustering, we propose mixture models with specific mean and covariance structures that account for special features of two-condition microarray experiments. Explicit updating formulas in the EM algorithm are derived for three such models. To illustrate the performance and usefulness of this approach, the methods are applied to a real dataset comparing the expression levels of 1176 genes in rats with and without pneumococcal middle-ear infection. About 10 genes are found to be differentially expressed in a six-dimensional model and about 20 genes in a bivariate model. Two simulation studies are conducted to compare the performance of univariate and multivariate methods. Neither method uniformly dominates the other; which performs better depends on the data. The results suggest that multivariate normal mixture models can be useful alternatives to univariate methods for detecting differential gene expression in exploratory data analysis.
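As a rough illustration of the kind of computation the abstract describes, the sketch below runs EM for a two-component multivariate normal mixture in which one component has its mean constrained. It is not one of the three models derived in the article. The zero-mean constraint on the "not differentially expressed" component, the initialization by distance from the origin, and the names `log_gauss`, `em_two_component`, and `ridge` are all assumptions made here for illustration.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (X - mean).T)  # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def em_two_component(X, n_iter=100, ridge=1e-6):
    """EM for a two-component normal mixture (illustrative assumption only).

    Component 0 ("not differentially expressed") has its mean fixed at zero;
    component 1 ("differentially expressed") has a free mean. Both have
    unrestricted covariance matrices. Rows of X are genes.
    """
    n, d = X.shape
    eye = np.eye(d)
    zero = np.zeros(d)
    # Crude start: the 10% of genes farthest from the origin seed component 1.
    norms = np.linalg.norm(X, axis=1)
    resp = np.where(norms > np.quantile(norms, 0.9), 0.95, 0.05)
    for _ in range(n_iter):
        # M-step: weighted updates given the current responsibilities.
        r1, r0 = resp, 1.0 - resp
        pi1 = np.clip(r1.mean(), 1e-6, 1.0 - 1e-6)
        mu1 = r1 @ X / r1.sum()
        cov0 = (X * r0[:, None]).T @ X / r0.sum() + ridge * eye  # mean is zero
        diff = X - mu1
        cov1 = (diff * r1[:, None]).T @ diff / r1.sum() + ridge * eye
        # E-step: posterior probability of component 1, computed in log space.
        log0 = np.log1p(-pi1) + log_gauss(X, zero, cov0)
        log1 = np.log(pi1) + log_gauss(X, mu1, cov1)
        resp = np.exp(log1 - np.logaddexp(log0, log1))
    return pi1, mu1, cov0, cov1, resp
```

Under these assumptions, a gene could be flagged as differentially expressed when its entry in `resp` exceeds a chosen threshold such as 0.5. The small `ridge` term only keeps the covariance estimates invertible and is not part of the mixture model itself.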

Original language: English (US)
Pages (from-to): 641-658
Number of pages: 18
Journal: Computational Statistics and Data Analysis
Issue number: 2
State: Published - Nov 15 2006


  • BIC
  • Clustering
  • EM algorithm
  • Model-based clustering
  • Multivariate statistics
  • Test statistics
