DNA microarrays make it possible to study simultaneously the expression of thousands of genes in a biological sample. Univariate clustering techniques have been used to discover target genes with differential expression between two experimental conditions. Because of possible loss of information due to use of univariate summary statistics, it may be more effective to use multivariate statistics. We present multivariate normal mixture model based clustering analyses to detect differential gene expression between two conditions. Deviating from the general mixture model and model-based clustering, we propose mixture models with specific mean and covariance structures that account for special features of two-condition microarray experiments. Explicit updating formulas in the EM algorithm for three such models are derived. The methods are applied to a real dataset to compare the expression levels of 1176 genes of rats with and without pneumococcal middle-ear infection to illustrate the performance and usefulness of this approach. About 10 genes and 20 genes are found to be differentially expressed in a six-dimensional modeling and a bivariate modeling, respectively. Two simulation studies are conducted to compare the performance of univariate and multivariate methods. Depending on data, neither method can always dominate the other. The results suggest that multivariate normal mixture models can be useful alternatives to univariate methods to detect differential gene expression in exploratory data analysis.
Bibliographical noteFunding Information:
W. P. was supported by NIH Grant HL65462. We thank the two reviewers for many helpful and constructive comments.
- EM algorithm
- Model-based clustering
- Multivariate statistics
- Test statistics