TY - JOUR
T1 - Cluster analysis using multivariate normal mixture models to detect differential gene expression with microarray data
AU - He, Yi
AU - Pan, Wei
AU - Lin, Jizhen
PY - 2006/11/15
Y1 - 2006/11/15
N2 - DNA microarrays make it possible to study simultaneously the expression of thousands of genes in a biological sample. Univariate clustering techniques have been used to discover target genes with differential expression between two experimental conditions. Because of possible loss of information due to use of univariate summary statistics, it may be more effective to use multivariate statistics. We present multivariate normal mixture model based clustering analyses to detect differential gene expression between two conditions. Deviating from the general mixture model and model-based clustering, we propose mixture models with specific mean and covariance structures that account for special features of two-condition microarray experiments. Explicit updating formulas in the EM algorithm for three such models are derived. The methods are applied to a real dataset to compare the expression levels of 1176 genes of rats with and without pneumococcal middle-ear infection to illustrate the performance and usefulness of this approach. About 10 genes and 20 genes are found to be differentially expressed in a six-dimensional modeling and a bivariate modeling, respectively. Two simulation studies are conducted to compare the performance of univariate and multivariate methods. Depending on data, neither method can always dominate the other. The results suggest that multivariate normal mixture models can be useful alternatives to univariate methods to detect differential gene expression in exploratory data analysis.
AB - DNA microarrays make it possible to study simultaneously the expression of thousands of genes in a biological sample. Univariate clustering techniques have been used to discover target genes with differential expression between two experimental conditions. Because of possible loss of information due to use of univariate summary statistics, it may be more effective to use multivariate statistics. We present multivariate normal mixture model based clustering analyses to detect differential gene expression between two conditions. Deviating from the general mixture model and model-based clustering, we propose mixture models with specific mean and covariance structures that account for special features of two-condition microarray experiments. Explicit updating formulas in the EM algorithm for three such models are derived. The methods are applied to a real dataset to compare the expression levels of 1176 genes of rats with and without pneumococcal middle-ear infection to illustrate the performance and usefulness of this approach. About 10 genes and 20 genes are found to be differentially expressed in a six-dimensional modeling and a bivariate modeling, respectively. Two simulation studies are conducted to compare the performance of univariate and multivariate methods. Depending on data, neither method can always dominate the other. The results suggest that multivariate normal mixture models can be useful alternatives to univariate methods to detect differential gene expression in exploratory data analysis.
KW - BIC
KW - Clustering
KW - EM algorithm
KW - Model-based clustering
KW - Multivariate statistics
KW - Test statistics
UR - http://www.scopus.com/inward/record.url?scp=33750336303&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33750336303&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2006.02.012
DO - 10.1016/j.csda.2006.02.012
M3 - Article
AN - SCOPUS:33750336303
VL - 51
SP - 641
EP - 658
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
SN - 0167-9473
IS - 2
ER -