TY - JOUR
T1 - Identifying differentially expressed genes in microarray experiments with model-based variance estimation
AU - Cai, Xiaodong
AU - Giannakis, Georgios B.
PY - 2006/6
Y1 - 2006/6
N2 - Statistical tests have been employed to identify genes differentially expressed under different conditions using data from microarray experiments. The variance of gene expression levels is often required in various statistical tests; however, due to the small number of replicates, the variance estimated from the sample variance is not accurate, which causes large false positive and negative errors. More accurate and robust variance estimation is thus highly desirable to improve the performance of statistical tests. In this paper, cluster analysis was performed on the microarray data using a model-based clustering method. The variance for each gene was then estimated from cluster variances. Since cluster variances are estimated from multiple genes whose microarray data have similar variance, the proposed estimation method pools the relevant genes together; this effectively increases the number of samples in variance estimation, thereby improving variance estimation. Using simulated data, it is shown that with the novel variance estimation, the performance of the t-test, regularized t-test, and a variant of the SAM test, which is called the S-test here, can be improved. Using colon microarray data of Alon, it is demonstrated that the proposed method offers better or comparable performance compared with other gene pooling methods. Using the IHF microarray data of Arfin, it is shown that the proposed novel variance estimation decreases the significance of those genes having a small fold change but a high significance score assigned by the t-test using the sample variance, which potentially reduces false positive probability.
AB - Statistical tests have been employed to identify genes differentially expressed under different conditions using data from microarray experiments. The variance of gene expression levels is often required in various statistical tests; however, due to the small number of replicates, the variance estimated from the sample variance is not accurate, which causes large false positive and negative errors. More accurate and robust variance estimation is thus highly desirable to improve the performance of statistical tests. In this paper, cluster analysis was performed on the microarray data using a model-based clustering method. The variance for each gene was then estimated from cluster variances. Since cluster variances are estimated from multiple genes whose microarray data have similar variance, the proposed estimation method pools the relevant genes together; this effectively increases the number of samples in variance estimation, thereby improving variance estimation. Using simulated data, it is shown that with the novel variance estimation, the performance of the t-test, regularized t-test, and a variant of the SAM test, which is called the S-test here, can be improved. Using colon microarray data of Alon, it is demonstrated that the proposed method offers better or comparable performance compared with other gene pooling methods. Using the IHF microarray data of Arfin, it is shown that the proposed novel variance estimation decreases the significance of those genes having a small fold change but a high significance score assigned by the t-test using the sample variance, which potentially reduces false positive probability.
KW - Clustering
KW - Microarray
KW - Mixture model
KW - Statistical test
KW - Variance estimation
UR - http://www.scopus.com/inward/record.url?scp=33744478402&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33744478402&partnerID=8YFLogxK
U2 - 10.1109/TSP.2006.873733
DO - 10.1109/TSP.2006.873733
M3 - Article
AN - SCOPUS:33744478402
VL - 54
SP - 2418
EP - 2426
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
SN - 1053-587X
IS - 6 II
ER -