TY - GEN
T1 - Graphical-model based multiple testing under dependence, with applications to genome-wide association studies
AU - Liu, Jie
AU - Peissig, Peggy
AU - Zhang, Chunming
AU - Burnside, Elizabeth
AU - McCarty, Catherine
AU - Page, David
PY - 2012
Y1 - 2012
AB - Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests remains a challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure that is based on a Markov-random-field-coupled mixture model. The ground truth of hypotheses is represented by a latent binary Markov random field, and the observed test statistics appear as the coupled mixture variables. The parameters in our model can be learned automatically by a novel EM algorithm. We use an MCMC algorithm to infer the posterior probability that each hypothesis is null (termed the local index of significance), and the false discovery rate can be controlled accordingly. Simulations show that the numerical performance of multiple testing can be improved substantially by using our procedure. We apply the procedure to a real-world genome-wide association study on breast cancer, and we identify several SNPs with strong association evidence.
UR - http://www.scopus.com/inward/record.url?scp=84886000672&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84886000672&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84886000672
SN - 9780974903989
T3 - Uncertainty in Artificial Intelligence - Proceedings of the 28th Conference, UAI 2012
SP - 511
EP - 522
BT - Uncertainty in Artificial Intelligence - Proceedings of the 28th Conference, UAI 2012
T2 - 28th Conference on Uncertainty in Artificial Intelligence, UAI 2012
Y2 - 15 August 2012 through 17 August 2012
ER -