TY - JOUR
T1 - Gaussian copula precision estimation with missing values
AU - Wang, Huahua
AU - Fazayeli, Faridel
AU - Chatterjee, Soumyadeep
AU - Banerjee, Arindam
N1 - Copyright 2016 Elsevier B.V., All rights reserved.
PY - 2014
Y1 - 2014
AB - We consider the problem of estimating the sparse precision matrix of Gaussian copula distributions using samples with missing values in high dimensions. Existing approaches, primarily designed for Gaussian distributions, suggest using plugin estimators by disregarding the missing values. In this paper, we propose double plugin Gaussian (DoPinG) copula estimators to estimate the sparse precision matrix corresponding to non-paranormal distributions. DoPinG uses two plugin procedures and consists of three steps: (1) estimate nonparametric correlations based on observed values, including Kendall's tau and Spearman's rho; (2) estimate the non-paranormal correlation matrix; (3) plug into existing sparse precision estimators. We prove that DoPinG copula estimators consistently estimate the non-paranormal correlation matrix at a rate of O((1/(1-δ)) √(log p / n)), where δ is the probability of missing values. We provide experimental results to illustrate the effect of sample size and percentage of missing data on model performance. Experimental results show that DoPinG is significantly better than estimators like mGlasso, which are primarily designed for Gaussian data.
UR - http://www.scopus.com/inward/record.url?scp=84955512288&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84955512288&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84955512288
SN - 1532-4435
VL - 33
SP - 978
EP - 986
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
T2 - 17th International Conference on Artificial Intelligence and Statistics, AISTATS 2014
Y2 - 22 April 2014 through 25 April 2014
ER -