We propose a semiparametric latent Gaussian copula model for mixed multivariate data, which contain both continuous and binary variables. The model assumes that the observed binary variables are obtained by dichotomizing latent variables that follow a Gaussian copula distribution. The goal is to infer the conditional independence relationships between the latent random variables from the observed mixed data. Our work makes two main contributions: we propose a unified rank-based approach to estimate the correlation matrix of the latent variables, and we establish a concentration inequality for the proposed rank-based estimator. Consequently, our methods achieve the same rates of convergence for precision matrix estimation and graph recovery as if the latent variables were observed. The proposed methods are assessed numerically through extensive simulation studies and a real data analysis.
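To make the rank-based idea concrete, the following is a minimal sketch restricted to pairs of continuous variables, where a Gaussian copula gives the standard relation r = sin(π τ / 2) between Kendall's tau and the latent correlation. It is not the paper's full estimator: the bridge functions needed for pairs involving binary variables are not reproduced here, and the function names `kendall_tau` and `latent_correlation`, the sample size, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall's tau-a: average sign of concordance over all distinct pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    # The n-by-n product counts each unordered pair twice; the diagonal is zero.
    return float((sx * sy).sum() / (n * (n - 1)))


def latent_correlation(data):
    """Rank-based correlation estimate for continuous columns only.

    Assumes a Gaussian copula, under which r = sin(pi/2 * tau).
    """
    data = np.asarray(data, dtype=float)
    d = data.shape[1]
    r = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            tau = kendall_tau(data[:, j], data[:, k])
            r[j, k] = r[k, j] = np.sin(0.5 * np.pi * tau)
    return r


# Illustrative simulation (hypothetical settings, not taken from the paper).
rng = np.random.default_rng(0)
true_corr = np.array([[1.0, 0.6], [0.6, 1.0]])
z = rng.multivariate_normal(np.zeros(2), true_corr, size=500)

# Strictly increasing marginal transformations change the data but not the ranks.
x = np.column_stack([np.exp(z[:, 0]), z[:, 1] ** 3])

est_from_x = latent_correlation(x)
est_from_z = latent_correlation(z)
```

Because the estimate depends on the data only through ranks, `est_from_x` and `est_from_z` coincide, which is the sense in which the marginal transformations need not be known or estimated.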
|Original language|English (US)|
|Number of pages|17|
|Journal|Journal of the Royal Statistical Society. Series B: Statistical Methodology|
|State|Published - Mar 1 2017|
Bibliographical note
Funding Information:
We thank the Joint Editor, the Associate Editor and the three referees for their helpful comments, which significantly improved the paper. Fan's research was supported by National Institutes of Health grant 2R01-GM072611-9 and National Science Foundation grants DMS-1206464 and DMS-1406266. Liu's research was supported by National Science Foundation grants III-1116730 and III-1332109, National Institutes of Health grants R01MH102339, R01GM083084 and R01HG06841, and Food and Drug Administration grant HHSF223201000072C. Zou's research was supported in part by National Science Foundation grant DMS-0846068.
© 2016 Royal Statistical Society
- Discrete data
- Gaussian copula
- Latent variable
- Mixed data
- Rank-based statistic