High dimensional semiparametric latent graphical model for mixed data

Jianqing Fan, Han Liu, Yang Ning, Hui Zou

Research output: Contribution to journal (Article)

17 Citations (Scopus)

Abstract

We propose a semiparametric latent Gaussian copula model for modelling mixed multivariate data, which contain a combination of both continuous and binary variables. The model assumes that the observed binary variables are obtained by dichotomizing latent variables that satisfy the Gaussian copula distribution. The goal is to infer the conditional independence relationship between the latent random variables, based on the observed mixed data. Our work has two main contributions: we propose a unified rank-based approach to estimate the correlation matrix of latent variables; we establish the concentration inequality of the proposed rank-based estimator. Consequently, our methods achieve the same rates of convergence for precision matrix estimation and graph recovery, as if the latent variables were observed. The methods proposed are numerically assessed through extensive simulation studies, and real data analysis.
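As a rough illustration of what a rank-based correlation estimate looks like, the sketch below handles only a pair of continuous variables. It relies on the classical identity that, under a Gaussian copula, the latent correlation equals sin(pi/2 * tau), where tau is Kendall's tau. This is not presented as the authors' unified estimator: pairs involving binary variables need a different transformation that is not shown here. The helper names (`sign`, `kendall_tau`, `latent_correlation`, `simulate`) and the simulation settings are assumptions made for the example.

```python
import math
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau-a: mean of sign(x_i - x_j) * sign(y_i - y_j) over pairs i < j."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return total / (n * (n - 1) / 2)


def latent_correlation(x, y):
    """Rank-based estimate of the latent correlation for two continuous variables."""
    return math.sin(math.pi / 2 * kendall_tau(x, y))


def simulate(n, rho, seed=0):
    """Draw latent Gaussian pairs with correlation rho, then distort the margins.

    The monotone transforms (exp and cube) are arbitrary choices for the demo;
    they change the marginal distributions but not the ranks.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(math.exp(z1))
        ys.append(z2 ** 3)
    return xs, ys


xs, ys = simulate(500, 0.6)
rho_hat = latent_correlation(xs, ys)
print(f"estimated latent correlation: {rho_hat:.3f} (true value 0.6)")
```

Because the estimate depends on the data only through ranks, it is unchanged by strictly increasing transformations of either variable, which is why the unknown marginal transformations in a Gaussian copula model do not need to be estimated for this step.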

Original language: English (US)
Pages (from-to): 405-421
Number of pages: 17
Journal: Journal of the Royal Statistical Society. Series B: Statistical Methodology
Volume: 79
Issue number: 2
DOI: 10.1111/rssb.12168
State: Published - Mar 1 2017

Fingerprint

Mixed Data
Latent Variables
Graphical Models
High-dimensional
Binary Variables
Concentration Inequalities
Copula Models
Conditional Independence
Correlation Matrix
Multivariate Data
Continuous Variables
Copula
Gaussian Model
Data analysis
Rate of Convergence
Recovery
Random variable
Simulation Study
Estimator

Keywords

  • Discrete data
  • Gaussian copula
  • Latent variable
  • Mixed data
  • Non-paranormal
  • Rank-based statistic

Cite this

High dimensional semiparametric latent graphical model for mixed data. / Fan, Jianqing; Liu, Han; Ning, Yang; Zou, Hui.

In: Journal of the Royal Statistical Society. Series B: Statistical Methodology, Vol. 79, No. 2, 01.03.2017, p. 405-421.

@article{3a0ced64a31e41d889043a68fb8b4f0f,
title = "High dimensional semiparametric latent graphical model for mixed data",
abstract = "We propose a semiparametric latent Gaussian copula model for modelling mixed multivariate data, which contain a combination of both continuous and binary variables. The model assumes that the observed binary variables are obtained by dichotomizing latent variables that satisfy the Gaussian copula distribution. The goal is to infer the conditional independence relationship between the latent random variables, based on the observed mixed data. Our work has two main contributions: we propose a unified rank-based approach to estimate the correlation matrix of latent variables; we establish the concentration inequality of the proposed rank-based estimator. Consequently, our methods achieve the same rates of convergence for precision matrix estimation and graph recovery, as if the latent variables were observed. The methods proposed are numerically assessed through extensive simulation studies, and real data analysis.",
keywords = "Discrete data, Gaussian copula, Latent variable, Mixed data, Non-paranormal, Rank-based statistic",
author = "Jianqing Fan and Han Liu and Yang Ning and Hui Zou",
year = "2017",
month = "3",
day = "1",
doi = "10.1111/rssb.12168",
language = "English (US)",
volume = "79",
pages = "405--421",
journal = "Journal of the Royal Statistical Society. Series B: Statistical Methodology",
issn = "1369-7412",
publisher = "Wiley-Blackwell",
number = "2",
}

TY  - JOUR
T1  - High dimensional semiparametric latent graphical model for mixed data
AU  - Fan, Jianqing
AU  - Liu, Han
AU  - Ning, Yang
AU  - Zou, Hui
PY  - 2017/3/1
Y1  - 2017/3/1
AB  - We propose a semiparametric latent Gaussian copula model for modelling mixed multivariate data, which contain a combination of both continuous and binary variables. The model assumes that the observed binary variables are obtained by dichotomizing latent variables that satisfy the Gaussian copula distribution. The goal is to infer the conditional independence relationship between the latent random variables, based on the observed mixed data. Our work has two main contributions: we propose a unified rank-based approach to estimate the correlation matrix of latent variables; we establish the concentration inequality of the proposed rank-based estimator. Consequently, our methods achieve the same rates of convergence for precision matrix estimation and graph recovery, as if the latent variables were observed. The methods proposed are numerically assessed through extensive simulation studies, and real data analysis.
KW  - Discrete data
KW  - Gaussian copula
KW  - Latent variable
KW  - Mixed data
KW  - Non-paranormal
KW  - Rank-based statistic
UR  - http://www.scopus.com/inward/record.url?scp=84962882081&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84962882081&partnerID=8YFLogxK
U2  - 10.1111/rssb.12168
DO  - 10.1111/rssb.12168
M3  - Article
AN  - SCOPUS:84962882081
VL  - 79
SP  - 405
EP  - 421
JO  - Journal of the Royal Statistical Society. Series B: Statistical Methodology
JF  - Journal of the Royal Statistical Society. Series B: Statistical Methodology
SN  - 1369-7412
IS  - 2
ER  -