A data-driven approach to conditional screening of high-dimensional variables

Hyokyoung G. Hong, Lan Wang, Xuming He

Research output: Contribution to journal › Article

6 Scopus citations

Abstract

Marginal screening is a widely applied technique for reducing the dimensionality of data when the number of potential features overwhelms the sample size. By their nature, marginal screening procedures are also known to have difficulty identifying the so-called hidden variables, which are jointly important but have weak marginal associations with the response variable. Failing to include a hidden variable in the screening stage has two undesirable consequences: (1) important features are missed in model selection, and (2) biased inference is likely to occur in the subsequent analysis. Motivated by some recent work in conditional screening, we propose a data-driven conditional screening algorithm, which is computationally efficient, enjoys the sure screening property under weaker assumptions on the model and works robustly in a variety of settings to reduce false negatives of hidden variables. A numerical comparison with alternative screening procedures is also made to shed light on the relative merit of the proposed method. We illustrate the proposed methodology using a leukaemia microarray data example.
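The hidden-variable problem described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' data-driven algorithm; it only contrasts plain marginal correlation screening with a generic conditional screen (partial correlation given a conditioning set that is assumed to be known in advance). The function names, the simulated design, and the choice of conditioning on column 0 are all assumptions made for illustration.

```python
import numpy as np

def marginal_scores(X, y):
    """Absolute marginal correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

def conditional_scores(X, y, cond):
    """Absolute partial correlation of each column with y, given columns `cond`.

    Columns in `cond` receive a score of +inf because they are retained
    by construction.
    """
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X[:, cond]])

    def residualize(A):
        # Remove the least-squares projection onto the intercept and cond columns.
        coef, *_ = np.linalg.lstsq(Z, A, rcond=None)
        return A - Z @ coef

    ry = residualize(y)
    RX = residualize(X)
    free = np.ones(p, dtype=bool)
    free[cond] = False
    scores = np.full(p, np.inf)
    denom = np.linalg.norm(RX[:, free], axis=0) * np.linalg.norm(ry)
    scores[free] = np.abs(RX[:, free].T @ ry) / denom
    return scores

# Simulated design (an assumption for illustration): column 1 is correlated
# with column 0, and its coefficient is chosen so that its population
# covariance with y is rho - rho = 0. It matters jointly but not marginally.
rng = np.random.default_rng(0)
n, p, rho = 200, 500, 0.5
X = rng.standard_normal((n, p))
X[:, 1] = rho * X[:, 0] + np.sqrt(1 - rho**2) * X[:, 1]
y = X[:, 0] - rho * X[:, 1] + 0.5 * rng.standard_normal(n)

marg = marginal_scores(X, y)
cond_s = conditional_scores(X, y, cond=[0])

print("marginal score of hidden variable:", marg[1])
print("conditional score of hidden variable:", cond_s[1])
```

In this setup the hidden variable (column 1) has a marginal score close to that of pure noise columns, so a marginal screen would be likely to drop it, while conditioning on column 0 gives it a large score. How the conditioning set is chosen from the data is the subject of the paper and is not reproduced here.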

Original language: English (US)
Pages (from-to): 200-212
Number of pages: 13
Journal: Stat
Volume: 5
Issue number: 1
State: Published - Jan 1 2016


Keywords

  • conditional screening
  • false negative
  • feature screening
  • high dimension
  • sparse principal component analysis
  • sure screening property
