High-dimensional variable selection with right-censored length-biased data

Di He, Yong Zhou, Hui Zou

Research output: Contribution to journal › Article

Abstract

Length-biased data are common in various fields, including epidemiology and labor economics, and they have attracted considerable attention in the survival analysis literature. A crucial goal of survival analysis is to identify a subset of risk factors, and their risk contributions, from among a vast number of clinical covariates. However, there is no research on variable selection for length-biased data, owing to the complex nature of such data and the lack of a convenient loss function. We therefore propose an estimation method based on penalized estimating equations to obtain a sparse and consistent estimator for length-biased data under an accelerated failure time model. The proposed estimator is consistent in both selection and estimation. In particular, we implement our method using a SCAD penalty and a local linear approximation algorithm. We suggest selecting the tuning parameter using the extended BIC in high-dimensional settings. Furthermore, we develop a novel multistage SCAD penalized estimating equation procedure to achieve improved estimation accuracy and sparsity in the variable selection. Simulation studies show that the proposed procedure has high accuracy and almost perfect sparsity. Oscar Awards data are analyzed as an application of the proposed method.
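
For orientation, the SCAD penalty and the local linear approximation (LLA) step named in the abstract can be sketched as follows. This is a minimal illustration of the standard SCAD definition, not the authors' implementation: the function names, the conventional default a = 3.7, and the plain-Python style are assumptions made here, and the paper's penalized estimating equations for length-biased data are not reproduced.

```python
def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty evaluated at |t| (assumes lam > 0 and a > 2)."""
    t = abs(t)
    if t <= lam:
        # Behaves like the lasso penalty near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant for large coefficients, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |t|."""
    t = abs(t)
    if t <= lam:
        return lam
    if t <= a * lam:
        return (a * lam - t) / (a - 1)
    return 0.0


def lla_weights(beta, lam, a=3.7):
    """One LLA step: per-coefficient weights for a weighted L1 problem.

    Each weight is the SCAD derivative at the current estimate, so large
    coefficients receive weight 0 and small ones receive weight lam.
    """
    return [scad_derivative(b, lam, a) for b in beta]


if __name__ == "__main__":
    current = [0.0, 0.5, 2.0, 5.0]  # hypothetical current estimates
    print(lla_weights(current, lam=1.0))
```

In an LLA iteration, these weights would multiply the absolute values of the coefficients in the next weighted L1 subproblem; how that subproblem is solved for the estimating equations in this paper is not specified in the abstract.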

Original language: English (US)
Pages (from-to): 193-215
Number of pages: 23
Journal: Statistica Sinica
Volume: 30
Issue number: 1
State: Published - Jan 2020

Keywords

  • Accelerated failure time model
  • High-dimensional variable selection
  • Length-biased data
  • Multi-stage penalization
