Length-biased data are common in various fields, including epidemiology and labor economics, and have attracted considerable attention in the survival literature. A crucial goal of survival analysis is to identify a subset of risk factors, and their risk contributions, from among a vast number of clinical covariates. However, variable selection for length-biased data has not been studied, owing to the complex nature of such data and the lack of a convenient loss function. We therefore propose an estimation method based on penalized estimating equations that yields a sparse and consistent estimator for length-biased data under an accelerated failure time model. The proposed estimator is consistent in both selection and estimation. In particular, we implement our method using a SCAD penalty and a local linear approximation algorithm. We suggest selecting the tuning parameter using the extended BIC in high-dimensional settings. Furthermore, we develop a novel multistage SCAD penalized estimating equation procedure to achieve improved estimation accuracy and sparsity in the variable selection. Simulation studies show that the proposed procedure has high accuracy and almost perfect sparsity. Oscar Awards data are analyzed as an application of the proposed method.
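As a rough illustration of the SCAD penalty and local linear approximation (LLA) step mentioned above, the sketch below computes the SCAD derivative in its standard form (Fan and Li, 2001, with the conventional default a = 3.7) and uses it as per-coefficient L1 weights for one LLA step. The function names `scad_derivative` and `lla_weights`, the initial estimate, and the value of `lam` are hypothetical and chosen only for illustration. This is not the authors' implementation, and the length-biased estimating equation itself is not shown.

```python
import numpy as np


def scad_derivative(t, lam, a=3.7):
    """Derivative of the standard SCAD penalty evaluated at |t|.

    p'(t) = lam                      if t <= lam
          = max(a*lam - t, 0)/(a-1)  if t >  lam
    The default a = 3.7 is the conventional choice, assumed here.
    """
    t = np.abs(np.asarray(t, dtype=float))
    small = lam * (t <= lam)
    large = np.maximum(a * lam - t, 0.0) / (a - 1.0) * (t > lam)
    return small + large


def lla_weights(beta_init, lam, a=3.7):
    """One LLA step: the SCAD penalty is replaced by a weighted L1 penalty,
    sum_j w_j * |beta_j|, with w_j = p'(|beta_init_j|)."""
    return scad_derivative(beta_init, lam, a)


# Hypothetical initial estimate: small coefficients keep the full L1 weight,
# large coefficients receive a reduced or zero weight (less shrinkage).
beta_init = np.array([0.0, 0.3, 1.0, 5.0])
weights = lla_weights(beta_init, lam=0.5)
print(weights)
```

In this sketch, coefficients whose initial magnitude exceeds `a * lam` receive zero weight, so the next weighted-L1 fit leaves them unpenalized. That is the usual motivation for preferring SCAD over a plain lasso penalty.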
Bibliographical note
Funding Information:
We thank the editor, associate editor, and referees for their helpful comments and suggestions. Zou’s work was supported, in part, by NSF grant DMS-1505111. Zhou’s work was supported by the State Key Program in the Major Research Plan of National Natural Science Foundation of China (91546202) and the Key Laboratory of Advanced Theory and Application in Statistics and Data Science, MOE.
© 2020 Institute of Statistical Science. All rights reserved.
- Accelerated failure time model
- High-dimensional variable selection
- Length-biased data
- Multi-stage penalization