Length-biased data are common in various fields, including epidemiology and labor economics, and have attracted considerable attention in the survival literature. A central goal of survival analysis is to identify, from among a large number of clinical covariates, the subset of risk factors and their contributions to risk. However, no existing research addresses variable selection for length-biased data, owing to the complex nature of such data and the lack of a convenient loss function. We therefore propose an estimation method based on penalized estimating equations that yields a sparse and consistent estimator for length-biased data under an accelerated failure time model. The proposed estimator is consistent in both selection and estimation. We implement the method using the SCAD penalty and a local linear approximation algorithm, and we suggest selecting the tuning parameter with the extended BIC in high-dimensional settings. Furthermore, we develop a novel multistage SCAD penalized estimating equation procedure that improves both estimation accuracy and sparsity in variable selection. Simulation studies show that the proposed procedure has high accuracy and almost perfect sparsity. We apply the proposed method to Oscar Awards data.
- Accelerated failure time model
- High-dimensional variable selection
- Length-biased data
- Multi-stage penalization
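
As a point of orientation for the SCAD penalty and local linear approximation (LLA) named in the abstract, the following is a minimal sketch of the standard SCAD derivative of Fan and Li (2001) and the per-coefficient weights that a single LLA step would produce from an initial estimate. The function names `scad_derivative` and `lla_weights` are hypothetical, the default `a = 3.7` is only the conventional choice, and the sketch does not show how these weights enter the penalized estimating equations for length-biased data.

```python
def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at |theta|.

    Standard form: slope lam for |theta| <= lam, a linearly decaying
    slope for lam < |theta| < a * lam, and zero slope beyond a * lam.
    The default a = 3.7 is a conventional choice, assumed here.
    """
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def lla_weights(beta_init, lam, a=3.7):
    """Weights for one LLA step (illustrative helper, not from the paper).

    LLA replaces the SCAD penalty on each coefficient with a weighted
    L1 penalty whose weight is the SCAD derivative at the initial value.
    """
    return [scad_derivative(b, lam, a) for b in beta_init]


if __name__ == "__main__":
    # Small coefficients keep the full penalty weight lam;
    # large coefficients receive weight zero and are left unpenalized.
    print(lla_weights([0.0, 0.5, 2.0, -4.0], lam=1.0))
```

Under this weighting, coefficients whose initial estimates exceed `a * lam` in absolute value are not shrunk at the next step, which is the mechanism by which SCAD reduces the bias that a plain L1 penalty places on large effects.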