ThrEEBoost: Thresholded Boosting for Variable Selection and Prediction via Estimating Equations

Ben Brown, Christopher J. Miller, Julian Wolfson

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Most variable selection techniques for high-dimensional models are designed for settings where observations are independent and completely observed. At the same time, there is a rich literature on approaches to estimation of low-dimensional parameters in the presence of correlation, missingness, measurement error, selection bias, and other characteristics of real data. In this article, we present ThrEEBoost (Thresholded EEBoost), a general-purpose variable selection technique that can accommodate such problem characteristics by replacing the gradient of the loss with an estimating function. ThrEEBoost generalizes the previously proposed EEBoost algorithm (Wolfson 2011) by allowing the number of regression coefficients updated at each step to be controlled by a thresholding parameter. Different thresholding parameter values yield different variable selection paths, greatly diversifying the set of models that can be explored; the optimal degree of thresholding can be chosen by cross-validation. ThrEEBoost was evaluated in simulation studies that assessed the effects of different threshold values on prediction error, sensitivity, specificity, and the number of iterations needed to reach the minimum prediction error, under both sparse and nonsparse true models with correlated continuous outcomes. We show that when the true model is sparse, ThrEEBoost achieves prediction error similar to that of EEBoost while requiring fewer iterations to locate the set of coefficients yielding the minimum error. When the true model is less sparse, ThrEEBoost has lower prediction error than EEBoost and also finds the point yielding the minimum error more quickly. The technique is illustrated by applying it to the problem of identifying predictors of weight change in a longitudinal nutrition study. Supplementary materials are available online.
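The thresholded update described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not spell out the exact update rule, so the sketch assumes that at each step every coefficient whose estimating-function component is at least `tau` times the largest component (in absolute value) moves by a small fixed step in the direction of its sign. The names `estimating_function`, `threeboost_path`, `tau`, and `step` are hypothetical, and the estimating function used here (a least-squares score under an independence working correlation) is only one simple choice.

```python
import numpy as np


def estimating_function(X, y, beta):
    # Assumption for illustration: the score for a continuous outcome under an
    # independence working correlation, X^T (y - X beta). Any other estimating
    # function with one component per coefficient could be substituted.
    return X.T @ (y - X @ beta)


def threeboost_path(X, y, tau=0.5, step=0.01, n_iter=200):
    """Return the coefficient path, one row per iteration (row 0 is all zeros).

    tau = 1 updates only the coefficient with the largest component, which is
    the EEBoost-style single-coordinate step; smaller tau updates more
    coefficients per iteration.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    beta = np.zeros(X.shape[1])
    path = [beta.copy()]
    for _ in range(n_iter):
        g = estimating_function(X, y, beta)
        largest = np.max(np.abs(g))
        if largest == 0.0:
            break  # the estimating equation is solved exactly
        # Thresholding: select every component within a factor tau of the largest.
        selected = np.abs(g) >= tau * largest
        beta = beta + step * np.sign(g) * selected
        path.append(beta.copy())
    return np.array(path)
```

Under these assumptions, running `threeboost_path` over a grid of `tau` values and comparing held-out prediction error along each path would mirror the cross-validated choice of threshold mentioned in the abstract.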

Original language: English (US)
Pages (from-to): 579-588
Number of pages: 10
Journal: Journal of Computational and Graphical Statistics
Volume: 26
Issue number: 3
State: Published - Jul 3 2017

Keywords

  • Correlation
  • GEE
  • Thresholding
