EEBoost: A general method for prediction and variable selection based on estimating equations

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

The modern statistical literature is replete with methods for performing variable selection and prediction in standard regression problems. However, simple models may misspecify or fail to capture important aspects of the data generating process such as missingness, correlation, and over/underdispersion. In this article we describe EEBoost, a strategy for variable selection and prediction which can be applied in high-dimensional settings where inference for low-dimensional parameters would typically be based on estimating equations. The method is simple, flexible, and easily implemented using existing software. The EEBoost algorithm is obtained as a modification of the standard boosting (or functional gradient descent) technique. We show that EEBoost is closely related to a class of L1-constrained projected likelihood ratio minimizations, and therefore produces similar variable selection paths to penalized methods without the need to apply constrained optimization algorithms. The flexibility of EEBoost is illustrated by applying it to simulated examples with correlated outcomes and time-to-event data with missing covariates. In both cases, EEBoost outperforms variable selection methods which do not account for the relevant data characteristics. Furthermore, it is shown to be substantially faster to compute than competing methods based on penalized estimating equations. We also apply a version of EEBoost based on the Buckley-James estimating equations to data from an HIV treatment trial, where the aim is to identify mutations which confer resistance to antiretroviral medications. Proofs of the main results appear in the Supplemental Materials (available online).
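
The abstract's core idea, replacing the loss gradient in boosting with an estimating function, can be sketched in a few lines. The following is a minimal illustration of that boosting-style update, not the paper's reference implementation: at each step the coefficient whose estimating equation component is largest in absolute value is nudged by a small step. The function name `eeboost` and the least-squares score example are illustrative choices.

```python
import numpy as np

def eeboost(g, p, eps=0.01, n_steps=400):
    """Sketch of a boosting-style update driven by an estimating
    function g(beta) -> length-p vector. At each step, the component
    of g that is largest in absolute value picks the coefficient to
    update, and that coefficient moves by eps in the sign of g."""
    beta = np.zeros(p)
    path = [beta.copy()]
    for _ in range(n_steps):
        gb = g(beta)
        j = int(np.argmax(np.abs(gb)))    # most "violated" component
        beta[j] += eps * np.sign(gb[j])   # small boosting-style step
        path.append(beta.copy())
    return np.array(path)

# Example: ordinary least-squares score equations g(beta) = X'(y - X beta);
# the iterates trace out a coordinate-wise regularization path.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
true_beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ true_beta + 0.1 * rng.standard_normal(100)
path = eeboost(lambda b: X.T @ (y - X @ b), p=5)
```

Because only one coordinate moves per step, early iterations keep most coefficients exactly at zero, which is what makes the iteration history usable as a variable selection path.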

Original language: English (US)
Pages (from-to): 296-305
Number of pages: 10
Journal: Journal of the American Statistical Association
Volume: 106
Issue number: 493
DOI: 10.1198/jasa.2011.tm10098
State: Published - Mar 1 2011

Keywords

  • Boosting
  • Model selection
  • Prediction
  • Projected likelihood

Cite this

EEBoost: A general method for prediction and variable selection based on estimating equations. / Wolfson, Julian.

In: Journal of the American Statistical Association, Vol. 106, No. 493, 01.03.2011, p. 296-305.

@article{543d9d78b1a743f8a4861cb636c7046c,
title = "EEBoost: A general method for prediction and variable selection based on estimating equations",
abstract = "The modern statistical literature is replete with methods for performing variable selection and prediction in standard regression problems. However, simple models may misspecify or fail to capture important aspects of the data generating process such as missingness, correlation, and over/underdispersion. In this article we describe EEBoost, a strategy for variable selection and prediction which can be applied in high-dimensional settings where inference for low-dimensional parameters would typically be based on estimating equations. The method is simple, flexible, and easily implemented using existing software. The EEBoost algorithm is obtained as a modification of the standard boosting (or functional gradient descent) technique. We show that EEBoost is closely related to a class of L1-constrained projected likelihood ratio minimizations, and therefore produces similar variable selection paths to penalized methods without the need to apply constrained optimization algorithms. The flexibility of EEBoost is illustrated by applying it to simulated examples with correlated outcomes and time-to-event data with missing covariates. In both cases, EEBoost outperforms variable selection methods which do not account for the relevant data characteristics. Furthermore, it is shown to be substantially faster to compute than competing methods based on penalized estimating equations. We also apply a version of EEBoost based on the Buckley-James estimating equations to data from an HIV treatment trial, where the aim is to identify mutations which confer resistance to antiretroviral medications. Proofs of the main results appear in the Supplemental Materials (available online).",
keywords = "Boosting, Model selection, Prediction, Projected likelihood",
author = "Julian Wolfson",
year = "2011",
month = "3",
day = "1",
doi = "10.1198/jasa.2011.tm10098",
language = "English (US)",
volume = "106",
pages = "296--305",
journal = "Journal of the American Statistical Association",
issn = "0162-1459",
publisher = "Taylor and Francis Ltd.",
number = "493",

}

TY - JOUR

T1 - EEBoost

T2 - A general method for prediction and variable selection based on estimating equations

AU - Wolfson, Julian

PY - 2011/3/1

Y1 - 2011/3/1

N2 - The modern statistical literature is replete with methods for performing variable selection and prediction in standard regression problems. However, simple models may misspecify or fail to capture important aspects of the data generating process such as missingness, correlation, and over/underdispersion. In this article we describe EEBoost, a strategy for variable selection and prediction which can be applied in high-dimensional settings where inference for low-dimensional parameters would typically be based on estimating equations. The method is simple, flexible, and easily implemented using existing software. The EEBoost algorithm is obtained as a modification of the standard boosting (or functional gradient descent) technique. We show that EEBoost is closely related to a class of L1-constrained projected likelihood ratio minimizations, and therefore produces similar variable selection paths to penalized methods without the need to apply constrained optimization algorithms. The flexibility of EEBoost is illustrated by applying it to simulated examples with correlated outcomes and time-to-event data with missing covariates. In both cases, EEBoost outperforms variable selection methods which do not account for the relevant data characteristics. Furthermore, it is shown to be substantially faster to compute than competing methods based on penalized estimating equations. We also apply a version of EEBoost based on the Buckley-James estimating equations to data from an HIV treatment trial, where the aim is to identify mutations which confer resistance to antiretroviral medications. Proofs of the main results appear in the Supplemental Materials (available online).

AB - The modern statistical literature is replete with methods for performing variable selection and prediction in standard regression problems. However, simple models may misspecify or fail to capture important aspects of the data generating process such as missingness, correlation, and over/underdispersion. In this article we describe EEBoost, a strategy for variable selection and prediction which can be applied in high-dimensional settings where inference for low-dimensional parameters would typically be based on estimating equations. The method is simple, flexible, and easily implemented using existing software. The EEBoost algorithm is obtained as a modification of the standard boosting (or functional gradient descent) technique. We show that EEBoost is closely related to a class of L1-constrained projected likelihood ratio minimizations, and therefore produces similar variable selection paths to penalized methods without the need to apply constrained optimization algorithms. The flexibility of EEBoost is illustrated by applying it to simulated examples with correlated outcomes and time-to-event data with missing covariates. In both cases, EEBoost outperforms variable selection methods which do not account for the relevant data characteristics. Furthermore, it is shown to be substantially faster to compute than competing methods based on penalized estimating equations. We also apply a version of EEBoost based on the Buckley-James estimating equations to data from an HIV treatment trial, where the aim is to identify mutations which confer resistance to antiretroviral medications. Proofs of the main results appear in the Supplemental Materials (available online).

KW - Boosting

KW - Model selection

KW - Prediction

KW - Projected likelihood

UR - http://www.scopus.com/inward/record.url?scp=79954455620&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79954455620&partnerID=8YFLogxK

U2 - 10.1198/jasa.2011.tm10098

DO - 10.1198/jasa.2011.tm10098

M3 - Article

AN - SCOPUS:79954455620

VL - 106

SP - 296

EP - 305

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

SN - 0162-1459

IS - 493

ER -