Toward an objective and reproducible model choice via variable selection deviation

Wenjing Yang, Yuhong Yang

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Various model selection methods can be applied to seek sparse subsets of the covariates to explain the response of interest in bioinformatics. While such methods often offer very helpful predictive performance, their selections of the covariates may be much less trustworthy. Indeed, when the number of covariates is large, the selections can be highly unstable, even under a slight change of the data. This casts serious doubt on the reproducibility of the identified variables. For a sound scientific understanding of the regression relationship, methods need to be developed to find the most important covariates that have a higher chance of being confirmed in future studies. Such a method based on variable selection deviation is proposed and evaluated in this work.

Original language: English (US)
Pages (from-to): 20-30
Number of pages: 11
Journal: Biometrics
Volume: 73
Issue number: 1
DOI: 10.1111/biom.12554
State: Published - Mar 1 2017

Keywords

  • Feature selection
  • Gene expression
  • Reproducibility
  • Variable selection deviation

Cite this

Toward an objective and reproducible model choice via variable selection deviation. / Yang, Wenjing; Yang, Yuhong.

In: Biometrics, Vol. 73, No. 1, 01.03.2017, p. 20-30.

@article{03d46a47438343799d251063475e1e3f,
title = "Toward an objective and reproducible model choice via variable selection deviation",
abstract = "Various model selection methods can be applied to seek sparse subsets of the covariates to explain the response of interest in bioinformatics. While such methods often offer very helpful predictive performances, their selections of the covariates may be much less trustworthy. Indeed, when the number of covariates is large, the selections can be highly unstable, even under a slight change of the data. This casts a serious doubt on reproducibility of the identified variables. For a sound scientific understanding of the regression relationship, methods need to be developed to find the most important covariates that have higher chance to be confirmed in future studies. Such a method based on variable selection deviation is proposed and evaluated in this work.",
keywords = "Feature selection, Gene expression, Reproducibility, Variable selection deviation",
author = "Wenjing Yang and Yuhong Yang",
year = "2017",
month = "3",
day = "1",
doi = "10.1111/biom.12554",
language = "English (US)",
volume = "73",
pages = "20--30",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "1",
}

TY  - JOUR
T1  - Toward an objective and reproducible model choice via variable selection deviation
AU  - Yang, Wenjing
AU  - Yang, Yuhong
PY  - 2017/3/1
Y1  - 2017/3/1
AB  - Various model selection methods can be applied to seek sparse subsets of the covariates to explain the response of interest in bioinformatics. While such methods often offer very helpful predictive performances, their selections of the covariates may be much less trustworthy. Indeed, when the number of covariates is large, the selections can be highly unstable, even under a slight change of the data. This casts a serious doubt on reproducibility of the identified variables. For a sound scientific understanding of the regression relationship, methods need to be developed to find the most important covariates that have higher chance to be confirmed in future studies. Such a method based on variable selection deviation is proposed and evaluated in this work.
KW  - Feature selection
KW  - Gene expression
KW  - Reproducibility
KW  - Variable selection deviation
UR  - http://www.scopus.com/inward/record.url?scp=84994323010&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84994323010&partnerID=8YFLogxK
U2  - 10.1111/biom.12554
DO  - 10.1111/biom.12554
M3  - Article
C2  - 27355481
AN  - SCOPUS:84994323010
VL  - 73
SP  - 20
EP  - 30
JO  - Biometrics
JF  - Biometrics
SN  - 0006-341X
IS  - 1
ER  - 