Sparse linear discriminant analysis in structured covariates space

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Classification with high-dimensional variables is a popular goal in many modern statistical studies. Fisher's linear discriminant analysis (LDA) is a common and effective tool for classifying entities into existing groups. It is well known that classification using Fisher's discriminant for high-dimensional data is as bad as random guessing, because using many noise features increases the misclassification rate. It has recently been acknowledged that complex biological mechanisms occur through multiple features working together, though individually these features may contribute to noise accumulation in the data. In view of this, it is important to perform classification with discriminant vectors that use a subset of important variables while also utilizing prior biological relationships among features. In this paper, we tackle this problem and propose methods that incorporate variable selection into the classification problem to identify important biomarkers. Furthermore, we incorporate prior information on the relationships among variables into the LDA problem, using undirected graphs, in order to identify functionally meaningful biomarkers. We compare our methods with existing sparse LDA approaches via simulation studies and real data analysis.

Original language: English (US)
Pages (from-to): 56-69
Number of pages: 14
Journal: Statistical Analysis and Data Mining
Volume: 12
Issue number: 2
DOI: 10.1002/sam.11376
State: Published - Apr 2019

Fingerprint

Discriminant analysis
Covariates
Biomarkers
Discriminant
Misclassification Rate
Prior Information
High-dimensional Data
Variable Selection
Undirected Graph
Classification Problems
Data analysis
High-dimensional
Simulation Study
Subset
Relationships

Keywords

  • biological information, pathway analysis
  • classification
  • high-dimensional data
  • linear discriminant analysis
  • sparsity

Cite this

Sparse linear discriminant analysis in structured covariates space. / Safo, Sandra E.; Long, Qi.

In: Statistical Analysis and Data Mining, Vol. 12, No. 2, 04.2019, p. 56-69.

@article{0e122e5281304f0c95a9f9ca07c049ec,
title = "Sparse linear discriminant analysis in structured covariates space",
abstract = "Classification with high-dimensional variables is a popular goal in many modern statistical studies. Fisher's linear discriminant analysis (LDA) is a common and effective tool for classifying entities into existing groups. It is well known that classification using Fisher's discriminant for high-dimensional data is as bad as random guessing because of the use of many noise features, which increases the misclassification rate. Recently, it is being acknowledged that complex biological mechanisms occur through multiple features working together, though individually these features may contribute to noise accumulation in the data. In view of these, it is important to perform classification with discriminant vectors that use a subset of important variables, while also utilizing prior biological relationships among features. We tackle this problem in this paper and propose methods that incorporate variable selection into the classification problem for the identification of important biomarkers. Furthermore, we incorporate into the LDA problem prior information on the relationships among variables using undirected graphs in order to identify functionally meaningful biomarkers. We compare our methods with existing sparse LDA approaches via simulation studies and real data analysis.",
keywords = "biological information, pathway analysis, classification, high-dimensional data, linear discriminant analysis, sparsity",
author = "Safo, Sandra E. and Long, Qi",
year = "2019",
month = "4",
doi = "10.1002/sam.11376",
language = "English (US)",
volume = "12",
pages = "56--69",
journal = "Statistical Analysis and Data Mining",
issn = "1932-1872",
publisher = "John Wiley and Sons Inc.",
number = "2",
}

TY - JOUR

T1 - Sparse linear discriminant analysis in structured covariates space

AU - Safo, Sandra E.

AU - Long, Qi

PY - 2019/4

Y1 - 2019/4

N2 - Classification with high-dimensional variables is a popular goal in many modern statistical studies. Fisher's linear discriminant analysis (LDA) is a common and effective tool for classifying entities into existing groups. It is well known that classification using Fisher's discriminant for high-dimensional data is as bad as random guessing because of the use of many noise features, which increases the misclassification rate. Recently, it is being acknowledged that complex biological mechanisms occur through multiple features working together, though individually these features may contribute to noise accumulation in the data. In view of these, it is important to perform classification with discriminant vectors that use a subset of important variables, while also utilizing prior biological relationships among features. We tackle this problem in this paper and propose methods that incorporate variable selection into the classification problem for the identification of important biomarkers. Furthermore, we incorporate into the LDA problem prior information on the relationships among variables using undirected graphs in order to identify functionally meaningful biomarkers. We compare our methods with existing sparse LDA approaches via simulation studies and real data analysis.

AB - Classification with high-dimensional variables is a popular goal in many modern statistical studies. Fisher's linear discriminant analysis (LDA) is a common and effective tool for classifying entities into existing groups. It is well known that classification using Fisher's discriminant for high-dimensional data is as bad as random guessing because of the use of many noise features, which increases the misclassification rate. Recently, it is being acknowledged that complex biological mechanisms occur through multiple features working together, though individually these features may contribute to noise accumulation in the data. In view of these, it is important to perform classification with discriminant vectors that use a subset of important variables, while also utilizing prior biological relationships among features. We tackle this problem in this paper and propose methods that incorporate variable selection into the classification problem for the identification of important biomarkers. Furthermore, we incorporate into the LDA problem prior information on the relationships among variables using undirected graphs in order to identify functionally meaningful biomarkers. We compare our methods with existing sparse LDA approaches via simulation studies and real data analysis.

KW - biological information, pathway analysis

KW - classification

KW - high-dimensional data

KW - linear discriminant analysis

KW - sparsity

UR - http://www.scopus.com/inward/record.url?scp=85045731533&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045731533&partnerID=8YFLogxK

U2 - 10.1002/sam.11376

DO - 10.1002/sam.11376

M3 - Article

AN - SCOPUS:85045731533

VL - 12

SP - 56

EP - 69

JO - Statistical Analysis and Data Mining

JF - Statistical Analysis and Data Mining

SN - 1932-1872

IS - 2

ER -