Linear regression and two-class classification with gene expression data

Xiaohong Huang, Wei Pan

Research output: Contribution to journal › Article › peer-review

89 Scopus citations


Motivation: Using gene expression data to classify (or predict) tumor types has received much research attention recently. Due to some special features of gene expression data, several new methods have been proposed, including the weighted voting scheme of Golub et al., the compound covariate method of Hedenfalk et al. (originally proposed by Tukey), and the shrunken centroids method of Tibshirani et al. These methods look different and are more or less ad hoc.

Results: We point out a close connection of the three methods with a linear regression model. Casting the classification problem in the general framework of linear regression naturally leads to new alternatives, such as partial least squares (PLS) methods and penalized PLS (PPLS) methods. Using two real data sets, we show the competitive performance of our new methods when compared with the other three methods.
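
To illustrate what casting two-class classification as linear regression can look like, the following is a minimal sketch of a textbook one-component PLS fit in Python with NumPy. It is not taken from the article and is not necessarily the authors' procedure. The function names, the +1/-1 class coding, the zero decision threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_one_component_pls(X, y):
    """Hypothetical helper: regress +1/-1 class labels on one PLS component.

    X has shape (n_samples, n_genes); y holds labels coded +1 or -1.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                      # center each gene
    yc = y - y_mean                      # center the label
    w = Xc.T @ yc                        # weight of each gene: its covariance with the label (up to scale)
    w = w / np.linalg.norm(w)            # normalize to unit length
    t = Xc @ w                           # one latent score per sample
    b = (t @ yc) / (t @ t)               # least-squares slope of the label on the score
    return x_mean, y_mean, w, b

def predict_class(model, X_new):
    """Predict +1/-1 by thresholding the fitted regression value at zero (an assumption)."""
    x_mean, y_mean, w, b = model
    y_hat = y_mean + b * ((X_new - x_mean) @ w)
    return np.where(y_hat >= 0, 1, -1)

def simulate(rng, n_per_class=20, n_genes=200, n_signal=10, shift=1.5):
    """Synthetic stand-in for expression data: only the first n_signal genes differ by class."""
    y = np.repeat([1, -1], n_per_class)
    X = rng.normal(size=(2 * n_per_class, n_genes))
    X[:, :n_signal] += shift * y[:, None]
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = simulate(rng)
X_test, y_test = simulate(rng)

model = fit_one_component_pls(X_train, y_train)
accuracy = np.mean(predict_class(model, X_test) == y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

In this sketch every gene receives a weight proportional to its covariance with the class label, and a sample is classified from a single weighted sum of its expression values.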

Original language: English (US)
Pages (from-to): 2072-2078
Number of pages: 7
Issue number: 16
State: Published - Nov 1 2003

Bibliographical note

Funding Information:
The authors thank three reviewers for many constructive and helpful comments. This research was partially supported by NIH grant R01-HL65462 and a Minnesota Medical Foundation grant.

