A faster algorithm for ridge regression of reduced rank data

Douglas M. Hawkins, Xiangrong Yin

Research output: Contribution to journal (Article, peer-review)

33 Scopus citations

Abstract

Regression data sets typically have many more cases than variables, but this is not always the case. Some current problems in chemometrics, for example fitting quantitative structure-activity relationships (QSAR), may involve fitting linear models to data sets in which the number of predictors p far exceeds the number of cases n. Ridge regression is an approach that has some theoretical foundation and has performed well in comparison with alternatives such as partial least squares (PLS) and subset regression. Direct implementation of the regression formulation leads to an O(np² + p³) calculation, which is substantial if p is large. We show that ridge regression may be performed in an O(np²) computation, a potentially large saving when p is larger than n. The algorithm lends itself to the use of case weights, to robust bounded-influence fitting, and to cross-validation. The method is illustrated with a chemometric data set with 255 predictors but only 18 cases, a ratio not unusual in QSAR problems.
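To make the cost comparison in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the authors' algorithm, which this record does not describe. It contrasts the direct p × p formulation with a standard textbook identity that solves an n × n system instead. The function names `ridge_primal` and `ridge_dual`, the ridge constant `lam`, and the synthetic data are all assumptions made for illustration. Only the dimensions (18 cases, 255 predictors) echo the example mentioned above.

```python
import numpy as np


def ridge_primal(X, y, lam):
    """Direct formulation: solve the p x p system (X'X + lam*I) b = X'y.

    Forming X'X costs O(n p^2) and solving the system costs O(p^3).
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def ridge_dual(X, y, lam):
    """Equivalent n x n formulation: b = X'(XX' + lam*I)^{-1} y.

    Forming XX' costs O(n^2 p) and solving the system costs O(n^3).
    This is a textbook identity (valid for lam > 0), used here only as
    an illustration; it is an assumption, not the paper's method.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha


# Synthetic data (hypothetical) with the same shape as the example:
# 18 cases and 255 predictors.
rng = np.random.default_rng(0)
n, p = 18, 255
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam = 1.0  # arbitrary ridge constant chosen for the illustration

b_primal = ridge_primal(X, y, lam)
b_dual = ridge_dual(X, y, lam)

# Both routes should give the same coefficient vector up to rounding.
print(np.allclose(b_primal, b_dual))
```

Both functions return a coefficient vector of length p. The second never forms or factors a p × p matrix, which is where the saving comes from when p is much larger than n.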

Original language: English (US)
Pages (from-to): 253-262
Number of pages: 10
Journal: Computational Statistics and Data Analysis
Volume: 40
Issue number: 2
State: Published - Aug 28 2002

Bibliographical note

Funding Information:
The authors are grateful to the referees for a number of suggestions for improving the paper. The work of Hawkins was supported in part by the National Science Foundation under grants DMS 9803622 and ACI 9619020, and the work of Yin was supported in part by the University of Georgia Research Foundation. The authors gratefully acknowledge the assistance of Jerome Friedman in providing his PLS code, and Subhash Basak in providing the chemodescriptors in the example data set.

Keywords

  • Case diagnostics
  • Chemometrics
  • Cross validation
  • QSAR
  • Weighted regression
