Sparse principal component analysis

Hui Zou, Trevor Hastie, Robert Tibshirani

Research output: Contribution to journal › Article › peer-review

1658 Scopus citations

Abstract

Principal component analysis (PCA) is widely used in data processing and dimensionality reduction. However, PCA suffers from the fact that each principal component is a linear combination of all the original variables, so the results are often difficult to interpret. We introduce a new method called sparse principal component analysis (SPCA) using the lasso (elastic net) to produce modified principal components with sparse loadings. We first show that PCA can be formulated as a regression-type optimization problem; sparse loadings are then obtained by imposing the lasso (elastic net) constraint on the regression coefficients. Efficient algorithms are proposed to fit our SPCA models for both regular multivariate data and gene expression arrays. We also give a new formula to compute the total variance of modified principal components. As illustrations, SPCA is applied to real and simulated data with encouraging results.
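The regression view described in the abstract can be illustrated with a minimal sketch. It is not the authors' fitting algorithm: it only regresses one ordinary principal component on the centered data with an elastic net penalty and normalizes the coefficients to obtain a sparse loading vector. The function names (`soft_threshold`, `elastic_net_cd`, `sparse_loading`), the penalty values, and the simulated data are assumptions made for illustration, and the elastic net is solved here with a plain hand-written coordinate descent.

```python
import numpy as np

def soft_threshold(z, t):
    # Shrink z toward zero by t; values within [-t, t] become exactly zero.
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net_cd(X, y, lam1, lam2, n_iter=200):
    # Coordinate descent for: ||y - X b||^2 + lam2 * ||b||^2 + lam1 * ||b||_1
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.copy()  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # remove variable j's contribution
            rho = X[:, j] @ resid
            beta[j] = soft_threshold(rho, lam1 / 2.0) / (col_sq[j] + lam2)
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta

def sparse_loading(X, k=0, lam1=1.0, lam2=1e-3):
    # Regress the k-th ordinary principal component on the centered data
    # with an elastic net penalty, then rescale the coefficients to unit length.
    Xc = X - X.mean(axis=0)
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    z = U[:, k] * d[k]                          # k-th principal component scores
    beta = elastic_net_cd(Xc, z, lam1, lam2)
    norm = np.linalg.norm(beta)
    return beta / norm if norm > 0 else beta

# Simulated data (illustrative): the first three variables share a hidden factor.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[:, :3] += 3.0 * rng.normal(size=(100, 1))

dense = sparse_loading(X, lam1=0.0)     # ridge only: ordinary loading up to sign
sparse = sparse_loading(X, lam1=200.0)  # lasso term pushes weak loadings to zero
print(np.round(dense, 3))
print(np.round(sparse, 3))
```

With `lam1=0.0` the penalty reduces to ridge regression and the normalized coefficients should match the ordinary first loading vector up to sign; increasing `lam1` trades some explained variance for loadings that are exactly zero on weakly contributing variables.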

Original language: English (US)
Pages (from-to): 265-286
Number of pages: 22
Journal: Journal of Computational and Graphical Statistics
Volume: 15
Issue number: 2
State: Published - Jun 2006

Keywords

  • Arrays
  • Gene expression
  • Lasso/elastic net
  • Multivariate analysis
  • Singular value decomposition
  • Thresholding
