TY - JOUR

T1 - A tale of two matrix factorizations

AU - Fogel, Paul

AU - Hawkins, Douglas M.

AU - Beecher, Chris

AU - Luta, George

AU - Young, S. Stanley

PY - 2013/1/1

Y1 - 2013/1/1

N2 - In statistical practice, rectangular tables of numeric data are commonplace, and are often analyzed using dimension-reduction methods like the singular value decomposition and its close cousin, principal component analysis (PCA). This analysis produces score and loading matrices representing the rows and the columns of the original table and these matrices may be used for both prediction purposes and to gain structural understanding of the data. In some tables, the data entries are necessarily nonnegative (apart, perhaps, from some small random noise), and so the matrix factors meant to represent them should arguably also contain only nonnegative elements. This thinking, and the desire for parsimony, underlies such techniques as rotating factors in a search for "simple structure." These attempts to transform score or loading matrices of mixed sign into nonnegative, parsimonious forms are, however, indirect and at best imperfect. The recent development of nonnegative matrix factorization, or NMF, is an attractive alternative. Rather than attempt to transform a loading or score matrix of mixed signs into one with only nonnegative elements, it directly seeks matrix factors containing only nonnegative elements. The resulting factorization often leads to substantial improvements in interpretability of the factors. We illustrate this potential by synthetic examples and a real dataset. The question of exactly when NMF is effective is not fully resolved, but some indicators of its domain of success are given. It is pointed out that the NMF factors can be used in much the same way as those coming from PCA for such tasks as ordination, clustering, and prediction. Supplementary materials for this article are available online.

AB - In statistical practice, rectangular tables of numeric data are commonplace, and are often analyzed using dimension-reduction methods like the singular value decomposition and its close cousin, principal component analysis (PCA). This analysis produces score and loading matrices representing the rows and the columns of the original table and these matrices may be used for both prediction purposes and to gain structural understanding of the data. In some tables, the data entries are necessarily nonnegative (apart, perhaps, from some small random noise), and so the matrix factors meant to represent them should arguably also contain only nonnegative elements. This thinking, and the desire for parsimony, underlies such techniques as rotating factors in a search for "simple structure." These attempts to transform score or loading matrices of mixed sign into nonnegative, parsimonious forms are, however, indirect and at best imperfect. The recent development of nonnegative matrix factorization, or NMF, is an attractive alternative. Rather than attempt to transform a loading or score matrix of mixed signs into one with only nonnegative elements, it directly seeks matrix factors containing only nonnegative elements. The resulting factorization often leads to substantial improvements in interpretability of the factors. We illustrate this potential by synthetic examples and a real dataset. The question of exactly when NMF is effective is not fully resolved, but some indicators of its domain of success are given. It is pointed out that the NMF factors can be used in much the same way as those coming from PCA for such tasks as ordination, clustering, and prediction. Supplementary materials for this article are available online.

KW - Latent dimensions

KW - Nonnegative matrix factorization

KW - Principal component analysis

KW - Singular value decomposition

UR - http://www.scopus.com/inward/record.url?scp=84901756611&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84901756611&partnerID=8YFLogxK

U2 - 10.1080/00031305.2013.845607

DO - 10.1080/00031305.2013.845607

M3 - Article

AN - SCOPUS:84901756611

VL - 67

SP - 207

EP - 218

JO - American Statistician

JF - American Statistician

SN - 0003-1305

IS - 4

ER -