Abstract
An optimization criterion is presented for discriminant analysis. The criterion extends the optimization criteria of the classical Linear Discriminant Analysis (LDA) through the use of the pseudoinverse when the scatter matrices are singular. It is applicable regardless of the relative sizes of the data dimension and sample size, overcoming a limitation of classical LDA. The optimization problem can be solved analytically by applying the Generalized Singular Value Decomposition (GSVD) technique. The pseudoinverse has been suggested and used for undersampled problems in the past, where the data dimension exceeds the number of data points. The criterion proposed in this paper provides a theoretical justification for this procedure. An approximation algorithm for the GSVD-based approach is also presented. It reduces the computational complexity by finding subclusters of each cluster and uses their centroids to capture the structure of each cluster. This reduced problem yields much smaller matrices to which the GSVD can be applied efficiently. Experiments on text data, with up to 7,000 dimensions, show that the approximation algorithm produces results that are close to those produced by the exact algorithm.
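
As a rough illustration of the pseudoinverse idea mentioned in the abstract, the following is a minimal NumPy sketch. It is not the GSVD-based algorithm of the paper: it assumes the common variant that takes the leading eigenvectors of `pinv(S_t) @ S_b`, where `S_t = S_b + S_w` is the total scatter. The function names `scatter_matrices` and `pinv_lda`, the `rcond` cutoff, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np


def scatter_matrices(X, y):
    """Return between-class (S_b) and within-class (S_w) scatter; rows of X are samples."""
    d = X.shape[1]
    global_mean = X.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        diff = (class_mean - global_mean)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
        centered = Xc - class_mean
        S_w += centered.T @ centered
    return S_b, S_w


def pinv_lda(X, y, rcond=1e-10):
    """Projection matrix from the top k-1 eigenvectors of pinv(S_t) @ S_b.

    Assumption: this eigen-decomposition stands in for the GSVD solution
    described in the abstract; rcond is an illustrative cutoff.
    """
    S_b, S_w = scatter_matrices(X, y)
    S_t = S_b + S_w
    k = len(np.unique(y))
    # The pseudoinverse replaces the inverse, which does not exist when
    # the data dimension exceeds the number of samples.
    M = np.linalg.pinv(S_t, rcond=rcond) @ S_b
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)[: k - 1]
    return eigvecs[:, order].real


# Toy undersampled problem: 12 samples, 20 dimensions, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 20))
y = np.repeat([0, 1, 2], 4)
X[y == 1] += 3.0
X[y == 2] -= 3.0

G = pinv_lda(X, y)
print(G.shape)  # (20, 2)
```

Here the within-class scatter is singular because there are fewer samples than dimensions, which is the undersampled setting the abstract refers to. The reduced data would be obtained as `X @ G`.
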
Original language | English (US) |
---|---|
Pages (from-to) | 982-994 |
Number of pages | 13 |
Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
Volume | 26 |
Issue number | 8 |
State | Published - Aug 2004 |
Bibliographical note
Funding Information: The authors would like to thank the associate editor and the four reviewers for helpful comments that greatly improved the paper. Research of J. Ye and R. Janardan was sponsored, in part, by the US Army High-Performance Computing Research Center under the auspices of the US Department of the Army, US Army Research Laboratory cooperative agreement number DAAD19-01-2-0014, the content of which does not necessarily reflect the position or the policy of the US government, and no official endorsement should be inferred. Research of C.H. Park and H. Park has been supported, in part, by the US National Science Foundation grants CCR-0204109 and ACI-0305543. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the US National Science Foundation.