High-breakdown linear discriminant analysis

Douglas M. Hawkins, Geoffrey J. McLachlan

Research output: Contribution to journal › Article › peer-review

84 Scopus citations

Abstract

The classification rules of linear discriminant analysis are defined by the true mean vectors and the common covariance matrix of the populations from which the data come. Because these true parameters are generally unknown, they are commonly estimated by the sample mean vector and covariance matrix of the data in a training sample randomly drawn from each population. However, these sample statistics are notoriously susceptible to contamination by outliers, a problem compounded by the fact that the outliers may be invisible to conventional diagnostics. High-breakdown estimation is a procedure designed to remove this cause for concern by producing estimates that are immune to serious distortion by a minority of outliers, regardless of their severity. In this article we motivate and develop a high-breakdown criterion for linear discriminant analysis and give an algorithm for its implementation. The procedure is intended to supplement rather than replace the usual sample-moment methodology of discriminant analysis either by providing indications that the dataset is not seriously affected by outliers (supporting the usual analysis) or by identifying apparently aberrant points and giving resistant estimators that are not affected by them.
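As a rough illustration of the ideas in the abstract, the sketch below contrasts sample-moment estimates with a minimum covariance determinant (MCD) estimate on a tiny two-dimensional training sample. It is not the algorithm of the article. It assumes an exhaustive search over subsets (feasible only for very small samples), applies MCD to each group separately rather than through a pooled criterion, and uses made-up data and hypothetical function names (`mean_and_cov`, `mcd_subset`, `lda_rule`).

```python
from itertools import combinations

def mean_and_cov(pts):
    """Sample mean and covariance of 2-D points; covariance as (sxx, sxy, syy)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def det(c):
    """Determinant of a symmetric 2x2 matrix stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = c
    return sxx * syy - sxy ** 2

def mcd_subset(pts, h):
    """Exhaustive MCD: the h-point subset whose covariance determinant is smallest."""
    return min(combinations(pts, h), key=lambda s: det(mean_and_cov(s)[1]))

def lda_rule(group_a, group_b):
    """Plug-in linear discriminant rule with equal priors; returns classify(point)."""
    (ma, ca), (mb, cb) = mean_and_cov(group_a), mean_and_cov(group_b)
    na, nb = len(group_a), len(group_b)
    # Pooled covariance, weighting each group by its degrees of freedom.
    sxx, sxy, syy = (((na - 1) * a + (nb - 1) * b) / (na + nb - 2)
                     for a, b in zip(ca, cb))
    d = sxx * syy - sxy ** 2
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # w = S^{-1} (mean_a - mean_b), using the closed-form 2x2 inverse.
    w = ((syy * dx - sxy * dy) / d, (sxx * dy - sxy * dx) / d)
    mid = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)

    def classify(p):
        score = w[0] * (p[0] - mid[0]) + w[1] * (p[1] - mid[1])
        return "A" if score > 0 else "B"

    return classify

# Illustrative data (invented): group A is centred near (0, 0) with one gross outlier.
outlier = (10.0, 10.0)
group_a = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (-0.5, 0.2),
           (0.2, -0.6), (-0.3, -0.4), (0.8, -0.1), outlier]
group_b = [(3.0, 3.0), (3.5, 2.6), (2.6, 3.4), (3.2, 3.8),
           (2.8, 2.4), (3.6, 3.3), (2.5, 2.9), (3.1, 3.0)]

# Sample-moment estimate for group A, pulled toward the outlier.
classical_mean_a, _ = mean_and_cov(group_a)

# MCD estimate: keep the 6 of 8 points with the tightest covariance in each group.
kept_a = mcd_subset(group_a, 6)
kept_b = mcd_subset(group_b, 6)
robust_mean_a, _ = mean_and_cov(kept_a)

classify_classical = lda_rule(group_a, group_b)
classify_robust = lda_rule(kept_a, kept_b)

print("classical mean of A:", classical_mean_a)
print("MCD-based mean of A:", robust_mean_a)
print("outlier kept by MCD:", outlier in kept_a)
```

Comparing the two printed means shows how far a single aberrant point can move the sample-moment estimate, and the points excluded from the retained subset serve as candidates for inspection, in the supplementary spirit the abstract describes.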

Original language: English (US)
Pages (from-to): 136-143
Number of pages: 8
Journal: Journal of the American Statistical Association
Volume: 92
Issue number: 437
State: Published - Mar 1997

Keywords

  • Classification rules
  • Minimum covariance determinant
  • Outliers
