TY - JOUR
T1 - High-breakdown linear discriminant analysis
AU - Hawkins, Douglas M.
AU - McLachlan, Geoffrey J.
PY - 1997/3
Y1 - 1997/3
AB - The classification rules of linear discriminant analysis are defined by the true mean vectors and the common covariance matrix of the populations from which the data come. Because these true parameters are generally unknown, they are commonly estimated by the sample mean vector and covariance matrix of the data in a training sample randomly drawn from each population. However, these sample statistics are notoriously susceptible to contamination by outliers, a problem compounded by the fact that the outliers may be invisible to conventional diagnostics. High-breakdown estimation is a procedure designed to remove this cause for concern by producing estimates that are immune to serious distortion by a minority of outliers, regardless of their severity. In this article we motivate and develop a high-breakdown criterion for linear discriminant analysis and give an algorithm for its implementation. The procedure is intended to supplement rather than replace the usual sample-moment methodology of discriminant analysis either by providing indications that the dataset is not seriously affected by outliers (supporting the usual analysis) or by identifying apparently aberrant points and giving resistant estimators that are not affected by them.
KW - Classification rules
KW - Minimum covariance determinant
KW - Outliers
UR - http://www.scopus.com/inward/record.url?scp=0031499747&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031499747&partnerID=8YFLogxK
U2 - 10.1080/01621459.1997.10473610
DO - 10.1080/01621459.1997.10473610
M3 - Article
AN - SCOPUS:0031499747
SN - 0162-1459
VL - 92
SP - 136
EP - 143
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 437
ER -