TY - JOUR

T1 - Applications and algorithms for least trimmed sum of absolute deviations regression

AU - Hawkins, Douglas M

AU - Olive, David

PY - 1999/12/28

Y1 - 1999/12/28

N2 - High breakdown estimation (HBE) addresses the problem of getting reliable parameter estimates in the face of outliers that may be numerous and badly placed. In multiple regression, the standard HBEs have been those defined by the least median of squares (LMS) and the least trimmed squares (LTS) criteria. Both criteria partition the data set's n cases into two "halves": the covered "half" is accommodated by the fit, while the uncovered "half", which is intended to include any outliers, is ignored. In LMS, the criterion is the Chebyshev norm of the residuals of the covered cases, while in LTS the criterion is the sum of squared residuals of the covered cases. Neither LMS nor LTS is entirely satisfactory. LMS has a statistical efficiency of zero if the true residuals are normal, and so is unattractive, particularly for large data sets. LTS is preferable on efficiency grounds, but its exact computation turns out to involve an intolerable computational load in any but quite small data sets. The criterion of least trimmed sum of absolute deviations (LTA) is found by minimizing the sum of absolute residuals of the covered cases. This criterion is not new, but has not been used as widely as we believe it should be. We show in this article that LTA is an attractive alternative to LMS and LTS, particularly for large data sets. It has a statistical efficiency that is not much below that of LTS for outlier-free normal data and better than that of LTS for more peaked error distributions. As its computational complexity is of a lower order than that of LMS and LTS, it can also be evaluated exactly in much larger samples than either LMS or LTS. Finally, just as its full-sample equivalent, the L1 norm, is robust against outliers on low leverage cases, LTA is able to cover larger subsets than LTS in those data sets where not all outliers are on high leverage cases.
For samples too large for exact evaluation of the LTA, we outline a "feasible solution algorithm", which provides excellent approximations to the exact LTA solution using quite modest computation.

KW - High breakdown

KW - L1 norm

KW - Least median of squares

KW - Least trimmed sum of squares

KW - Missing values

KW - Outliers

KW - Robust estimation

UR - http://www.scopus.com/inward/record.url?scp=0038334682&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0038334682&partnerID=8YFLogxK

U2 - 10.1016/S0167-9473(99)00029-8

DO - 10.1016/S0167-9473(99)00029-8

M3 - Article

AN - SCOPUS:0038334682

SN - 0167-9473

VL - 32

SP - 119

EP - 134

JO - Computational Statistics and Data Analysis

JF - Computational Statistics and Data Analysis

IS - 2

ER -