Applications and algorithms for least trimmed sum of absolute deviations regression

Douglas M Hawkins, David Olive

Research output: Contribution to journal, Article, peer-review

38 Scopus citations

Abstract

High breakdown estimation (HBE) addresses the problem of getting reliable parameter estimates in the face of outliers that may be numerous and badly placed. In multiple regression, the standard HBEs have been those defined by the least median of squares (LMS) and the least trimmed squares (LTS) criteria. Both criteria lead to a partitioning of the data set's n cases into two "halves" - the covered "half" of cases is accommodated by the fit, while the uncovered "half", which is intended to include any outliers, is ignored. In LMS, the criterion is the Chebyshev norm of the residuals of the covered cases, while in LTS the criterion is the sum of squared residuals of the covered cases. Neither LMS nor LTS is entirely satisfactory. LMS has a statistical efficiency of zero if the true residuals are normal, and so is unattractive, particularly for large data sets. LTS is preferable on efficiency grounds, but its exact computation turns out to involve an intolerable computational load in any but quite small data sets. The criterion of least trimmed sum of absolute deviations (LTA) is found by minimizing the sum of absolute residuals of the covered cases. This criterion is not new, but has not been used as widely as we believe it should be. We show in this article that LTA is an attractive alternative to LMS and LTS, particularly for large data sets. It has a statistical efficiency that is not much below that of LTS for outlier-free normal data and better than LTS for more peaked error distributions. As its computational complexity is of a lower order than that of LMS and LTS, it can also be evaluated exactly in much larger samples than either LMS or LTS. Finally, just as its full-sample equivalent, the L1 norm, is robust against outliers on low leverage cases, LTA is able to cover larger subsets than LTS in those data sets where not all outliers are on high leverage cases.
For samples too large for exact evaluation of the LTA, we outline a "feasible solution algorithm", which provides excellent approximations to the exact LTA solution using quite modest computation.

Original language: English (US)
Pages (from-to): 119-134
Number of pages: 16
Journal: Computational Statistics and Data Analysis
Volume: 32
Issue number: 2
State: Published - Dec 28 1999

Keywords

  • High breakdown
  • L1 norm
  • Least median of squares
  • Least trimmed sum of squares
  • Missing values
  • Outliers
  • Robust estimation
