High breakdown estimation yields reasonable parameter estimates from a sample of data even if that sample is contaminated by large numbers of awkwardly placed outliers. Two application areas in which this is of particular interest are multiple linear regression and the estimation of the location vector and scatter matrix of multivariate data. Standard high breakdown criteria for the regression problem are the least median of squares (LMS) and least trimmed squares (LTS); those for the multivariate location/scatter problem are the minimum volume ellipsoid (MVE) and minimum covariance determinant (MCD). All of these present daunting computational problems. The 'feasible solution algorithms' for these criteria have been shown to perform excellently on textbook-sized problems, but their performance on much larger data sets is less impressive. This paper points out a computationally cheaper feasibility condition for LTS, MVE and MCD, and shows how combining the criteria leads to improved performance on large data sets. Algorithms incorporating these improvements are available from the first author's Web site.
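As a rough illustration of the criteria named in the abstract, the sketch below evaluates the LTS, LMS and MCD objectives for a given candidate solution. It is not the feasible solution algorithm itself, and it makes several assumptions that do not come from the paper: the function names are hypothetical, the coverage `h` is supplied by the caller, LMS is taken as the `h`-th smallest squared residual, and the MCD helper handles bivariate data only.

```python
def squared_residuals(X, y, beta):
    """Squared residuals of y against the linear fit X @ beta (plain lists)."""
    return [(yi - sum(b * x for b, x in zip(beta, row))) ** 2
            for row, yi in zip(X, y)]


def lts_criterion(X, y, beta, h):
    """LTS objective: sum of the h smallest squared residuals."""
    return sum(sorted(squared_residuals(X, y, beta))[:h])


def lms_criterion(X, y, beta, h):
    """LMS objective, taken here as the h-th smallest squared residual."""
    return sorted(squared_residuals(X, y, beta))[h - 1]


def mcd_criterion_2d(Z, subset):
    """MCD objective for bivariate data: determinant of the sample
    covariance matrix of the rows of Z indexed by `subset`."""
    pts = [Z[i] for i in subset]
    m = len(pts)
    mx = sum(p[0] for p in pts) / m
    my = sum(p[1] for p in pts) / m
    sxx = sum((p[0] - mx) ** 2 for p in pts) / (m - 1)
    syy = sum((p[1] - my) ** 2 for p in pts) / (m - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (m - 1)
    return sxx * syy - sxy ** 2


# Illustrative data: eight points exactly on y = 2x plus two gross outliers.
xs = [float(i) for i in range(10)]
X = [[1.0, x] for x in xs]          # intercept column plus one predictor
y = [2.0 * x for x in xs]
y[8], y[9] = 100.0, -50.0           # contaminate the last two cases
beta_clean = [0.0, 2.0]             # the fit that follows the clean cases

# Four clean bivariate points and one distant outlier.
Z = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [50.0, 50.0]]
```

With `h = 6`, both regression criteria ignore the two contaminated cases when evaluated at `beta_clean`, whereas a subset that includes the outlying bivariate point has a far larger covariance determinant than the clean subset. The hard part, which this sketch does not address, is searching over candidate fits or subsets for the one that minimizes the criterion.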
Bibliographical note
Funding Information:
The authors are grateful to David Rocke, Arny Stromberg and Carlos Lopez for highlighting some of the problems with the feasible solution algorithms in data sets whose size is in the thousands. The referees made a number of helpful suggestions for improving the article. The work reported here was supported by the National Science Foundation under grants DMS 9505440 and ACI 9619020.
- High breakdown estimation
- Least trimmed squares
- Linear model
- Minimum covariance determinant
- Minimum volume ellipsoid