Abstract
Least median of squares (LMS) regression presents a daunting computational problem for moderate to large data sets: the optimum is the Chebyshev regression fit to the correct subset of the cases, and finding it exactly requires investigating all subsets of the cases of a certain size. The feasible solution algorithm provides a computationally attractive approach that capitalizes on the known form of the LMS solution, passing from one starting subset to another that satisfies the necessary condition for the optimum; such subsets are termed 'feasible sets'. The global optimum can be reached with arbitrarily high probability by following enough random starting sets to their feasible solutions. The practical performance of the feasible solution algorithm, however, depends on the proportion of random starting subsets that are attracted to the global optimum, or (a less demanding requirement) to a feasible solution that is qualitatively the same as the global optimum. Little is currently known about these proportions, or about the structure of the set of feasible solutions. This paper explains the earlier empirical observations that the feasible solution algorithm performs very well, and it also provides two refined necessary conditions for improving the algorithm further. k-means clustering of the residuals given by different feasible solutions provides new insights into the issue of distinct but qualitatively equivalent feasible solutions.
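The abstract's algorithmic outline can be made concrete. The sketch below is a minimal illustration, not the paper's implementation: LMS regression with coverage h minimizes the h-th smallest squared residual, the optimal fit is the Chebyshev (minimax) fit to the best h-subset, and the feasible solution algorithm descends by single case swaps until no exchange improves the subset's Chebyshev criterion. The function names, the first-improvement swap rule, the coverage choice h = floor((n+p+1)/2), and the use of SciPy's linear programming routine for the minimax fits are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_fit(X, y):
    """Chebyshev (minimax) regression: minimize max_i |y_i - x_i @ beta|.
    Cast as an LP in (beta, t): min t  s.t.  -t <= y_i - x_i @ beta <= t."""
    n, p = X.shape
    c = np.zeros(p + 1)
    c[-1] = 1.0                               # objective: the bound t
    A_ub = np.vstack([np.hstack([ X, -np.ones((n, 1))]),
                      np.hstack([-X, -np.ones((n, 1))])])
    b_ub = np.concatenate([y, -y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * p + [(0, None)])
    return res.x[:p], res.x[-1]               # coefficients, max |residual|

def feasible_solution(X, y, h, rng):
    """Follow one random h-subset to a feasible set: a subset whose
    Chebyshev fit cannot be improved by any single in/out case swap."""
    n = X.shape[0]
    subset = rng.choice(n, size=h, replace=False)
    beta, crit = chebyshev_fit(X[subset], y[subset])
    improved = True
    while improved:
        improved = False
        for i in range(h):                    # case leaving the subset
            for j in np.setdiff1d(np.arange(n), subset):  # case entering
                trial = subset.copy()
                trial[i] = j
                b, c_new = chebyshev_fit(X[trial], y[trial])
                if c_new < crit - 1e-10:      # strictly improving swap
                    subset, beta, crit = trial, b, c_new
                    improved = True
                    break
            if improved:
                break
    return beta, crit, np.sort(subset)

def lms_fsa(X, y, n_starts=10, seed=0):
    """Multi-start driver: the best feasible solution over many random
    starts is the global LMS optimum with high probability."""
    n, p = X.shape
    h = (n + p + 1) // 2                      # usual LMS coverage
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        cand = feasible_solution(X, y, h, rng)
        if best is None or cand[1] < best[1]:
            best = cand
    return best                               # (beta, Chebyshev criterion, subset)

# Small demonstration on contaminated data (hypothetical example).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(25), rng.normal(size=25)])
y = 2.0 + 3.0 * X[:, 1] + 0.1 * rng.normal(size=25)
y[:5] += 10.0                                 # five gross outliers
beta, crit, subset = lms_fsa(X, y, n_starts=5)
print(beta, crit)                             # the fit resists the outliers
```

The exhaustive swap scan makes each descent step cost on the order of h(n-h) small linear programs, so this sketch is only practical for small n; it shows the structure of the feasible-set descent rather than an efficient implementation.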
Original language | English (US)
---|---
Pages (from-to) | 519-538
Number of pages | 20
Journal | Computational Statistics and Data Analysis
Volume | 19
Issue number | 5
State | Published - May 1995
Bibliographical note
Funding Information: This work was supported by the National Science Foundation under grant number DMS 9208819. The author is grateful to Arnold Stromberg for helpful discussions of a number of issues surrounding this area.