TY - JOUR

T1 - Convergence of the feasible solution algorithm for least median of squares regression

AU - Hawkins, Douglas M.

PY - 1995/5

Y1 - 1995/5

N2 - Least median of squares regression presents a daunting computational problem for moderate to large data sets - the optimum is the Chebyshev regression fit to the correct subset of the cases, and finding it exactly requires an investigation of all subsets of the cases of a certain size. The feasible solution algorithm provides a computationally attractive approach that capitalizes on the known form of the LMS solution to pass from one starting subset to another satisfying the necessary condition for the optimum - such subsets are termed 'feasible sets'. The global optimum can be reached with arbitrarily high probability by following enough random starting sets to their feasible solutions. The practical performance of the feasible solution algorithm however depends on the proportion of random starting subsets that are attracted to the global optimum; or (a less demanding requirement) to a feasible solution that is qualitatively the same as the global optimum. Little is currently known about these proportions, or about the structure of the set of feasible solutions. This paper explains the earlier empiric observations that the feasible solution algorithm performs very well, and also provides two refined necessary conditions for improving it further. The approach of k-means clustering of the residuals given by different feasible solution provides new insights into the issue of different but qualitatively equivalent feasible solutions.

AB - Least median of squares regression presents a daunting computational problem for moderate to large data sets - the optimum is the Chebyshev regression fit to the correct subset of the cases, and finding it exactly requires an investigation of all subsets of the cases of a certain size. The feasible solution algorithm provides a computationally attractive approach that capitalizes on the known form of the LMS solution to pass from one starting subset to another satisfying the necessary condition for the optimum - such subsets are termed 'feasible sets'. The global optimum can be reached with arbitrarily high probability by following enough random starting sets to their feasible solutions. The practical performance of the feasible solution algorithm however depends on the proportion of random starting subsets that are attracted to the global optimum; or (a less demanding requirement) to a feasible solution that is qualitatively the same as the global optimum. Little is currently known about these proportions, or about the structure of the set of feasible solutions. This paper explains the earlier empiric observations that the feasible solution algorithm performs very well, and also provides two refined necessary conditions for improving it further. The approach of k-means clustering of the residuals given by different feasible solution provides new insights into the issue of different but qualitatively equivalent feasible solutions.

UR - http://www.scopus.com/inward/record.url?scp=0010558533&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0010558533&partnerID=8YFLogxK

U2 - 10.1016/0167-9473(94)00017-D

DO - 10.1016/0167-9473(94)00017-D

M3 - Article

AN - SCOPUS:0010558533

SN - 0167-9473

VL - 19

SP - 519

EP - 538

JO - Computational Statistics and Data Analysis

JF - Computational Statistics and Data Analysis

IS - 5

ER -