TY - JOUR

T1 - Sample size requirements for multiple outlier location techniques based on elemental sets

AU - Bradu, Dan

AU - Hawkins, Douglas M.

PY - 1993/9

Y1 - 1993/9

N2 - The identification of multiple outliers in regression data can be attained via robust regression. A class of robust regression techniques relies on investigation of the different elemental regressions - those determined by a minimal set of p observations, where p is the number of linear regression coefficients. Exhaustive enumeration of all such elemental regressions is possible only in small problems. For larger data sets standard practice is to take a sample of N elemental regressions, and for each one the value of a selection statistic is obtained. The minimal value of the statistic identifies the elemental regression which is the output of the method. Rousseeuw's LMS (Least Median of Squares) selection statistic defines a method which has a maximal breakdown point, is computationally manageable, and so has become popular. The performance of such a technique is usually demonstrated by means of examples, where there is success, but no quantitative evaluation is made. In this paper, the performance of a technique for a data set of known structure is evaluated by calculating the probability of success as a function of N, the number of elemental sets drawn. Success means an output of the procedure which is helpful in locating the outliers. The performance of Rousseeuw's LMS is analyzed in detail for two known data sets.

AB - The identification of multiple outliers in regression data can be attained via robust regression. A class of robust regression techniques relies on investigation of the different elemental regressions - those determined by a minimal set of p observations, where p is the number of linear regression coefficients. Exhaustive enumeration of all such elemental regressions is possible only in small problems. For larger data sets standard practice is to take a sample of N elemental regressions, and for each one the value of a selection statistic is obtained. The minimal value of the statistic identifies the elemental regression which is the output of the method. Rousseeuw's LMS (Least Median of Squares) selection statistic defines a method which has a maximal breakdown point, is computationally manageable, and so has become popular. The performance of such a technique is usually demonstrated by means of examples, where there is success, but no quantitative evaluation is made. In this paper, the performance of a technique for a data set of known structure is evaluated by calculating the probability of success as a function of N, the number of elemental sets drawn. Success means an output of the procedure which is helpful in locating the outliers. The performance of Rousseeuw's LMS is analyzed in detail for two known data sets.

KW - Elemental set/regression

KW - Least median of squares

KW - Probability of success

KW - Regression outlier

UR - http://www.scopus.com/inward/record.url?scp=0027844159&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0027844159&partnerID=8YFLogxK

U2 - 10.1016/0167-9473(93)90128-G

DO - 10.1016/0167-9473(93)90128-G

M3 - Article

AN - SCOPUS:0027844159

SN - 0167-9473

VL - 16

SP - 257

EP - 270

JO - Computational Statistics and Data Analysis

JF - Computational Statistics and Data Analysis

IS - 3

ER -