TY - JOUR
T1 - Sample size requirements for multiple outlier location techniques based on elemental sets
AU - Bradu, Dan
AU - Hawkins, Douglas M.
PY - 1993/9
Y1 - 1993/9
N2 - The identification of multiple outliers in regression data can be attained via robust regression. A class of robust regression techniques relies on investigation of the different elemental regressions - those determined by a minimal set of p observations, where p is the number of linear regression coefficients. Exhaustive enumeration of all such elemental regressions is possible only in small problems. For larger data sets, standard practice is to take a sample of N elemental regressions, and for each one the value of a selection statistic is obtained. The minimal value of the statistic identifies the elemental regression that is the output of the method. Rousseeuw's LMS (Least Median of Squares) selection statistic defines a method which has a maximal breakdown point, is computationally manageable, and so has become popular. The performance of such a technique is usually demonstrated by means of examples, where there is success, but no quantitative evaluation is made. In this paper, the performance of a technique for a data set of known structure is evaluated by calculating the probability of success as a function of N, the number of elemental sets drawn. Success means an output of the procedure which is helpful in locating the outliers. The performance of Rousseeuw's LMS is analyzed in detail for two known data sets.
AB - The identification of multiple outliers in regression data can be attained via robust regression. A class of robust regression techniques relies on investigation of the different elemental regressions - those determined by a minimal set of p observations, where p is the number of linear regression coefficients. Exhaustive enumeration of all such elemental regressions is possible only in small problems. For larger data sets, standard practice is to take a sample of N elemental regressions, and for each one the value of a selection statistic is obtained. The minimal value of the statistic identifies the elemental regression that is the output of the method. Rousseeuw's LMS (Least Median of Squares) selection statistic defines a method which has a maximal breakdown point, is computationally manageable, and so has become popular. The performance of such a technique is usually demonstrated by means of examples, where there is success, but no quantitative evaluation is made. In this paper, the performance of a technique for a data set of known structure is evaluated by calculating the probability of success as a function of N, the number of elemental sets drawn. Success means an output of the procedure which is helpful in locating the outliers. The performance of Rousseeuw's LMS is analyzed in detail for two known data sets.
KW - Elemental set/regression
KW - Least median of squares
KW - Probability of success
KW - Regression outlier
UR - http://www.scopus.com/inward/record.url?scp=0027844159&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0027844159&partnerID=8YFLogxK
U2 - 10.1016/0167-9473(93)90128-G
DO - 10.1016/0167-9473(93)90128-G
M3 - Article
AN - SCOPUS:0027844159
SN - 0167-9473
VL - 16
SP - 257
EP - 270
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
IS - 3
ER -