Sample size requirements for multiple outlier location techniques based on elemental sets

Dan Bradu, Douglas M. Hawkins

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

The identification of multiple outliers in regression data can be attained via robust regression. A class of robust regression techniques relies on investigation of the different elemental regressions - those determined by a minimal set of p observations, where p is the number of linear regression coefficients. Exhaustive enumeration of all such elemental regressions is possible only in small problems. For larger data sets, standard practice is to take a sample of N elemental regressions and, for each one, to obtain the value of a selection statistic. The minimal value of the statistic identifies the elemental regression that is the output of the method. Rousseeuw's LMS (Least Median of Squares) selection statistic defines a method which has a maximal breakdown point, is computationally manageable, and so has become popular. The performance of such a technique is usually demonstrated by means of examples, where there is success, but no quantitative evaluation is made. In this paper, the performance of a technique for a data set of known structure is evaluated by calculating the probability of success as a function of N, the number of elemental sets drawn. Success means an output of the procedure which is helpful in locating the outliers. The performance of Rousseeuw's LMS is analyzed in detail for two known data sets.
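The sampling scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`solve`, `lms_elemental`, `prob_clean_set_drawn`) are hypothetical, the selection statistic is simplified to the plain median of squared residuals, and elemental sets are assumed to be drawn independently, so the same set may be drawn twice. The last function computes only the textbook probability that at least one drawn elemental set is outlier-free, which is a cruder quantity than the probability of success studied in the paper.

```python
import random
from math import comb
from statistics import median


def solve(A, b):
    """Solve the p x p system A x = b by Gaussian elimination with
    partial pivoting. Returns None if A is numerically singular."""
    p = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * p
    for i in reversed(range(p)):
        s = M[i][p] - sum(M[i][j] * x[j] for j in range(i + 1, p))
        x[i] = s / M[i][i]
    return x


def lms_elemental(X, y, n_sets, seed=0):
    """Draw n_sets random elemental sets (p rows each), fit each exactly,
    and keep the fit with the smallest median squared residual.
    Assumption: the plain median stands in for the LMS statistic."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    best_stat, best_beta = float("inf"), None
    for _ in range(n_sets):
        idx = rng.sample(range(n), p)  # p distinct observations
        beta = solve([X[i] for i in idx], [y[i] for i in idx])
        if beta is None:  # degenerate elemental set, skip it
            continue
        sq = [(yi - sum(bj * xij for bj, xij in zip(beta, xi))) ** 2
              for xi, yi in zip(X, y)]
        stat = median(sq)
        if stat < best_stat:
            best_stat, best_beta = stat, beta
    return best_beta, best_stat


def prob_clean_set_drawn(n, k, p, n_sets):
    """Probability that at least one of n_sets independently drawn
    elemental sets contains none of the k outliers among n observations."""
    q = comb(n - k, p) / comb(n, p)
    return 1 - (1 - q) ** n_sets
```

For example, with n = 10 observations, k = 3 outliers, and p = 2 coefficients, a single elemental set is outlier-free with probability C(7,2)/C(10,2) = 21/45, so even a modest N makes drawing at least one clean set very likely. Whether the minimal statistic then actually points to a helpful regression is the harder question the paper addresses.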

Original language: English (US)
Pages (from-to): 257-270
Number of pages: 14
Journal: Computational Statistics and Data Analysis
Volume: 16
Issue number: 3
State: Published - Sep 1993

Keywords

  • Elemental set/regression
  • Least median of squares
  • Probability of success
  • Regression outlier
