Because high-breakdown estimators (HBEs) are impractical to compute exactly in large samples, approximate algorithms are used. Such an algorithm generally produces an estimator with a lower consistency rate and breakdown value than the exact theoretical estimator. This discrepancy grows with the sample size, with the implication that huge computations are needed for good approximations in large high-dimensional samples. The workhorse for HBEs has been the "elemental set," or "basic resampling," algorithm. This algorithm turns out to be completely ineffective in high dimensions with high levels of contamination. However, enriching it with a "concentration" step turns it into a method that can handle even high levels of contamination, provided that the regression outliers are located on random cases. It remains ineffective if the regression outliers are concentrated on high-leverage cases. We focus on the multiple regression problem, but several of the broad conclusions, notably the inadequacy of fixed numbers of elemental starts, are also relevant to multivariate location and dispersion estimation. We introduce a new algorithm, the "X-cluster" method, for large high-dimensional multiple regression datasets that are beyond the reach of standard resampling methods. This algorithm departs sharply from current HBE algorithms in that, even at a constant percentage of contamination, it is more effective the larger the sample, making a compelling case for using it in the large-sample situations that current methods serve poorly. A multipronged analysis using both traditional ordinary least squares and L1 methods along with newer resistant techniques will often detect departures from the multiple regression model that cannot be detected by any single estimator.
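As an illustration of the elemental-set idea with a concentration step described in the abstract, the following is a minimal sketch for least trimmed squares, not the authors' implementation and not the X-cluster method. The function names (`lts_objective`, `concentrate`, `elemental_lts`), the default coverage `h = (n + p + 1) // 2`, and the numbers of starts and concentration steps are assumptions chosen for illustration only.

```python
import numpy as np


def lts_objective(residuals, h):
    """LTS criterion: sum of the h smallest squared residuals."""
    return np.sort(residuals ** 2)[:h].sum()


def concentrate(X, y, beta, h, n_steps=10):
    """Concentration steps: repeatedly refit least squares to the h cases
    that currently have the smallest squared residuals."""
    for _ in range(n_steps):
        r2 = (y - X @ beta) ** 2
        keep = np.argsort(r2)[:h]
        new_beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        if np.allclose(new_beta, beta):
            break  # the retained subset has stabilized
        beta = new_beta
    return beta


def elemental_lts(X, y, n_starts=500, h=None, n_steps=10, seed=0):
    """Approximate LTS fit from random elemental starts plus concentration.

    X is an (n, p) design matrix (include a column of ones for an intercept).
    Each elemental set is p randomly chosen cases fitted exactly.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2  # assumed default coverage, roughly half the cases
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])  # exact fit to p cases
        except np.linalg.LinAlgError:
            continue  # singular elemental set, draw another
        beta = concentrate(X, y, beta, h, n_steps)
        obj = lts_objective(y - X @ beta, h)
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj
```

A call such as `elemental_lts(X, y, n_starts=50)` returns the best coefficient vector found and its criterion value. Note that `n_starts` is fixed here, which is the kind of fixed number of elemental starts the abstract argues is inadequate as the sample size grows.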
Bibliographical note
Funding Information:
Douglas M. Hawkins is Professor, School of Statistics, University of Minnesota, Minneapolis, MN 55455 (E-mail: firstname.lastname@example.org). David J. Olive is Assistant Professor, Department of Mathematics, Southern Illinois University, Carbondale, IL 62901. The authors are grateful to the editors and referees for a number of helpful suggestions for improvement in the article. Their work was supported by National Science Foundation grants DMS 9803622 and ACI 9619020.
Ricardo A. Maronna is Professor, Departamento de Matemática, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, C.C. 172, 1900 La Plata, Argentina and a researcher at CICPBA. Víctor J. Yohai is Professor, Departamento de Matemáticas, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, 1428 Buenos Aires, Argentina and a researcher at CONICET. Yohai was partially supported by grants X083 from the University of Buenos Aires, PIP 4186/96 from CONICET, and PICT-99 03–06277 from ANPCyT.
Stephen Portnoy is Professor, Department of Statistics, University of Illinois at Urbana-Champaign, IL (E-mail: email@example.com). Research partially supported by National Science Foundation grants DMS9703758 and DMS0102411.
- Elemental set
- Least median of squares
- Least trimmed absolute deviations
- Least trimmed squares
- Minimum covariance determinant