Feature bagging for outlier detection

Aleksandar Lazarevic, Vipin Kumar

Research output: Contribution to conference (Paper)

228 Citations (Scopus)

Abstract

Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel feature bagging approach for detecting outliers in very large, high-dimensional and noisy databases is proposed. It combines results from multiple outlier detection algorithms that are applied using different sets of features. Every outlier detection algorithm uses a small subset of features that are randomly selected from the original feature set. As a result, each outlier detector identifies different outliers, and thus assigns to all data records outlier scores that correspond to their probability of being outliers. The outlier scores computed by the individual outlier detection algorithms are then combined in order to find better-quality outliers. Experiments performed on several synthetic and real-life data sets show that the proposed methods for combining outputs from multiple outlier detection algorithms provide non-trivial improvements over the base algorithm.
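The procedure described in the abstract (score the records on random feature subsets, then combine the scores) can be illustrated with a minimal sketch. It is not the authors' implementation: the base detector used here (distance to the k-th nearest neighbour), the subset-size range, the summation used to combine scores, and all function names are assumptions made for illustration only.

```python
import math
import random


def knn_distance_scores(rows, features, k):
    """Score each row by the distance to its k-th nearest neighbour,
    measured only on the given feature indices (larger = more outlying).
    This base detector is an assumption, chosen for brevity."""
    scores = []
    for i, a in enumerate(rows):
        dists = sorted(
            math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))
            for j, b in enumerate(rows)
            if j != i
        )
        scores.append(dists[k - 1])
    return scores


def feature_bagging_scores(rows, n_rounds=10, k=3, seed=0):
    """Sum the outlier scores of detectors run on random feature subsets."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    total = [0.0] * len(rows)
    for _ in range(n_rounds):
        # Assumed subset size: between half and all-but-one of the features.
        size = rng.randint(max(1, n_features // 2), max(1, n_features - 1))
        features = rng.sample(range(n_features), size)
        # Assumed combination rule: add up the per-round scores.
        for i, s in enumerate(knn_distance_scores(rows, features, k)):
            total[i] += s
    return total


# Toy data: 20 tightly clustered records plus one distant record at index 20.
data = [
    [(i % 5) * 0.1, (i // 5) * 0.1, (i % 3) * 0.1, (i % 4) * 0.1]
    for i in range(20)
]
data.append([5.0, 5.0, 5.0, 5.0])

scores = feature_bagging_scores(data)
top = max(range(len(scores)), key=scores.__getitem__)
print(top)  # index of the record with the highest combined score
```

On this toy data the distant record is far from the cluster in every feature, so it receives the highest combined score whichever subsets are drawn. Real data would call for a stronger base detector and possibly a different way of combining the scores.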

Original language: English (US)
Pages: 157-166
Number of pages: 10
DOI: https://doi.org/10.1145/1081870.1081891
State: Published - Dec 1 2005
Event: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States
Duration: Aug 21 2005 to Aug 24 2005

Other

Other: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country: United States
City: Chicago, IL
Period: 8/21/05 to 8/24/05

Keywords

  • Bagging
  • Detection rate
  • False alarm
  • Feature subsets
  • Integration
  • Outlier detection

Cite this

Lazarevic, A., & Kumar, V. (2005). Feature bagging for outlier detection. 157-166. Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States. https://doi.org/10.1145/1081870.1081891

@conference{fd13b2b895ed4ef8bc3f76c4e3872e88,
title = "Feature bagging for outlier detection",
keywords = "Bagging, Detection rate, False alarm, Feature subsets, Integration, Outlier detection",
author = "Aleksandar Lazarevic and Vipin Kumar",
year = "2005",
month = "12",
day = "1",
doi = "10.1145/1081870.1081891",
language = "English (US)",
pages = "157--166",
note = "KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ; Conference date: 21-08-2005 Through 24-08-2005",

}
