Abstract
Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel feature bagging approach for detecting outliers in very large, high dimensional and noisy databases is proposed. It combines results from multiple outlier detection algorithms that are applied using different set of features. Every outlier detection algorithm uses a small subset of features that are randomly selected from the original feature set. As a result, each outlier detector identifies different outliers, and thus assigns to all data records outlier scores that correspond to their probability of being outliers. The outlier scores computed by the individual outlier detection algorithms are then combined in order to find the better quality outliers. Experiments performed on several synthetic and real life data sets show that the proposed methods for combining outputs from multiple outlier detection algorithms provide non-trivial improvements over the base algorithm.
Original language | English (US) |
---|---|
Pages | 157-166 |
Number of pages | 10 |
DOIs | |
State | Published - 2005 |
Event | KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States Duration: Aug 21 2005 → Aug 24 2005 |
Other
Other | KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
---|---|
Country/Territory | United States |
City | Chicago, IL |
Period | 8/21/05 → 8/24/05 |
Keywords
- Bagging
- Detection rate
- False alarm
- Feature subsets
- Integration
- Outlier detection