Significant DBSCAN towards statistically robust clustering

Yiqun Xie, Shashi Shekhar

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Given a collection of geo-distributed points, we aim to detect statistically significant clusters of varying shapes and densities. Spatial clustering has been widely used in many important societal applications, including public health and safety, transportation, and the environment. The problem is challenging because many application domains have low tolerance for false positives (e.g., falsely claiming a crime cluster in a community can have serious negative impacts on its residents) and because clusters often have irregular shapes. In related work, the spatial scan statistic is a popular technique that can detect significant clusters, but it requires clusters to have certain predefined shapes (e.g., circles, rings). In contrast, density-based methods (e.g., DBSCAN) can find clusters of arbitrary shape efficiently but do not consider statistical significance, making them susceptible to spurious patterns. To address these limitations, we first propose a model of statistical significance for DBSCAN-based clustering. Then, we propose a baseline Monte Carlo method to estimate the significance of clusters and a Dual-Convergence algorithm to accelerate the computation. Experimental results show that significant DBSCAN is very effective in removing chance patterns and that the Dual-Convergence algorithm can greatly reduce execution time.
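The abstract names a baseline Monte Carlo method but does not give its details. The Python sketch below illustrates a generic Monte Carlo significance test for DBSCAN clusters under assumptions of our own, not the authors' model: the null hypothesis is complete spatial randomness (the same number of points drawn uniformly from the observed bounding box), the test statistic is the size of the largest cluster, and DBSCAN is implemented naively in O(n^2). The function names `dbscan`, `largest_cluster_size`, and `monte_carlo_p_value` are hypothetical, and the sketch does not implement the Dual-Convergence algorithm.

```python
import math
import random

# Illustrative sketch only. Assumed null model (not taken from the paper):
# complete spatial randomness inside the bounding box of the observed points.


def dbscan(points, eps, min_pts):
    """Naive O(n^2) DBSCAN on 2-D points.

    Returns one label per point: -1 for noise, otherwise a cluster id 0, 1, ...
    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it.
    """
    n = len(points)
    labels = [None] * n  # None marks a point not yet visited

    def neighbors(i):
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point: border
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep growing
        cluster_id += 1
    return labels


def largest_cluster_size(points, eps, min_pts):
    """Test statistic: number of points in the largest DBSCAN cluster (0 if none)."""
    sizes = {}
    for label in dbscan(points, eps, min_pts):
        if label >= 0:
            sizes[label] = sizes.get(label, 0) + 1
    return max(sizes.values(), default=0)


def monte_carlo_p_value(points, eps, min_pts, trials=99, seed=0):
    """Monte Carlo p-value of the largest cluster under complete spatial randomness.

    Each simulated data set has the same number of points, drawn uniformly
    from the bounding box of the observed points.
    """
    rng = random.Random(seed)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    observed = largest_cluster_size(points, eps, min_pts)
    at_least_as_extreme = 0
    for _ in range(trials):
        simulated = [(rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi))
                     for _ in points]
        if largest_cluster_size(simulated, eps, min_pts) >= observed:
            at_least_as_extreme += 1
    # Count the observed data as one of the trials so the p-value is never 0.
    return observed, (at_least_as_extreme + 1) / (trials + 1)
```

For example, with 150 uniform background points in a 100 x 100 square plus a tight Gaussian blob of 30 points, `monte_carlo_p_value(points, eps=3.0, min_pts=8, trials=49)` should report the blob with a small p-value, because random data at that density almost never contains any core point. Testing only the largest cluster, in the spirit of the spatial scan statistic's maximum, avoids a separate test per candidate cluster. The repeated DBSCAN runs dominate the cost, which is the expense the paper's Dual-Convergence algorithm is described as reducing.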

Original language: English (US)
Title of host publication: Proceedings of the 16th International Symposium on Spatial and Temporal Databases, SSTD 2019
Publisher: Association for Computing Machinery
Pages: 31-40
Number of pages: 10
ISBN (Electronic): 9781450362801
DOI: https://doi.org/10.1145/3340964.3340968
State: Published - Aug 19 2019
Event: 16th International Symposium on Spatial and Temporal Databases, SSTD 2019 - Vienna, Austria
Duration: Aug 19 2019 to Aug 21 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 16th International Symposium on Spatial and Temporal Databases, SSTD 2019
Country: Austria
City: Vienna
Period: 8/19/19 to 8/21/19


Keywords

  • Clustering
  • DBSCAN
  • Robust
  • Spatial
  • Statistical significance

Cite this

Xie, Y., & Shekhar, S. (2019). Significant DBSCAN towards statistically robust clustering. In Proceedings of the 16th International Symposium on Spatial and Temporal Databases, SSTD 2019 (pp. 31-40). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3340964.3340968