Significant DBSCAN towards statistically robust clustering

Yiqun Xie, Shashi Shekhar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Given a collection of geo-distributed points, we aim to detect statistically significant clusters of varying shapes and densities. Spatial clustering has been widely used in many important societal applications, including public health and safety, transportation, and the environment. The problem is challenging because many application domains have low tolerance for false positives (e.g., falsely claiming a crime cluster in a community can have serious negative impacts on the residents) and because clusters often have irregular shapes. In related work, the spatial scan statistic is a popular technique that can detect significant clusters, but it requires clusters to have certain predefined shapes (e.g., circles, rings). In contrast, density-based methods (e.g., DBSCAN) can find clusters of arbitrary shape efficiently but do not consider statistical significance, making them susceptible to spurious patterns. To address these limitations, we first propose a model of statistical significance for DBSCAN-based clustering. Then, we propose a baseline Monte Carlo method to estimate the significance of clusters and a Dual-Convergence algorithm to accelerate the computation. Experimental results show that significant DBSCAN is very effective in removing chance patterns and that the Dual-Convergence algorithm can greatly reduce execution time.
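
The idea of a baseline Monte Carlo significance test for DBSCAN can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the test statistic or the null model, so the sketch assumes the size of the largest DBSCAN cluster as the statistic and complete spatial randomness (uniform points in the data's bounding box) as the null hypothesis. The function names `dbscan`, `largest_cluster_size`, and `monte_carlo_p_value` are hypothetical, and the Dual-Convergence acceleration is not shown.

```python
import random
from collections import deque


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN on 2-D points; returns one label per point (-1 = noise)."""
    n = len(points)
    eps2 = eps * eps

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        xi, yi = points[i]
        return [j for j in range(n)
                if (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2 <= eps2]

    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels


def largest_cluster_size(points, eps, min_pts):
    """Assumed test statistic: number of points in the largest DBSCAN cluster."""
    counts = {}
    for lab in dbscan(points, eps, min_pts):
        if lab != -1:
            counts[lab] = counts.get(lab, 0) + 1
    return max(counts.values(), default=0)


def monte_carlo_p_value(points, eps, min_pts, trials=99, seed=0):
    """Estimate a p-value for the observed statistic under an assumed uniform null."""
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    observed = largest_cluster_size(points, eps, min_pts)
    at_least = 0
    for _ in range(trials):
        # Simulate the same number of points, uniformly in the bounding box.
        sim = [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in points]
        if largest_cluster_size(sim, eps, min_pts) >= observed:
            at_least += 1
    # Standard Monte Carlo p-value: the observed data counts as one sample.
    return (at_least + 1) / (trials + 1)
```

Calling `monte_carlo_p_value(points, eps, min_pts)` on a list of `(x, y)` tuples returns an estimated p-value; a cluster would be reported only if that value falls below a chosen significance level. Every trial reruns DBSCAN on a full simulated dataset, which is the kind of cost an acceleration scheme would target.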

Original language: English (US)
Title of host publication: Proceedings of the 16th International Symposium on Spatial and Temporal Databases, SSTD 2019
Publisher: Association for Computing Machinery
Pages: 31-40
Number of pages: 10
ISBN (Electronic): 9781450362801
State: Published - Aug 19 2019
Event: 16th International Symposium on Spatial and Temporal Databases, SSTD 2019 - Vienna, Austria
Duration: Aug 19 2019 to Aug 21 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 16th International Symposium on Spatial and Temporal Databases, SSTD 2019
Country: Austria
City: Vienna
Period: 8/19/19 to 8/21/19

Bibliographical note

Funding Information:
This work is supported by the US NSF under Grants No. 1737633, 1029711, IIS-1320580, 0940818 and IIS-1218168, the USDOD under Grant No. HM0210-13-1-0005, ARPA-E under Grant No. DE-AR0000795, USDA under Grant No. 2017-51181-27222, NIH under Grants No. UL1 TR002494, KL2 TR002492 and TL1 TR002493, and the OVPR U-Spatial and MSI at the U. of Minnesota. We also thank Dr. Hans-Peter Kriegel for his encouragement in carrying out this research.

Keywords

  • Clustering
  • DBSCAN
  • Robust
  • Spatial
  • Statistical significance
