Abstract
Given instances (spatial points) of different spatial features (categories), significant spatial co-distribution pattern discovery aims to find subsets of spatial features whose spatial distributions are statistically significantly similar to each other. Discovering significant spatial co-distribution patterns is important in many application domains, such as identifying spatial associations between diseases and risk factors in spatial epidemiology. Previous methods mostly associated spatial features whose instances are frequently located together; however, frequent co-location does not necessarily indicate similarity in the spatial distributions of different features. This paper therefore defines the significant spatial co-distribution pattern discovery problem and develops a novel method to solve it effectively. First, we propose a new measure, the dissimilarity index, to quantify the difference between the spatial distributions of different features under the spatial neighbor relation, and we employ it in a distribution clustering method to detect candidate spatial co-distribution patterns. To remove spurious patterns that occur by chance, the validity of each candidate pattern is then verified through a significance test under the null hypothesis that the spatial distributions of different features are independent of each other. To model the null hypothesis, we present a distribution shift-correction method that randomizes the relationships between different features while maintaining the spatial structure of each feature (e.g., spatial auto-correlation). Comparisons with baseline methods on synthetic datasets demonstrate the effectiveness of the proposed method. A case study identifying co-morbidities in central Colorado illustrates its real-world applicability.
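The significance-testing step described above can be illustrated with a minimal Python sketch. The abstract does not specify the dissimilarity index or the distribution shift-correction method, so this sketch swaps in two stand-ins that are assumptions, not the authors' definitions: a total variation distance between grid densities as the dissimilarity measure, and a random toroidal shift of one feature as the null model (it moves one feature relative to the other while keeping that feature's internal point configuration). All function names are hypothetical, and points are assumed to lie in the unit square.

```python
import random


def grid_density(points, cells):
    """Share of points falling in each cell of a cells x cells grid on the unit square."""
    counts = [0.0] * (cells * cells)
    for x, y in points:
        col = min(int(x * cells), cells - 1)
        row = min(int(y * cells), cells - 1)
        counts[row * cells + col] += 1.0
    return [c / len(points) for c in counts]


def dissimilarity(points_a, points_b, cells=4):
    """Stand-in measure (assumption): total variation distance between grid densities.

    0.0 means identical gridded distributions; 1.0 means no shared cells.
    """
    da = grid_density(points_a, cells)
    db = grid_density(points_b, cells)
    return 0.5 * sum(abs(p - q) for p, q in zip(da, db))


def toroidal_shift(points, dx, dy):
    """Shift every point by (dx, dy), wrapping around the unit square."""
    return [((x + dx) % 1.0, (y + dy) % 1.0) for x, y in points]


def significance(points_a, points_b, trials=199, cells=4, seed=0):
    """Monte Carlo p-value for 'the two features are as similar as observed by chance'.

    Null model (assumption): feature B is randomly shifted on a torus, which breaks
    its relationship to feature A but preserves B's own spatial structure.
    """
    rng = random.Random(seed)
    observed = dissimilarity(points_a, points_b, cells)
    as_extreme = 0
    for _ in range(trials):
        shifted = toroidal_shift(points_b, rng.random(), rng.random())
        # Small dissimilarity is the "extreme" direction for a co-distribution pattern.
        if dissimilarity(points_a, shifted, cells) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (trials + 1)
```

Under these assumptions, a candidate pair of features would be retained only when the returned p-value falls below a chosen significance level. The grid resolution `cells` and the number of `trials` are illustrative defaults rather than values taken from the paper.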
Original language | English (US) |
---|---|
Article number | 101543 |
Journal | Computers, Environment and Urban Systems |
Volume | 84 |
State | Published - Nov 2020 |
Bibliographical note
Funding Information: This study was funded through support from the National Natural Science Foundation of China (NSFC) [41730105, 41471385]; National Key Research and Development Foundation of China [2017YFB0503503]; U.S. National Science Foundation (NSF) [1737633, 0940818, 1029711, 1541876, IIS-1218168, IIS-1320580]; Advanced Research Projects Agency - Energy, U.S. Department of Energy [DE-AR0000795]; U.S. Department of Defense [HM0210-13-1-0005, HM1582-08-1-0017]; U.S. Department of Agriculture [2017-51181-27222]; U.S. National Institute of Health [KL2 TR002492, TL1 TR002493, UL1 TR002494]; OVPR Infrastructure Investment Initiative, University of Minnesota; Minnesota Supercomputing Institute (MSI), University of Minnesota. We would like to thank the reviewers and the members of the spatial computing research group at the University of Minnesota for their helpful comments. We also thank Kim Koffolt for improving the readability of this article.
Publisher Copyright:
© 2020 Elsevier Ltd
Keywords
- Co-distribution patterns
- Significance test
- Spatial association
- Spatial clustering
- Spatial data mining