TY - JOUR
T1 - Penalized Fast Subset Scanning
AU - Speakman, Skyler
AU - Somanchi, Sriram
AU - McFowland, Edward
AU - Neill, Daniel B.
PY - 2016/4/2
Y1 - 2016/4/2
N2 - We present the penalized fast subset scan (PFSS), a new and general framework for scalable and accurate pattern detection. PFSS enables exact and efficient identification of the most anomalous subsets of the data, as measured by a likelihood ratio scan statistic. However, PFSS also allows incorporation of prior information about each data element’s probability of inclusion, which was not previously possible within the subset scan framework. PFSS builds on two main results: first, we prove that a large class of likelihood ratio statistics satisfy a property that allows additional, element-specific penalty terms to be included while maintaining efficient computation. Second, we prove that the penalized statistic can be maximized exactly by evaluating only O(N) subsets. As a concrete example of the PFSS framework, we incorporate “soft” constraints on spatial proximity into the spatial event detection task, enabling more accurate detection of irregularly shaped spatial clusters of varying sparsity. To do so, we develop a distance-based penalty function that rewards spatial compactness and penalizes spatially dispersed clusters. This approach was evaluated on the task of detecting simulated anthrax bio-attacks, using real-world Emergency Department data from a major U.S. city. PFSS demonstrated increased detection power and spatial accuracy as compared to competing methods while maintaining efficient computation.
AB - We present the penalized fast subset scan (PFSS), a new and general framework for scalable and accurate pattern detection. PFSS enables exact and efficient identification of the most anomalous subsets of the data, as measured by a likelihood ratio scan statistic. However, PFSS also allows incorporation of prior information about each data element’s probability of inclusion, which was not previously possible within the subset scan framework. PFSS builds on two main results: first, we prove that a large class of likelihood ratio statistics satisfy a property that allows additional, element-specific penalty terms to be included while maintaining efficient computation. Second, we prove that the penalized statistic can be maximized exactly by evaluating only O(N) subsets. As a concrete example of the PFSS framework, we incorporate “soft” constraints on spatial proximity into the spatial event detection task, enabling more accurate detection of irregularly shaped spatial clusters of varying sparsity. To do so, we develop a distance-based penalty function that rewards spatial compactness and penalizes spatially dispersed clusters. This approach was evaluated on the task of detecting simulated anthrax bio-attacks, using real-world Emergency Department data from a major U.S. city. PFSS demonstrated increased detection power and spatial accuracy as compared to competing methods while maintaining efficient computation.
KW - Disease surveillance
KW - Likelihood ratio statistic
KW - Pattern detection
KW - Scan statistic
UR - http://www.scopus.com/inward/record.url?scp=84971476718&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84971476718&partnerID=8YFLogxK
U2 - 10.1080/10618600.2015.1029578
DO - 10.1080/10618600.2015.1029578
M3 - Article
AN - SCOPUS:84971476718
SN - 1061-8600
VL - 25
SP - 382
EP - 404
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 2
ER -