TY - JOUR
T1 - Through the Citizen Scientists’ Eyes
T2 - Insights into Using Citizen Science with Machine Learning for Effective Identification of Unknown-Unknowns in Big Data
AU - Mantha, Kameswara Bharadwaj
AU - Krawczyk, Coleman
AU - Roberts, Hayley
AU - Simmons, Brooke
AU - Fortson, Lucy
AU - Walmsley, Mike
AU - Lintott, Chris
AU - Garland, Izzy
AU - Dickinson, Hugh J
AU - Makechemu, Jason Shingirai
AU - Keel, William
AU - Trouille, Laura
AU - Sankar, Ramanakumar
AU - Johnson, Clifford
N1 - Publisher Copyright: © 2024 The Author(s).
PY - 2024
Y1 - 2024
AB - In the era of rapidly growing astronomical data, the gap between data collection and analysis is a significant barrier, especially for teams searching for rare scientific objects. Although machine learning (ML) can quickly parse large data sets, it struggles to robustly identify scientifically interesting objects, a task at which humans excel. Human-in-the-loop (HITL) strategies that combine the strengths of citizen science (CS) and ML offer a promising solution, but first, we need to better understand the relationship between human- and machine-identified samples. In this work, we present a case study from the Galaxy Zoo: Weird & Wonderful project, where volunteers inspected ~200,000 astronomical images, processed by an ML-based anomaly detection model, to identify those with unusual or interesting characteristics. Volunteer-selected images with common astrophysical characteristics had higher consensus, while rarer or more complex ones had lower consensus. This suggests that low-consensus choices should not be dismissed in further explorations. Additionally, volunteers were better at filtering out uninteresting anomalies, such as image artifacts, which the machine struggled with. We also found that a higher ML-generated anomaly score, which indicates images’ low-level feature anomalousness, was a better predictor of the volunteers’ consensus choice. Combining a locus of high volunteer-consensus images within the ML-learnt feature space and anomaly score, we demonstrated a decision boundary that can effectively isolate images with unusual and potentially scientifically interesting characteristics. Using this case study, we lay out important guidelines for future research studies looking to adapt and operationalize human-machine collaborative frameworks for efficient anomaly detection in big data.
KW - anomaly detection
KW - astronomy imaging
KW - deep learning
KW - human-machine optimization
KW - unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85212336388&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85212336388&partnerID=8YFLogxK
U2 - 10.5334/cstp.740
DO - 10.5334/cstp.740
M3 - Article
AN - SCOPUS:85212336388
SN - 2057-4991
VL - 9
JO - Citizen Science: Theory and Practice
JF - Citizen Science: Theory and Practice
IS - 1
M1 - 40
ER -