Global climate change and its impact on human life have become one of our era's greatest challenges. Despite this urgency and the abundance of climate data, data science has had little impact on furthering our understanding of our planet. This is in stark contrast to other fields such as advertising or electronic commerce, where big data has been a great success story. This discrepancy stems from the complex nature of climate data as well as the scientific questions climate science brings forth. This article introduces a data science audience to the challenges and opportunities of mining large climate datasets, with an emphasis on the nuanced differences between mining climate data and traditional big data approaches. We focus on the data, methods, and application challenges that must be addressed for big data to fulfill its promise in climate science applications. More importantly, we highlight research showing that relying solely on traditional big data techniques results in dubious findings, and we instead propose a theory-guided data science paradigm that uses scientific theory to constrain both the big data techniques and the results-interpretation process in order to extract accurate insight from large climate data.
Bibliographical note
Funding Information:
the postprocessed product available to the public. Such heavy postprocessing will lead to biases in the data, and any analysis must appropriately identify how such biases might manifest in the results. Luckily, the Climate Data Guide [16] (http://climatedataguide.ucar.edu), a project funded by the U.S. National Science Foundation, can be a resource to big data practitioners. The Climate Data Guide serves as a community-authored guide for climate datasets. The guide contains over 100 Earth Science-related datasets with additional information such as common uses and a list of peer-reviewed publications that used the data.
These ideas were developed while the authors were funded by an NSF Expeditions in Computing Grant #1029711 and an NSF EAGER Grant #1355072. J.H.F. was also funded by an NSF Graduate Research Fellowship and a University of Minnesota Doctoral Dissertation Fellowship.