Global climate change and its impact on human life have become one of our era's greatest challenges. Yet despite the urgency and the abundance of climate data, data science has done little to further our understanding of our planet. This stands in stark contrast to fields such as advertising and electronic commerce, where big data has been a great success story. The discrepancy stems from the complex nature of climate data as well as of the scientific questions that climate science brings forth. This article introduces a data science audience to the challenges and opportunities of mining large climate datasets, with an emphasis on the nuanced differences between mining climate data and traditional big data approaches. We focus on data, methods, and application challenges that must be addressed for big data to fulfill its promise in climate science applications. More importantly, we highlight research showing that relying solely on traditional big data techniques yields dubious findings, and we instead propose a theory-guided data science paradigm that uses scientific theory to constrain both the big data techniques and the interpretation of results in order to extract accurate insight from large climate data.
Bibliographical note
Funding Information:
These ideas were developed while the authors were funded by an NSF Expeditions in Computing Grant #1029711 and an NSF EAGER Grant #1355072. J.H.F. was also funded by an NSF Graduate Research Fellowship and a University of Minnesota Doctoral Dissertation Fellowship.
© Copyright 2014, Mary Ann Liebert, Inc.