Abstract
One of the central challenges of data analysis in climate science is understanding complex dependencies between multiple spatiotemporal climate variables. The data are typically high dimensional, with each climate variable at each spatial grid location or time period constituting a separate dimension. In many climate problems, the dimensionality, that is, the number of features or factors potentially affecting a response variable, is much larger than the number of samples, which typically come from reanalysis data sets covering only the past few decades. For example, in one of the problems considered in this chapter, the goal is to predict climate variables such as monthly temperature and precipitable water over land locations using information from six climate variables over oceans. We formulate this as a regression problem with the climate variable at a land location as the response variable. We consider 439 ocean locations, giving a total of 6 × 439 = 2634 covariates in the regression problem. The data are the monthly means of the climate variables for 1948-2007 from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) Reanalysis 1 data set [1], giving a total of 60 × 12 = 720 data samples.
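The dimension counts quoted in the abstract can be checked with a short sketch. It only reproduces the arithmetic and builds a design matrix of the stated shape; the random values are a synthetic stand-in for the reanalysis data, and all variable names are illustrative assumptions rather than anything taken from the chapter.

```python
import numpy as np

# Counts taken from the abstract.
N_VARIABLES = 6                # climate variables over oceans
N_OCEAN_LOCATIONS = 439        # ocean locations
N_YEARS = 2007 - 1948 + 1      # 1948-2007 inclusive
MONTHS_PER_YEAR = 12

n_covariates = N_VARIABLES * N_OCEAN_LOCATIONS   # 6 x 439 = 2634
n_samples = N_YEARS * MONTHS_PER_YEAR            # 60 x 12 = 720

# Synthetic stand-in (assumption): random values, NOT real NCEP/NCAR data.
rng = np.random.default_rng(0)
X = rng.standard_normal((n_samples, n_covariates))  # one row per month
y = rng.standard_normal(n_samples)                  # response at one land location

# With fewer samples than covariates, the rank of X cannot exceed the
# number of samples, so an unregularized least-squares fit has no unique solution.
rank = np.linalg.matrix_rank(X)
print(X.shape, rank)
```

Because the rank is bounded by the 720 samples while there are 2634 covariates, any regression on data of this shape needs some additional structure or regularization to give a unique answer. The sketch does not assume which approach the chapter takes.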
Original language | English (US) |
---|---|
Title of host publication | Large-Scale Machine Learning in the Earth Sciences |
Publisher | CRC Press |
Pages | 13-32 |
Number of pages | 20 |
ISBN (Electronic) | 9781498703888 |
ISBN (Print) | 9781498703871 |
State | Published - Jan 1 2017 |
Bibliographical note
Publisher Copyright: © 2017 by Taylor & Francis Group, LLC.