TY - CHAP
T1 - Structured estimation in high dimensions
T2 - Applications in climate
AU - Goncalves, André R.
AU - Banerjee, Arindam
AU - Sivakumar, Vidyashankar
AU - Chatterjee, Soumyadeep
N1 - Publisher Copyright: © 2017 by Taylor & Francis Group, LLC. Copyright 2019 Elsevier B.V., All rights reserved.
PY - 2017/1/1
Y1 - 2017/1/1
N2 - One of the central challenges of data analysis in climate science is understanding complex dependencies between multiple spatiotemporal climate variables. The data are typically high dimensional, with each climate variable at each spatial grid location or time period denoting a separate dimension. In fact, in many climate problems, the dimensionality, that is, the number of possible features or factors potentially affecting a response variable, is usually much larger than the number of samples, which typically come from reanalysis data sets covering the past few decades. For example, in one of the problems considered in this chapter, one wants to predict climate variables such as monthly temperature and precipitable water over land locations using information from six climate variables over oceans. We formulate it as a regression problem with the climate variable over a land location as the response variable. We consider 439 locations on oceans, so that there are a total of 6 × 439 = 2634 covariates in our regression problem. The data are the monthly means of the climate variables for 1948–2007 from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) Reanalysis 1 data set [1], so that we have a total of 60 × 12 = 720 data samples. Traditional statistical methods like least squares regression do not work in such high-dimensional, low-sample scenarios.
AB - One of the central challenges of data analysis in climate science is understanding complex dependencies between multiple spatiotemporal climate variables. The data are typically high dimensional, with each climate variable at each spatial grid location or time period denoting a separate dimension. In fact, in many climate problems, the dimensionality, that is, the number of possible features or factors potentially affecting a response variable, is usually much larger than the number of samples, which typically come from reanalysis data sets covering the past few decades. For example, in one of the problems considered in this chapter, one wants to predict climate variables such as monthly temperature and precipitable water over land locations using information from six climate variables over oceans. We formulate it as a regression problem with the climate variable over a land location as the response variable. We consider 439 locations on oceans, so that there are a total of 6 × 439 = 2634 covariates in our regression problem. The data are the monthly means of the climate variables for 1948–2007 from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) Reanalysis 1 data set [1], so that we have a total of 60 × 12 = 720 data samples. Traditional statistical methods like least squares regression do not work in such high-dimensional, low-sample scenarios.
UR - http://www.scopus.com/inward/record.url?scp=85051788997&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051788997&partnerID=8YFLogxK
U2 - 10.4324/9781315371740
DO - 10.4324/9781315371740
M3 - Chapter
AN - SCOPUS:85051788997
SN - 9781498703871
SP - 13
EP - 32
BT - Large-Scale Machine Learning in the Earth Sciences
PB - CRC Press
ER -