Structured estimation in high dimensions: Applications in climate

André R. Goncalves, Arindam Banerjee, Vidyashankar Sivakumar, Soumyadeep Chatterjee

Research output: Chapter in Book/Report/Conference proceeding › Chapter

1 Scopus citation


One of the central challenges of data analysis in climate science is understanding the complex dependencies among multiple spatiotemporal climate variables. The data are typically high dimensional, with each climate variable at each spatial grid point or time period constituting a separate dimension. In many climate problems, the dimensionality, that is, the number of features or factors that may affect a response variable, is much larger than the number of samples, which typically come from reanalysis data sets covering the past few decades. For example, one of the problems considered in this chapter is to predict climate variables such as monthly temperature and precipitable water over land locations using information from six climate variables over the oceans. We formulate this as a regression problem with the climate variable at a land location as the response variable. We consider 439 ocean locations, for a total of 6 × 439 = 2634 covariates. The data are the monthly means of the climate variables for 1948-2007 from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) Reanalysis 1 data set [1], giving a total of 60 × 12 = 720 data samples.
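
The dimension counts in the abstract can be checked with a short sketch. The Python snippet below is an illustration only, not the chapter's method: it reproduces the arithmetic and then uses synthetic Gaussian data (an assumption; no NCEP/NCAR data are loaded, and all variable names are hypothetical) to show why unregularized least squares is ill-posed when covariates outnumber samples.

```python
import numpy as np

# Counts taken from the abstract.
n_variables = 6                # climate variables over oceans
n_ocean_locations = 439        # ocean locations
n_years = 2007 - 1948 + 1      # 1948-2007 inclusive
n_months = 12

p = n_variables * n_ocean_locations   # number of covariates: 2634
n = n_years * n_months                # number of monthly samples: 720

# Synthetic stand-ins for the real design matrix and response (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)            # pure noise, unrelated to X

# The rank of X cannot exceed n, so the p coefficients are not identifiable.
rank = np.linalg.matrix_rank(X)

# lstsq returns the minimum-norm solution, which fits the noise exactly.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
max_residual = np.max(np.abs(X @ beta - y))

print(f"p = {p}, n = {n}, rank(X) = {rank}")
print(f"largest absolute residual: {max_residual:.2e}")
```

Because the fit to a pure-noise response is essentially perfect, an unconstrained fit in this regime cannot separate signal from noise. This is the kind of setting in which additional structure on the coefficients is typically imposed.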

Original language: English (US)
Title of host publication: Large-Scale Machine Learning in the Earth Sciences
Publisher: CRC Press
Number of pages: 20
ISBN (Electronic): 9781498703888
ISBN (Print): 9781498703871
State: Published - Jan 1 2017

Bibliographical note

Publisher Copyright:
© 2017 by Taylor & Francis Group, LLC.

