Anomaly construction in climate data: Issues and challenges

Research output: Contribution to conference › Paper

9 Citations (Scopus)

Abstract

Earth science data contain a strong seasonal component, visible as repeating cycles in climate variables such as air pressure, temperature, and precipitation. Seasonality is the strongest signal in these data, so to find other patterns it is removed by subtracting from each raw value the mean for the corresponding calendar month. However, because raw data such as air temperature and pressure are continually being generated from satellite observations, climate scientists usually compute these means over a moving reference base interval of some years of raw data, then generate the anomaly time series and study changes relative to that base. In this paper, we evaluate different measures for base computation and show how an arbitrary choice of base can skew the results and lead to a favorable outcome that is not necessarily true. We perform a detailed study of different base selection criteria and base periods to highlight that the outcome of data mining can be sensitive to the choice of base. We present a case study of the dipole in the Sahel region to highlight the bias that creeps into the results because of the choice of base. Finally, we propose a generalized model for base selection that uses Monte Carlo based methods to minimize the expected variance in the anomaly time series of the underlying datasets. Our research can help climate scientists and researchers working with temporal data choose a base that does not bias their results.
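The anomaly construction described in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the function name `monthly_anomalies`, the synthetic series, and the two base periods are assumptions chosen for illustration. It subtracts each calendar month's mean over a chosen base period and shows that two different bases give different anomaly series for the same raw data.

```python
import numpy as np

def monthly_anomalies(values, start_year, base_start, base_end):
    """Subtract each calendar month's base-period mean from a monthly series.

    values: 1-D monthly data starting in January of start_year,
            covering a whole number of years.
    base_start, base_end: inclusive years defining the reference base.
    """
    by_year = np.asarray(values, dtype=float).reshape(-1, 12)  # rows = years, columns = months
    years = start_year + np.arange(by_year.shape[0])
    in_base = (years >= base_start) & (years <= base_end)
    if not in_base.any():
        raise ValueError("base period does not overlap the data")
    climatology = by_year[in_base].mean(axis=0)  # one mean per calendar month
    return (by_year - climatology).ravel()


# Hypothetical data: a seasonal cycle plus a slow linear trend, 1950-1989.
n_years = 40
t = np.arange(n_years * 12)
series = 10.0 * np.sin(2 * np.pi * t / 12) + 0.002 * t

early = monthly_anomalies(series, 1950, 1950, 1969)  # base = first 20 years
late = monthly_anomalies(series, 1950, 1970, 1989)   # base = last 20 years
offset = early - late  # difference caused purely by the choice of base
```

For this synthetic series the two anomaly series differ by a constant 0.48 (the trend of 0.002 per month times the 240-month shift between the two bases), so the same raw value can look like a positive or a negative anomaly depending on the base. Real climate data would generally show a month-dependent difference that is not constant.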

Original language: English (US)
Pages: 189-203
Number of pages: 15
State: Published - Dec 1 2011
Event: NASA Conference on Intelligent Data Understanding, CIDU 2011 - Mountain View, CA, United States
Duration: Oct 19 2011 to Oct 21 2011

Fingerprint

Time series
Earth sciences
Time and motion study
Air
Data mining
Satellites
Temperature

Cite this

Kawale, J., Chatterjee, S. B., Kumar, A., Liess, S., Steinbach, M. S., & Kumar, V. (2011). Anomaly construction in climate data: Issues and challenges. 189-203. Paper presented at NASA Conference on Intelligent Data Understanding, CIDU 2011, Mountain View, CA, United States.

Anomaly construction in climate data: Issues and challenges. / Kawale, Jaya; Chatterjee, Singdhansu B; Kumar, Arjun; Liess, Stefan; Steinbach, Michael S; Kumar, Vipin.

2011. 189-203 Paper presented at NASA Conference on Intelligent Data Understanding, CIDU 2011, Mountain View, CA, United States.

Kawale, J, Chatterjee, SB, Kumar, A, Liess, S, Steinbach, MS & Kumar, V 2011, 'Anomaly construction in climate data: Issues and challenges', Paper presented at NASA Conference on Intelligent Data Understanding, CIDU 2011, Mountain View, CA, United States, 10/19/11 - 10/21/11, pp. 189-203.
Kawale J, Chatterjee SB, Kumar A, Liess S, Steinbach MS, Kumar V. Anomaly construction in climate data: Issues and challenges. 2011. Paper presented at NASA Conference on Intelligent Data Understanding, CIDU 2011, Mountain View, CA, United States.
Kawale, Jaya; Chatterjee, Singdhansu B; Kumar, Arjun; Liess, Stefan; Steinbach, Michael S; Kumar, Vipin. / Anomaly construction in climate data: Issues and challenges. Paper presented at NASA Conference on Intelligent Data Understanding, CIDU 2011, Mountain View, CA, United States. 15 p.
@conference{b29183d800604d9db5e2298d51ed3f47,
title = "Anomaly construction in climate data: Issues and challenges",
author = "Jaya Kawale and Chatterjee, {Singdhansu B} and Arjun Kumar and Stefan Liess and Steinbach, {Michael S} and Vipin Kumar",
year = "2011",
month = "12",
day = "1",
language = "English (US)",
pages = "189--203",
note = "NASA Conference on Intelligent Data Understanding, CIDU 2011 ; Conference date: 19-10-2011 Through 21-10-2011",

}

TY - CONF

T1 - Anomaly construction in climate data

T2 - Issues and challenges

AU - Kawale, Jaya

AU - Chatterjee, Singdhansu B

AU - Kumar, Arjun

AU - Liess, Stefan

AU - Steinbach, Michael S

AU - Kumar, Vipin

PY - 2011/12/1

Y1 - 2011/12/1

UR - http://www.scopus.com/inward/record.url?scp=84879411629&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84879411629&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:84879411629

SP - 189

EP - 203

ER -