Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems

Bakinam T. Essawy, Jonathan L. Goodall, Hao Xu, Arcot Rajasekar, James D. Myers, Tracy A Kugler, Mirza M. Billah, Mary C. Whitton, Reagan W. Moore

Research output: Contribution to journal > Article

7 Citations (Scopus)

Abstract

Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule-Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community-driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment-Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.

Original language: English (US)
Pages (from-to): 163-175
Number of pages: 13
Journal: Earth and Space Science
Volume: 3
Issue number: 4
DOI: https://doi.org/10.1002/2015EA000139
State: Published - Jan 1 2016

Keywords

  • federation
  • hydrologic modeling
  • iRODS
  • reproducibility
  • workflows

Cite this

Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems. / Essawy, Bakinam T.; Goodall, Jonathan L.; Xu, Hao; Rajasekar, Arcot; Myers, James D.; Kugler, Tracy A; Billah, Mirza M.; Whitton, Mary C.; Moore, Reagan W.

In: Earth and Space Science, Vol. 3, No. 4, 01.01.2016, p. 163-175.

Essawy, BT, Goodall, JL, Xu, H, Rajasekar, A, Myers, JD, Kugler, TA, Billah, MM, Whitton, MC & Moore, RW 2016, 'Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems', Earth and Space Science, vol. 3, no. 4, pp. 163-175. https://doi.org/10.1002/2015EA000139
Essawy, Bakinam T. ; Goodall, Jonathan L. ; Xu, Hao ; Rajasekar, Arcot ; Myers, James D. ; Kugler, Tracy A ; Billah, Mirza M. ; Whitton, Mary C. ; Moore, Reagan W. / Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems. In: Earth and Space Science. 2016 ; Vol. 3, No. 4. pp. 163-175.
@article{b500afc7b59341d98d61116490be16dc,
title = "Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems",
abstract = "Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule-Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community-driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment-Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.",
keywords = "federation, hydrologic modeling, iRODS, reproducibility, workflows",
author = "Essawy, {Bakinam T.} and Goodall, {Jonathan L.} and Hao Xu and Arcot Rajasekar and Myers, {James D.} and Kugler, {Tracy A} and Billah, {Mirza M.} and Whitton, {Mary C.} and Moore, {Reagan W.}",
year = "2016",
month = "1",
day = "1",
doi = "10.1002/2015EA000139",
language = "English (US)",
volume = "3",
pages = "163--175",
journal = "Earth and Space Science",
issn = "2333-5084",
publisher = "Wiley-Blackwell Publishing Ltd",
number = "4",

}

TY - JOUR

T1 - Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems

AU - Essawy, Bakinam T.

AU - Goodall, Jonathan L.

AU - Xu, Hao

AU - Rajasekar, Arcot

AU - Myers, James D.

AU - Kugler, Tracy A

AU - Billah, Mirza M.

AU - Whitton, Mary C.

AU - Moore, Reagan W.

PY - 2016/1/1

Y1 - 2016/1/1

AB - Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule-Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community-driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment-Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.

KW - federation

KW - hydrologic modeling

KW - iRODS

KW - reproducibility

KW - workflows

UR - http://www.scopus.com/inward/record.url?scp=85015426803&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85015426803&partnerID=8YFLogxK

U2 - 10.1002/2015EA000139

DO - 10.1002/2015EA000139

M3 - Article

AN - SCOPUS:85015426803

VL - 3

SP - 163

EP - 175

JO - Earth and Space Science

JF - Earth and Space Science

SN - 2333-5084

IS - 4

ER -