Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems

Bakinam T. Essawy, Jonathan L. Goodall, Hao Xu, Arcot Rajasekar, James D. Myers, Tracy A. Kugler, Mirza M. Billah, Mary C. Whitton, Reagan W. Moore

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule-Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community-driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment-Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.

Original language: English (US)
Pages (from-to): 163-175
Number of pages: 13
Journal: Earth and Space Science
Volume: 3
Issue number: 4
State: Published - 2016

Bibliographical note

Funding Information:
Data grids are particularly useful for scientific communities such as hydrology that rely on multiple data and computational resource providers. The iRODS-powered DataNet Federation Consortium (DFC) grid, which is used for this research, was developed as part of a National Science Foundation (NSF) funded project and provides support for federation of both resources and services. The work reported here is part of the DFC project and uses a DFC data grid for storage and long-term access to data sets stored across heterogeneous resources. The core iRODS software is developed and maintained by the iRODS Consortium at the Renaissance Computing Institute (RENCI), which is a partnership between the University of North Carolina at Chapel Hill (UNC-CH) and the Data Intensive Cyber Environments Center at UNC-CH. iRODS currently runs in Linux/Unix environments.

Funding Information:
This work was supported by the National Science Foundation (NSF) under awards ACI-0940841, ACI-0940824, and ACI-0940818 and by Amazon Web Services (AWS) through an Education Research Grant award. This research would not have been possible without assistance from the larger iRODS, DFC, SEAD, and TerraPop teams. The data used are listed in Table 1 and can be found in the SEAD repository at the DOIs provided in Table 1.

Keywords

  • federation
  • hydrologic modeling
  • iRODS
  • reproducibility
  • workflows

