SciFlow: A dataflow-driven model architecture for scientific computing using Hadoop

Pengfei Xuan, Yueli Zheng, Sapna Sarupria, Amy Apon

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

14 Scopus citations

Abstract

Many computational science applications use complex workflow patterns that generate an intricately connected set of output files for subsequent analysis. Some types of applications, such as rare event sampling, additionally require guaranteed completion of all subtasks before analysis can proceed, and so place significant demands on the workflow management and execution environment. SciFlow is a user interface built on the Hadoop infrastructure that provides a framework to support both the complex process and data interactions of scientific workflows and their guaranteed-completion requirements. It provides an efficient mechanism for building parallel scientific applications with dataflow patterns, and it enables the design, deployment, and execution of data-intensive many-task computing applications on a Hadoop platform. The design principles of this framework emphasize simplicity, scalability, and fault tolerance. A case study using forward flux sampling, a rare event simulation application, validates the functionality, reliability, and effectiveness of the framework.
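The abstract's central requirement, that every subtask of a stage must finish before downstream analysis begins, can be illustrated with a minimal Python sketch of a staged dataflow with a completion barrier and retries. This is not SciFlow's API or its Hadoop implementation: `run_stage`, `run_pipeline`, `max_retries`, and `workers` are hypothetical names, and a local thread pool stands in for the cluster that would actually execute the subtasks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, cast

# Hypothetical helpers for illustration only: not SciFlow's API and not Hadoop.
# A local thread pool stands in for the cluster that would run each subtask.
Task = Callable[[int], int]


def run_stage(task: Task, inputs: Sequence[int],
              max_retries: int = 3, workers: int = 4) -> List[int]:
    """Run one subtask per input; return results only if every subtask succeeds."""
    results: List[Optional[int]] = [None] * len(inputs)
    pending = list(range(len(inputs)))  # indices that still lack a result
    for _ in range(max_retries + 1):    # first attempt plus up to max_retries retries
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {i: pool.submit(task, inputs[i]) for i in pending}
        # Leaving the "with" block waits for all submitted subtasks to finish.
        failed = []
        for i, fut in futures.items():
            try:
                results[i] = fut.result()
            except Exception:  # any error counts as a failed subtask
                failed.append(i)
        pending = failed  # resubmit only the subtasks that failed
    if pending:
        raise RuntimeError(f"subtasks {pending} did not complete")
    return cast(List[int], results)


def run_pipeline(stages: Sequence[Task], inputs: Sequence[int]) -> List[int]:
    """Feed each stage's complete output into the next stage."""
    data = list(inputs)
    for task in stages:
        # Barrier: the next stage starts only after every subtask here succeeded.
        data = run_stage(task, data)
    return data


if __name__ == "__main__":
    print(run_pipeline([lambda x: x + 1, lambda x: x * 3], [1, 2]))  # [6, 9]
```

The barrier in `run_pipeline` reflects the abstract's point that analysis must not start on partial results. Retrying only the failed subtasks is one simple way to approximate the fault tolerance the abstract attributes to running on Hadoop; how SciFlow itself tracks and reschedules subtasks is not described here.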

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
Publisher: IEEE Computer Society
Pages: 36-44
Number of pages: 9
ISBN (Print): 9781479912926
State: Published - 2013
Externally published: Yes
Event: 2013 IEEE International Conference on Big Data, Big Data 2013 - Santa Clara, CA, United States
Duration: Oct 6, 2013 to Oct 9, 2013

Publication series

Name: Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013

Other

Other: 2013 IEEE International Conference on Big Data, Big Data 2013
Country/Territory: United States
City: Santa Clara, CA
Period: 10/6/13 to 10/9/13

Keywords

  • Big Data
  • Hadoop
  • dataflow
  • dataflow-driven design patterns
  • forward flux sampling rare events simulation
  • many-task computing
  • scientific computing
