TY - GEN
T1 - SciFlow
T2 - 2013 IEEE International Conference on Big Data, Big Data 2013
AU - Xuan, Pengfei
AU - Zheng, Yueli
AU - Sarupria, Sapna
AU - Apon, Amy
PY - 2013
Y1 - 2013
AB - Many computational science applications utilize complex workflow patterns that generate an intricately connected set of output files for subsequent analysis. Some types of applications, such as rare event sampling, additionally require guaranteed completion of all subtasks for analysis, and place significant demands on the workflow management and execution environment. SciFlow is a user interface built over the Hadoop infrastructure that provides a framework to support the complex process and data interactions and guaranteed completion requirements of scientific workflows. It provides an efficient mechanism for building a parallel scientific application with dataflow patterns, and enables the design, deployment, and execution of data-intensive, many-task computing tasks on a Hadoop platform. The design principles of this framework emphasize simplicity, scalability and fault tolerance. A case study using the forward flux sampling rare event simulation application validates the functionality, reliability and effectiveness of the framework.
KW - Big Data
KW - Hadoop
KW - dataflow
KW - dataflow-driven design patterns
KW - forward flux sampling rare events simulation
KW - many-task computing
KW - scientific computing
UR - http://www.scopus.com/inward/record.url?scp=84893304995&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893304995&partnerID=8YFLogxK
U2 - 10.1109/BigData.2013.6691725
DO - 10.1109/BigData.2013.6691725
M3 - Conference contribution
AN - SCOPUS:84893304995
SN - 9781479912926
T3 - Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
SP - 36
EP - 44
BT - Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
PB - IEEE Computer Society
Y2 - 6 October 2013 through 9 October 2013
ER -