Hadoop, employing the MapReduce programming paradigm, has been widely accepted as the standard framework for analyzing big data in distributed environments. Unfortunately, this rich framework has not been exploited for processing large-scale computational geometry operations. This paper introduces CG_Hadoop; a suite of scalable and efficient MapReduce algorithms for various fundamental computational geometry operations, namely polygon union, Voronoi diagram, skyline, convex hull, farthest pair, and closest pair, which present a set of key components for other geometric algorithms. For each computational geometry operation, CG_Hadoop has two versions, one for the Apache Hadoop system and one for the SpatialHadoop system, a Hadoop-based system that is more suited for spatial operations. These proposed algorithms form the nucleus of a comprehensive MapReduce library of computational geometry operations. Extensive experimental results run on a cluster of 25 machines over datasets of size up to 3.8B records show that CG_Hadoop achieves up to 14x and 115x better performance than traditional algorithms when using Hadoop and SpatialHadoop systems, respectively.
Bibliographical notePublisher Copyright:
© 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
- Computational Geometry
- Distributed Systems
- Output-sensitive Algorithms