Big geospatial data is an emerging sub-area of geographic information science, big data, and cyberinfrastructure. Big geospatial data poses two unique challenges. First, raster and vector data structures and analyses have developed on largely separate paths for the last 20 years. This is creating an impediment to geospatial researchers seeking to utilize big data platforms that do not promote heterogeneous data types. Second, big spatial data repositories have yet to be integrated with big data computation platforms in ways that allow researchers to spatio-temporally analyze big geospatial datasets. IPUMS-Terra, a National Science Foundation cyberInfrastructure project, addresses these challenges by providing a unified framework of integrated geospatial services which access, analyze, and transform big heterogeneous spatio-temporal data. As IPUMS-Terra's data volume grows, we seek to integrate geospatial platforms that will scale geospatial analyses and address current bottlenecks within our system. However, our work shows that there are still unresolved challenges for big geospatial analysis. The most pertinent is that there is a lack of a unified framework for conducting scalable integrated vector and raster data analysis. We conducted a comparative analysis between PostgreSQL with PostGIS and SciDB and concluded that SciDB is the superior platform for scalable raster zonal analyses.
Bibliographical noteFunding Information:
This research was supported by the National Science Foundation award OCI-0940818 and with support the National Institute of Child Health and Human Development and National Institutes of Health supported Minnesota Population Center (R24 HD041023). We would also like to thank the members of IPUMS-Terra team who assisted with this research and provided guidance throughout this process. We would also like to thank our three anonymous reviewers who provided useful comments, which we were able to incorporate into this manuscript.