The rapidly growing number of grassroots ecological research networks demonstrates that ecologists have embraced distributed data collection and experimentation as a new tool for addressing global questions. A clear advantage of these networks is the ability to gather data at larger spatial and temporal scales, and at lower cost, than a single research team could typically achieve. However, a challenge arising from this structure is the need to merge distributed datasets into a coherent whole. The Nutrient Network, a coordinated distributed experiment entering its tenth year of data collection, has records from over 90 sites worldwide to date. In this paper I present lessons learned about data management from this project, focusing on the standardization, storage, updating, and distribution of data within the network. I provide a relational database schema and associated workflow that could be generalized to many distributed ecological experiments or networked data observatories, especially those that need taxonomic reconciliation of species occurrences. The success of distributed data collection efforts, especially long-term networks, will depend on the ability to coordinate and effectively combine project datasets.
Bibliographical note
Funding Information:
I thank Elizabeth Borer and Eric Seabloom for ongoing discussion and feedback on these processes and this manuscript. Habacuc Flores-Moreno provided insightful comments on an earlier draft. This work was generated using data from the Nutrient Network (http://www.nutnet.org) experiment, funded at the site-scale by individual researchers. Coordination and data management have been supported by funding to E. Borer and E. Seabloom from the National Science Foundation Research Coordination Network (NSF-DEB-1042132) and Long Term Ecological Research (NSF-DEB-1234162 to Cedar Creek LTER) programs, and the Institute on the Environment (DG-0001-13). I also thank the Minnesota Supercomputer Institute for hosting project data and the Institute on the Environment for hosting Network meetings.
- Data management
- Database schema
- Distributed experiment
- Nutrient network
- Taxonomic resolution