Big data, data privacy, and plant and animal disease research using GEMS

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


One of the major challenges in ensuring global food security is the ever-changing biotic risk affecting the productivity and efficiency of the global food supply system. Biotic risks that threaten food security include pests and diseases that affect pre- and postharvest terrestrial agriculture and aquaculture. Strategies to minimize this risk depend heavily on plant and animal disease research. As data collected at high spatial and temporal resolutions become increasingly available, epidemiological models used to assess and predict biotic risks have become more accurate and, thus, more useful. However, with the advent of Big Data opportunities, a number of challenges have arisen that limit researchers’ access to complex, multi-sourced, multi-scaled data collected on pathogens and their associated environments and hosts. Among these challenges, one of the most limiting is the data privacy concerns of data owners and collectors. While solutions such as de-identification and anonymization tools that protect sensitive information are recognized as effective practices for plant and animal disease researchers, comparatively few platforms accessible to researchers include data privacy by design. We describe how the general thinking and design used for data sharing and analysis platforms can intrinsically address a number of the data privacy-related challenges that act as barriers to researchers seeking access to data. We also describe how some of the data privacy concerns confronting plant and animal disease researchers are addressed by way of the GEMS informatics platform.

Original language: English (US)
Pages (from-to): 2644-2652
Number of pages: 9
Journal: Agronomy Journal
Issue number: 5
State: Published - Sep 1 2022

Bibliographical note

Funding Information:
Most fields that use predictive sciences, such as disease-modelling research, are highly dependent on both the spatial and temporal accuracy of disease incidence and environmental data inputs. Input datasets used in such modelling frameworks draw on various data sources, a high proportion of which are unfortunately obsolete (see Escribano et al. [2016] for a biodiversity data example). Many publicly funded databases that are invaluable in supporting pest and disease research receive financial support through one-off grants. However, such databases run the risk of becoming obsolete because time-dependent datasets are not updated in a timely manner, owing to a lack of clear data ownership and funding. While some of these types of datasets are kept up to date by a user community effort, not all initiatives have committed communities to extend the useful lifespan of one-off data products. Additionally, there is no financial support system: data use communities are not organized to generate the funds needed to meet updating costs. Both the cost and the will to update datasets can be addressed if the datasets are hosted on platforms that give data owners access control, allowing them to grant different levels of access based on donation or fee categories.
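The owner-controlled, tiered access described above can be illustrated with a minimal Python sketch. It is not the GEMS implementation: the tier names, the contribution thresholds, and the `Dataset`, `grant`, `can_read`, and `tier_for_contribution` names are all hypothetical, chosen only to show how a data owner might map donation or fee categories to access levels.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class AccessTier(IntEnum):
    """Hypothetical access levels, ordered from least to most privileged."""
    PUBLIC = 0      # e.g. metadata only
    SUPPORTER = 1   # e.g. aggregated records
    SPONSOR = 2     # e.g. full-resolution records


@dataclass
class Dataset:
    name: str
    owner: str
    # Tier granted to each user by the data owner; unlisted users are PUBLIC.
    grants: dict[str, AccessTier] = field(default_factory=dict)

    def grant(self, acting_user: str, user: str, tier: AccessTier) -> None:
        """Only the data owner may change who can see what."""
        if acting_user != self.owner:
            raise PermissionError("only the data owner can grant access")
        self.grants[user] = tier

    def can_read(self, user: str, required: AccessTier) -> bool:
        """True if the user's tier is at least the tier a view requires."""
        if user == self.owner:
            return True
        return self.grants.get(user, AccessTier.PUBLIC) >= required


def tier_for_contribution(amount: float) -> AccessTier:
    """Map a donation or fee to a tier; thresholds are illustrative only."""
    if amount >= 500:
        return AccessTier.SPONSOR
    if amount >= 50:
        return AccessTier.SUPPORTER
    return AccessTier.PUBLIC
```

For example, if a hypothetical owner `alice` calls `grant("alice", "bob", tier_for_contribution(75))`, then `bob` can read views that require `AccessTier.SUPPORTER` but not those that require `AccessTier.SPONSOR`. Because every grant goes through the owner, decisions about who sees what stay with the party bearing the privacy risk.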

Publisher Copyright:
© 2021 The Authors. Agronomy Journal © 2021 American Society of Agronomy.

