Exploration of sample size and diatom-based indicator performance in three North American phosphorus training sets

Euan D. Reavie, Steve Juggins

Research output: Contribution to journal, Article, peer-review

27 Scopus citations


Three large training sets were investigated to determine optimal sample sizes for diatom-based inference models. The sample sets represented (1) assemblages from Great Lakes coastlines, (2) phytoplankton from the pelagic Great Lakes and (3) surface sediment assemblages from Minnesota lakes. Diatom-based weighted average models to infer nutrient concentrations were developed for each training set. Training sets with sample sizes ranging from 10 to the maximum number of samples were created through random sample selection, and the performance of each model was evaluated. For each model iteration, diatom-inferred (DI) nutrient data were related to stressor data (e.g., adjacent agricultural or urban development) to characterize the ability of each model to track human activities. The relationships between model performance parameters (DI-stressor correlations and model r², error and bias) and sample size were used to determine the minimum sample size needed to optimize models for each region. Depending on the training set, at least 40-70 samples were needed to capture the variation in diatom assemblages and environmental conditions to such a degree that non-analog situations should be rare, so that a model applied to any sample assemblage from the region should provide an unambiguous result. Caution is recommended when dealing with smaller training sets unless there is certainty that the selected samples reflect the regional variability in diatom assemblages and environmental conditions.
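The subsampling procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, all function names are hypothetical, and the choices of inverse deshrinking, scoring against the full sample set, and the number of random draws per size are assumptions, since the abstract does not specify them.

```python
import math
import random


def make_synthetic(n_samples, n_taxa, rng):
    """Hypothetical data: taxa with noisy Gaussian responses along one gradient."""
    env = [rng.random() for _ in range(n_samples)]
    true_optima = [k / (n_taxa - 1) for k in range(n_taxa)]
    abund = [
        [math.exp(-((x - u) ** 2) / (2 * 0.15 ** 2)) * rng.uniform(0.5, 1.5)
         for u in true_optima]
        for x in env
    ]
    return abund, env


def raw_wa(row, optima):
    """Abundance-weighted mean of taxon optima for one sample."""
    return sum(y * u for y, u in zip(row, optima)) / sum(row)


def fit_wa(abund, env):
    """Fit weighted averaging with inverse deshrinking (an assumed choice).

    Assumes every taxon occurs in the training subset, which holds for the
    synthetic data here but would need handling with real counts.
    """
    n_taxa = len(abund[0])
    optima = [
        sum(row[k] * x for row, x in zip(abund, env)) / sum(row[k] for row in abund)
        for k in range(n_taxa)
    ]
    raw = [raw_wa(row, optima) for row in abund]
    mean_raw = sum(raw) / len(raw)
    mean_env = sum(env) / len(env)
    sxy = sum((r - mean_raw) * (x - mean_env) for r, x in zip(raw, env))
    sxx = sum((r - mean_raw) ** 2 for r in raw)
    slope = sxy / sxx  # regress observed values on raw WA estimates
    return optima, mean_env - slope * mean_raw, slope


def predict(model, abund):
    """Return deshrunk inferred values for each sample."""
    optima, intercept, slope = model
    return [intercept + slope * raw_wa(row, optima) for row in abund]


def score(observed, inferred):
    """Return (squared Pearson correlation, root mean squared error)."""
    n = len(observed)
    mo = sum(observed) / n
    mi = sum(inferred) / n
    sxy = sum((o - mo) * (i - mi) for o, i in zip(observed, inferred))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((i - mi) ** 2 for i in inferred)
    rmse = math.sqrt(sum((o - i) ** 2 for o, i in zip(observed, inferred)) / n)
    return sxy ** 2 / (sxx * syy), rmse


def sample_size_curve(abund, env, sizes, n_iter, rng):
    """Mean r2 and RMSE per training-set size, scored on all samples."""
    results = {}
    for n in sizes:
        scores = []
        for _ in range(n_iter):
            idx = rng.sample(range(len(env)), n)  # random selection without replacement
            model = fit_wa([abund[i] for i in idx], [env[i] for i in idx])
            scores.append(score(env, predict(model, abund)))
        results[n] = (sum(s[0] for s in scores) / n_iter,
                      sum(s[1] for s in scores) / n_iter)
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    abund, env = make_synthetic(60, 15, rng)
    for n, (r2, rmse) in sample_size_curve(abund, env, [10, 20, 40, 60], 20, rng).items():
        print(n, round(r2, 3), round(rmse, 3))
```

Plotting the mean r² and error against training-set size would give the kind of curve from which a minimum adequate sample size could be read. The sketch omits the bias and DI-stressor correlation measures that the abstract also mentions.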

Original language: English (US)
Pages (from-to): 529-538
Number of pages: 10
Journal: Aquatic Ecology
Issue number: 4
State: Published - Nov 2011

Bibliographical note

Funding Information:
Acknowledgments: The Minnesota lake dataset has been progressively developed by Steve Heiskary and Mark Tomasek (Minnesota Pollution Control Agency), Dan Engstrom, Mark Edlund, Shawn Schottler and Joy Ramstack (St. Croix Watershed Research Station). Amy Kireta, Gerald Sgro, Norman Andresen and Michael Ferguson supported diatom assessments for GLEI samples. Michael Agbeti supported diatom assessments of the GLNPO phytoplankton samples. There are several people to thank for GLEI project management and field support, including Valerie Brady, Jerry Henneck, John Ameel, Gerald Niemi, John (Jack) Kelly, Russell Kreis and Jeffrey Johansen. This research was supported by grants to E. Reavie from the US Environmental Protection Agency under Cooperative Agreements EPA/R–8286750 (GLEI) and GL-00E23101 (GLNPO). This document has not been subjected to the EPA’s required peer and policy review and therefore does not necessarily reflect the view of the Agency, and no official endorsement should be inferred. This is contribution number 530 of the Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota Duluth.


  • Diatoms
  • Inference models
  • Models
  • Sample size
  • Stressors
  • Training sets

