Assessing the performance of a diatom transfer function on four Minnesota lake sediment cores: Effects of training set size and sample age

Euan D. Reavie, Mark B. Edlund

Research output: Contribution to journal › Article › peer-review

10 Scopus citations


Paleolimnological information is often extracted from diatom records using weighted averaging calibration and regression techniques. Larger calibration sample sets yield better inferences because they better capture the environmental conditions and species assemblages of the sampled region. To optimize inferred information from fossil assemblages, however, it is worth knowing whether fewer calibration samples can be used. Furthermore, confidence in environmental reconstructions is greater if we consider the relative importance of (A) similarity between fossil and calibration assemblages and (B) how well fossil taxa respond to the environmental variable of interest. We examine these issues using ~200-year sediment profiles from four Minnesota lakes and a 145-lake surface sediment training set calibrated for total phosphorus (TP). Training sets ranging in size from 10 to 145 samples were created through random sample selection, and models based on these training sets were used to calculate diatom-inferred (DI) TP values for the fossil samples. Relationships between DI-TP variability and training set size were used to determine the minimum number of samples needed to optimize the model for paleo-reconstruction. Similarities between fossil and modern assemblages were likewise calculated for each training set size. Finally, fossil and modern assemblages were compared to determine whether older fossil samples had poorer similarity with modern analogs. Depending on the lake, more than 50-80 training samples were needed to stabilize variability in DI-TP results, and >110 training set samples were needed to minimize modern-fossil assemblage dissimilarities. Dissimilarities appeared to increase with sample age, but only one of the four cores displayed a significant trend.
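The training-set subsampling experiment described above can be illustrated with a short sketch. It is not the authors' code: the abstract does not state which weighted averaging (WA) variant was used, so simple WA with inverse deshrinking on log10 TP is assumed here. All function names (`wa_fit`, `wa_predict`, `di_variability`, `simulate`) and the synthetic data are hypothetical illustrations, not results or data from the study.

```python
import numpy as np

# Hypothetical sketch: WA calibration with inverse deshrinking, plus random
# subsampling of the training set. Synthetic data only, not study data.

def wa_raw(Y, optima):
    """Initial WA estimate: abundance-weighted mean of the optima of taxa that have one."""
    ok = ~np.isnan(optima)
    return (Y[:, ok] @ optima[ok]) / Y[:, ok].sum(axis=1)

def wa_fit(Y, x):
    """Fit WA taxon optima and inverse-deshrinking coefficients.

    Y: (n_lakes, n_taxa) relative abundances; x: (n_lakes,) log10 TP.
    """
    totals = Y.sum(axis=0)
    present = totals > 0                      # taxa absent from this subset get no optimum
    optima = np.full(Y.shape[1], np.nan)
    optima[present] = (Y[:, present] * x[:, None]).sum(axis=0) / totals[present]
    b, a = np.polyfit(wa_raw(Y, optima), x, 1)  # regress observed x on initial estimates
    return optima, a, b

def wa_predict(Y, model):
    """Deshrunk WA inference (here, DI log10 TP) for new assemblages."""
    optima, a, b = model
    return a + b * wa_raw(Y, optima)

def di_variability(Y_train, x_train, Y_fossil, sizes, n_reps=50, seed=0):
    """For each training-set size: mean over fossil samples of the SD of the
    inferred values across n_reps random training subsets."""
    rng = np.random.default_rng(seed)
    out = {}
    for n in sizes:
        preds = np.empty((n_reps, Y_fossil.shape[0]))
        for r in range(n_reps):
            idx = rng.choice(len(x_train), size=n, replace=False)
            preds[r] = wa_predict(Y_fossil, wa_fit(Y_train[idx], x_train[idx]))
        out[n] = float(preds.std(axis=0).mean())
    return out

def simulate(n_samples, n_taxa, rng, tol=0.3):
    """Synthetic assemblages with Gaussian responses along a log10 TP gradient."""
    x = rng.uniform(0.7, 2.5, n_samples)      # hypothetical log10 TP values
    u = rng.uniform(0.7, 2.5, n_taxa)         # hypothetical "true" taxon optima
    resp = np.exp(-((x[:, None] - u[None, :]) ** 2) / (2 * tol ** 2))
    resp *= rng.gamma(2.0, 0.5, resp.shape)   # positive multiplicative noise
    return resp / resp.sum(axis=1, keepdims=True), x

rng = np.random.default_rng(1)
Y_all, x_all = simulate(165, 60, rng)
Y_train, x_train = Y_all[:145], x_all[:145]   # stand-in for a 145-lake training set
Y_fossil = Y_all[145:]                        # stand-in for 20 fossil samples
sd_by_size = di_variability(Y_train, x_train, Y_fossil, sizes=[10, 25, 50, 80, 110, 145])
for n, sd in sd_by_size.items():
    print(f"n={n:3d}  mean SD of DI log10 TP = {sd:.3f}")
```

With the full training set (n = 145) every replicate uses the same lakes, so the spread collapses to zero. Where the curve levels off at smaller n indicates roughly how many training samples the model needs on a given dataset.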
We have two recommendations for future studies: (1) be cautious when using smaller training sets, especially to interpret older fossil assemblages, and (2) determine how well fossil taxa are attuned to the variable of interest, because this is critical to evaluating the quality of diatom-inferred data.
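The modern-fossil assemblage comparison behind these recommendations can be sketched as a nearest-analog calculation. The abstract does not name the dissimilarity coefficient, so squared chord distance, a common choice in diatom analog analysis, is assumed. The function names, Dirichlet-generated assemblages, and sample ages below are hypothetical stand-ins, not data from the four cores.

```python
import numpy as np

def squared_chord(A, B):
    """Squared chord distance between every row of A and every row of B.

    Rows are relative abundances (each sums to 1), so distances lie in [0, 2].
    """
    SA, SB = np.sqrt(A), np.sqrt(B)
    return ((SA[:, None, :] - SB[None, :, :]) ** 2).sum(axis=2)

def min_dissimilarity(Y_fossil, Y_train):
    """Distance from each fossil sample to its closest modern analog in Y_train."""
    return squared_chord(Y_fossil, Y_train).min(axis=1)

def mean_min_dc_by_size(Y_fossil, Y_train, sizes, n_reps=50, seed=0):
    """Mean nearest-analog distance over random training subsets of each size."""
    rng = np.random.default_rng(seed)
    out = {}
    for n in sizes:
        vals = []
        for _ in range(n_reps):
            idx = rng.choice(len(Y_train), size=n, replace=False)
            vals.append(min_dissimilarity(Y_fossil, Y_train[idx]).mean())
        out[n] = float(np.mean(vals))
    return out

# Synthetic stand-ins (Dirichlet-distributed assemblages), not study data.
rng = np.random.default_rng(2)
Y_train = rng.dirichlet(np.ones(30), size=145)  # hypothetical 145-lake training set
Y_fossil = rng.dirichlet(np.ones(30), size=15)  # hypothetical 15-sample core
ages = np.linspace(0, 200, 15)                  # hypothetical ages (years before coring)

dc_by_size = mean_min_dc_by_size(Y_fossil, Y_train, sizes=[10, 50, 110, 145])
d_full = min_dissimilarity(Y_fossil, Y_train)
slope = np.polyfit(ages, d_full, 1)[0]          # change in nearest-analog distance per year
print(dc_by_size, slope)
```

Removing training samples can only keep or increase each fossil sample's nearest-analog distance, which is why smaller training sets tend to show poorer analogs. The slope only describes the direction of an age trend. Judging significance, as the study did for each core, would need a formal test (for example `scipy.stats.linregress`, which reports a p-value).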

Original language: English (US)
Pages (from-to): 87-104
Number of pages: 18
Journal: Journal of Paleolimnology
Issue number: 1
State: Published - Jun 2013

Bibliographical note

Funding Information:
Acknowledgments: Jill Coleman-Wasik, Allison Stevens, Dan Engstrom, Shawn Schottler, Noel Griese and Rian Reed helped with fieldwork and laboratory analyses. The Minnesota lake dataset was progressively developed by John Kingston, Morgan Bursiel, Steve Heiskary and Mark Tomasek (Minnesota Pollution Control Agency), Dan Engstrom and Joy Ramstack (St. Croix Watershed Research Station). This is contribution number 543 of the Center for Water and the Environment, Natural Resources Research Institute, University of Minnesota Duluth. This project was supported in part by the National Science Foundation under grant DEB-0919095. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.


  • Analog analysis
  • Calibration sets
  • Diatoms
  • Inference models
  • Minnesota
  • Phosphorus
  • Sample size

