Predicting lake surface water phosphorus dynamics using process-guided machine learning

Paul C. Hanson, Aviah B. Stillman, Xiaowei Jia, Anuj Karpatne, Hilary A. Dugan, Cayelan C. Carey, Joseph Stachelek, Nicole K. Ward, Yu Zhang, Jordan S. Read, Vipin Kumar

Research output: Contribution to journal › Article › peer-review

17 Scopus citations


Phosphorus (P) loading to lakes is degrading the quality and usability of water globally. Accurate predictions of lake P dynamics are needed to understand whole-ecosystem P budgets, as well as the consequences of changing lake P concentrations for water quality. However, complex biophysical processes within lakes, along with limited observational data, challenge our capacity to reproduce short-term lake dynamics needed for water quality predictions, as well as long-term dynamics needed to understand broad scale controls over lake P. Here we use an emerging paradigm in modeling, process-guided machine learning (PGML), to produce a phosphorus budget for Lake Mendota (Wisconsin, USA) and to accurately predict epilimnetic phosphorus over a time range of days to decades. In our implementation of PGML, which we term a Process-Guided Recurrent Neural Network (PGRNN), we combine a process-based model for lake P with a recurrent neural network, and then constrain the predictions with ecological principles. We test independently the process-based model, the recurrent neural network, and the PGRNN to evaluate the overall approach. The process-based model accounted for most of the observed pattern in lake P; however, it missed the long-term trend in lake P and had the worst performance in predicting winter and summer P in surface waters. The root mean square error (RMSE) for the process-based model, the recurrent neural network, and the PGRNN was 33.0 μg P L−1, 22.7 μg P L−1, and 20.7 μg P L−1, respectively. All models performed better during summer, with RMSE values for the three models (same order) equal to 14.3 μg P L−1, 10.9 μg P L−1, and 10.7 μg P L−1. Although the PGRNN had only marginally better RMSE during summer, it had lower bias and reproduced long-term decreases in lake P missed by the other two models.
For all seasons and all years, the recurrent neural network had better predictions than the process-based model alone, with RMSE of 23.8 μg P L−1 and 28.0 μg P L−1, respectively. The output of the PGRNN indicated that new processes related to water temperature, thermal stratification, and long-term changes in external loads are needed to improve the process model. By using ecological knowledge, as well as the information content of complex data, PGML shows promise as a technique for accurate prediction of messy, real-world ecological dynamics, while providing valuable information that can improve our understanding of process.
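The abstract compares the models by RMSE and bias. As a reading aid, the sketch below shows one common way to compute both. It is not the authors' code: the function names `rmse` and `mean_bias`, the sign convention for bias (predicted minus observed), and the sample values are all assumptions made here for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_bias(observed, predicted):
    """Mean of (predicted - observed); positive values mean over-prediction.

    The sign convention is an assumption; the abstract does not define it.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical concentrations in ug P per litre, invented for illustration.
# They are NOT data from the study.
observed = [100.0, 80.0, 60.0, 40.0]
predicted = [110.0, 70.0, 65.0, 35.0]

print("RMSE:", rmse(observed, predicted))
print("Bias:", mean_bias(observed, predicted))
```

In this invented example the errors cancel, so the bias is zero while the RMSE is not. That is why the two metrics can rank models differently, as the abstract reports for summer predictions.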

Original language: English (US)
Article number: 109136
Journal: Ecological Modelling
State: Published - Aug 15 2020

Bibliographical note

Funding Information:
We thank our CNH colleagues and our GLEON colleagues for valuable discussions of the ideas herein and Samantha Oliver for reviewing the manuscript. We are grateful for Y. Gil, who catalyzed the collaboration between ecologists and computer scientists. Two anonymous reviewers provided helpful criticisms. The NTL LTER (DEB-1440297) provided context and data for the study. Funding: The U.S. National Science Foundation provided funding through the CNH-Lakes project (ICER-1517823), DEB-1753639, OAC-1934633, and DEB-1753657.

Publisher Copyright:
© 2020 The Authors


Keywords

  • Lake
  • Lake Mendota
  • Long-term
  • Machine learning
  • Model
  • Phosphorus

