Accurate prediction of the phenotypic outcomes produced by different combinations of genotypes, environments, and management interventions remains a key goal in biology, with direct applications to agriculture, research, and conservation. The past decades have seen an expansion of new methods applied toward this goal. Here we predict maize yield using deep neural networks, compare the efficacy of 2 model development methods, and contextualize model performance using conventional linear and machine learning models. We examine the usefulness of incorporating interactions between disparate data types. We find that deep learning and best linear unbiased predictor (BLUP) models with interactions have the best overall performance. BLUP models achieved the lowest average error, but deep learning models performed more consistently with similar average error. Optimizing deep neural network submodules for each data type improved model performance relative to optimizing the whole model for all data types at once. Examining the effect of interactions in the best-performing model revealed that including interactions altered the model's sensitivity to weather and management features, including reduced importance scores for timepoints expected to have a limited physiological basis for influencing yield, namely those at the extreme end of the season, nearly 200 days after planting. These results suggest that deep learning provides a promising avenue for the phenotypic prediction of complex traits in complex environments and a potential mechanism to better understand the influence of environmental and genetic factors.
Funding Information:
This research used resources provided by the SCINet project of the USDA Agricultural Research Service, ARS project number 0500-00093-001-00-D. In addition to the contributions listed for the authors, we would like to acknowledge those presently and historically involved in the G2F initiative, especially the following: Tim Beissinger, Martin Bohn, Edward Buckler, Natalia de Leon, Jode Edwards, Sherry Flint-Garcia, Candice Hirsch, James Holland, Beth Hood, David Hooker, Shawn Kaeppler, Joseph Knoll, Sanzhen Liu, John McKay, Richard Minyo, Seth Murray, Rebecca Nelson, James Schnable, Rajan Sekhon, Maninder Singh, Peter Thomison, Addie Thompson, Mitch Tuinstra, Jason Wallace, Randy Wisser, and Wenwei Xu, who coordinated data collection during 2018 and 2019. Joseph Gage and Cinta Romay produced genotypic data. Alejandro Castro Aviles, Jode Edwards, David Ertl, Joseph Gage, James Holland, Dayane Cristina Lima, Bridget A. McFarland, Christina Poudyal, Anna Rogers, Cinta Romay, Luis Samayoa, Kevin Silverstein, Tyson Swetnam, and Jacob Washburn curated the 2018 data. Ryan Timothy Alpers, Alejandro Castro Aviles, James Holland, Dayane Cristina Lima, and Bridget A. McFarland curated the 2019 data. Jode Edwards distributed seeds for the experiments from 2014 to 2017. Tecle Weldekidan made additional contributions to the project. Natalia de Leon, Dayane Lima, and Cinta Romay worked with Joseph Gage on the production of genomic data.
This project was funded by USDA Agricultural Research Service, ARS project number 5070-21000-041-000-D, and was enabled by computational resources funded through USDA Agricultural Research Service, ARS project number 0500-00093-001-00-D. The G2F initiative was also supported by funding from the Nebraska Corn Board (project ID #: 88-R-1617-03), the Iowa Corn Promotion Board, the Georgia Agricultural Commodity Commission for Corn, the Corn Marketing Program of Michigan, and the National Corn Growers Association.
© 2023 Genetics Society of America. All rights reserved.
- convolutional neural network
- deep learning
- gene-by-environment interaction (G×E)
- phenotypic prediction
PubMed: MeSH publication types
- Journal Article
- Research Support, Non-U.S. Gov't
- Research Support, U.S. Gov't, Non-P.H.S.