Within- and cross-species predictions of plant specialized metabolism genes using transfer learning

Bethany M. Moore, Peipei Wang, Pengxiang Fan, Aaron Lee, Bryan Leong, Yann Ru Lou, Craig A. Schenck, Koichi Sugimoto, Robert Last, Melissa D. Lehti-Shiu, Cornelius S. Barry, Shin Han Shiu

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Plant specialized metabolites mediate interactions between plants and the environment and have significant agronomical/pharmaceutical value. Most genes involved in specialized metabolism (SM) are unknown because of the large number of metabolites and the challenge in differentiating SM genes from general metabolism (GM) genes. Plant models like Arabidopsis thaliana have extensive, experimentally derived annotations, whereas many non-model species do not. Here we employed a machine learning strategy, transfer learning, where knowledge from A. thaliana is transferred to predict gene functions in cultivated tomato with fewer experimentally annotated genes. The first tomato SM/GM prediction model using only tomato data performs well (F-measure = 0.74, compared with 0.5 for random and 1.0 for perfect predictions), but from manually curating 88 SM/GM genes, we found many mis-predicted entries were likely mis-annotated. When the SM/GM prediction models built with A. thaliana data were used to filter out genes where the A. thaliana-based model predictions disagreed with tomato annotations, the new tomato model trained with filtered data improved significantly (F-measure = 0.92). Our study demonstrates that SM/GM genes can be better predicted by leveraging cross-species information. Additionally, our findings provide an example for transfer learning in genomics where knowledge can be transferred from an information-rich species to an information-poor one.
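The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes the F-measure is the standard harmonic mean of precision and recall with SM as the positive class, and it stands in for the A. thaliana-based model with a simple threshold rule. The gene identifiers, feature values, and function names below are hypothetical and are not taken from the study.

```python
from typing import Callable, Dict, Sequence


def f_measure(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "SM") -> float:
    """Standard F1 score (assumed definition): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def filter_by_source_model(
    features: Dict[str, Sequence[float]],
    annotations: Dict[str, str],
    source_predict: Callable[[Sequence[float]], str],
) -> Dict[str, str]:
    """Keep target-species genes whose annotation agrees with the source-species model."""
    return {
        gene: label
        for gene, label in annotations.items()
        if source_predict(features[gene]) == label
    }


# Hypothetical toy data: one feature per gene, annotated as SM or GM.
toy_features = {"g1": [0.9], "g2": [0.2], "g3": [0.8], "g4": [0.1]}
toy_annotations = {"g1": "SM", "g2": "GM", "g3": "GM", "g4": "SM"}


def toy_source_model(x: Sequence[float]) -> str:
    """Stand-in for a model trained on the information-rich species (an assumption)."""
    return "SM" if x[0] > 0.5 else "GM"


# Genes g3 and g4 disagree with the source model and are dropped before retraining.
filtered = filter_by_source_model(toy_features, toy_annotations, toy_source_model)
```

In this sketch, only the agreeing genes in `filtered` would be passed on to train the new target-species model; the classifier itself is left out because the abstract does not specify it.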

Original language: English (US)
Journal: In Silico Plants
Volume: 2
Issue number: 1
State: Published - 2020
Externally published: Yes

Bibliographical note

Funding Information:
This work was supported by NSF grant IOS-1546617 to R.L., C.S.B. and S.-H.S.; National Institute of General Medical Sciences of the National Institutes of Health graduate training grant T32-GM110523 to B.L.; a postdoctoral fellowship from the National Science Foundation (NSF) IOS-1811055 to C.A.S.; U.S. Department of Energy Great Lakes Bioenergy Research Center (BER DESC0018409) grant to R.L. and S.-H.S.; Michigan AgBioResearch and U.S. Department of Agriculture National Institute of Food and Agriculture Hatch project number MICL02552 to C.S.B.; and NSF grant DEB-1655386 to S.-H.S.

Publisher Copyright:
© 2020 The Author(s).

Keywords

  • Cross-species gene prediction
  • specialized metabolism
  • transfer learning
