TY - JOUR
T1 - High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource
AU - Seaver, Samuel M.D.
AU - Gerdes, Svetlana
AU - Frelin, Océane
AU - Lerma-Ortiz, Claudia
AU - Bradbury, Louis M.T.
AU - Zallot, Rémi
AU - Hasnain, Ghulam
AU - Niehaus, Thomas D.
AU - El Yacoubi, Basma
AU - Pasternak, Shiran
AU - Olson, Robert
AU - Pusch, Gordon
AU - Overbeek, Ross
AU - Stevens, Rick
AU - De Crécy-Lagard, Valérie
AU - Ware, Doreen
AU - Hanson, Andrew D.
AU - Henry, Christopher S.
PY - 2014/7/1
Y1 - 2014/7/1
N2 - The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.
AB - The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.
KW - Computational biochemistry
KW - Plant genomics
KW - Plant metabolism
KW - Systems biology
UR - http://www.scopus.com/inward/record.url?scp=84903735161&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84903735161&partnerID=8YFLogxK
U2 - 10.1073/pnas.1401329111
DO - 10.1073/pnas.1401329111
M3 - Article
C2 - 24927599
AN - SCOPUS:84903735161
SN - 0027-8424
VL - 111
SP - 9645
EP - 9650
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 26
ER -