Abstract
Caenorhabditis elegans was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly employed a standard C. elegans strain (N2), it used sequence data from several laboratories, with DNA propagated in bacteria and yeast. Thus, the N2 assembly has many differences from any C. elegans available today. To provide a more accurate C. elegans genome, we performed long-read assembly of VC2010, a modern strain derived from N2. Our VC2010 assembly has 99.98% identity to N2 but with an additional 1.8 Mb including tandem repeat expansions and genome duplications. For 116 structural discrepancies between N2 and VC2010, 97 structures matching VC2010 (84%) were also found in two outgroup strains, implying deficiencies in N2. Over 98% of N2 genes encoded unchanged products in VC2010; moreover, we predicted ≥53 new genes in VC2010. The recompleted genome of C. elegans should be a valuable resource for genetics, genomics, and systems biology.
Original language | English (US) |
---|---|
Pages (from-to) | 1009-1022 |
Number of pages | 14 |
Journal | Genome research |
Volume | 29 |
Issue number | 6 |
State | Published - 2019 |
Bibliographical note
Funding Information: We thank Koichiro Doi, Tân Vu, and Kosuke Kudo for developing a tool for visualizing dot plots with alignments of reads and genes; Michael McGurk for discussions on noncoding repetitive genomic DNA; Robert Waterston and Erik Andersen for discussions on genetic diversity within N2; Michael Paulini from WormBase for managing genome assembly submission to ENA; Roberto Gomez for suggesting using the NCBI fork of ACeDB; and Titus Brown and the Michigan State University High-Performance Computing Center (supported by U.S. Department of Agriculture grant 2010-65205-20361 and NIFA–National Science Foundation (NSF) grant IOS-0923812) for computational support. Additional computing was enabled by a start-up allocation from NSF XSEDE (TG-MCB180039). This study was supported by the Japan Science and Technology Corporation (CREST, JPMJCR13W3) and AMED (Japan Agency for Medical Research and Development, GRIFIN) grants to S.M.; National Institutes of Health (NIH) grant GM37706/GM130366 to A.Z.F.; NIH grant AI111173, Moore Foundation Grant No. 4551, and Cornell University start-up funds to E.M.S.; NIH grant OD010440 to A.E.R.; an Arnold O. Beckman Postdoctoral Fellowship and a Stanford Medicine Dean’s Fellowship to M.J.S.; and a Helen Hay Whitney Postdoctoral Fellowship to L.W.
Publisher Copyright:
© 2019 Yoshimura et al. This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.