Recompleting the Caenorhabditis elegans genome

Jun Yoshimura, Kazuki Ichikawa, Massa J. Shoura, Karen L. Artiles, Idan Gabdank, Lamia Wahba, Cheryl L. Smith, Mark L. Edgley, Ann E. Rougvie, Andrew Z. Fire, Shinichi Morishita, Erich M. Schwarz

Research output: Contribution to journal › Article › peer-review

72 Scopus citations

Abstract

The Caenorhabditis elegans genome was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly used a standard C. elegans strain (N2), it combined sequence data from several laboratories, with DNA propagated in bacteria and yeast. As a result, the N2 assembly differs in many places from any C. elegans strain available today. To provide a more accurate C. elegans genome, we performed a long-read assembly of VC2010, a modern strain derived from N2. Our VC2010 assembly has 99.98% identity to N2 but contains an additional 1.8 Mb, including tandem repeat expansions and genome duplications. Of 116 structural discrepancies between N2 and VC2010, 97 (84%) had the VC2010 structure also present in two outgroup strains, implying deficiencies in the N2 assembly. Over 98% of N2 genes encoded unchanged products in VC2010; moreover, we predicted ≥53 new genes in VC2010. The recompleted C. elegans genome should be a valuable resource for genetics, genomics, and systems biology.

Original language: English (US)
Pages (from-to): 1009-1022
Number of pages: 14
Journal: Genome Research
Volume: 29
Issue number: 6
State: Published - 2019

Bibliographical note

Publisher Copyright:
© 2019 Yoshimura et al. This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
