TY - JOUR
T1 - Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths
AU - Touchon, Marie
AU - Hoede, Claire
AU - Tenaillon, Olivier
AU - Barbe, Valérie
AU - Baeriswyl, Simon
AU - Bidet, Philippe
AU - Bingen, Edouard
AU - Bonacorsi, Stéphane
AU - Bouchier, Christiane
AU - Bouvet, Odile
AU - Calteau, Alexandra
AU - Chiapello, Hélène
AU - Clermont, Olivier
AU - Cruveiller, Stéphane
AU - Danchin, Antoine
AU - Diard, Médéric
AU - Dossat, Carole
AU - El Karoui, Meriem
AU - Frapy, Eric
AU - Garry, Louis
AU - Ghigo, Jean Marc
AU - Gilles, Anne Marie
AU - Johnson, James
AU - Le Bouguénec, Chantal
AU - Lescat, Mathilde
AU - Mangenot, Sophie
AU - Martinez-Jéhanne, Vanessa
AU - Matic, Ivan
AU - Nassif, Xavier
AU - Oztas, Sophie
AU - Petit, Marie Agnès
AU - Pichon, Christophe
AU - Rouy, Zoé
AU - Saint-Ruf, Claude
AU - Schneider, Dominique
AU - Tourret, Jérôme
AU - Vacherie, Benoit
AU - Vallenet, David
AU - Médigue, Claudine
AU - Rocha, Eduardo P.C.
AU - Denamur, Erick
PY - 2009/1
Y1 - 2009/1
AB - The Escherichia coli species represents one of the best-studied model organisms, but also encompasses a variety of commensal and pathogenic strains that diversify by high rates of genetic change. We uniformly (re-) annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the closest E. coli related species), including seven that we sequenced to completion. Within the ∼18,000 families of orthologous genes, we found ∼2,000 common to all strains. Although recombination rates are much higher than mutation rates, we show, both theoretically and using phylogenetic inference, that this does not obscure the phylogenetic signal, which places the B2 phylogenetic group and one group D strain at the basal position. Based on this phylogeny, we inferred past evolutionary events of gain and loss of genes, identifying functional classes under opposite selection pressures. We found an important adaptive role for metabolism diversification within group B2 and Shigella strains, but identified few or no extraintestinal virulence-specific genes, which could render difficult the development of a vaccine against extraintestinal infections. Genome flux in E. coli is confined to a small number of conserved positions in the chromosome, which most often are not associated with integrases or tRNA genes. Core genes flanking some of these regions show higher rates of recombination, suggesting that a gene, once acquired by a strain, spreads within the species by homologous recombination at the flanking genes. Finally, the genome's long-scale structure of recombination indicates lower recombination rates, but not higher mutation rates, at the terminus of replication. The ensuing effect of background selection and biased gene conversion may thus explain why this region is A+T-rich and shows high sequence divergence but low sequence polymorphism. Overall, despite a very high gene flow, genes co-exist in an organised genome.
UR - http://www.scopus.com/inward/record.url?scp=59249089471&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=59249089471&partnerID=8YFLogxK
U2 - 10.1371/journal.pgen.1000344
DO - 10.1371/journal.pgen.1000344
M3 - Article
C2 - 19165319
AN - SCOPUS:59249089471
SN - 1553-7390
VL - 5
JO - PLoS Genetics
JF - PLoS Genetics
IS - 1
M1 - e1000344
ER -