By Monica Heger
This article was originally published July 29.
The Escherichia coli outbreak in Europe, which has claimed upwards of 50 lives, has also been a testing ground for new sequencing platforms, spurring competition between groups to be the first in sequencing, assembling, and analyzing the genome.
While the first groups to sequence the genome immediately released their data to aid diagnostics, competition between the groups and between the different sequencing platforms has accelerated as the outbreak subsides, and researchers are falling into different camps regarding their interpretation of the data.
Two papers trying to elucidate the evolutionary origin of the outbreak strain were published this month, with differing theories.
One, published by the University of Münster and Life Technologies last month in PLoS One and based on sequencing by the Ion Torrent PGM, suggests that the outbreak strain and a closely related historic isolate from 2001 share an as-yet-unidentified common ancestor, and that the current outbreak evolved to its present form mainly through gene loss.
Meanwhile, a separate study published last week in the New England Journal of Medicine by a team led by Pacific Biosciences argues that the outbreak strain evolved through the acquisition of the Shiga toxin-producing gene. Their analysis is based on the sequencing and assembly of the outbreak strain as well as 11 other isolates, seven of which are of the same serotype as the outbreak.
The outbreak strain was initially identified as a hybrid between enterohemorrhagic E. coli (EHEC) and enteroaggregative E. coli (EAEC). Sequencing by the UK's Health Protection Agency on the 454 GS Junior concluded that the strain was derived from the EAEC O104:H4 55989 strain and later acquired a phage genome with the capability of producing Shiga toxin.
A number of other groups have sequenced the outbreak genome on different platforms, making it a testing ground for some of the newer sequencing platforms like the PGM, Illumina's MiSeq, 454's GS Junior, and the Pacific Biosciences RS (IS 7/5/2011 and 7/12/2011).
Additionally, a group from the University of Göttingen published the results of a sequencing study that used the 454 FLX in June in the Archives of Microbiology, and BGI and collaborators, who analyzed the outbreak sequence data via a "crowdsourcing" model, published their analysis this week in NEJM.
The Münster and Life Tech team was among the first to sequence the outbreak strain, making the assembly data available in GenBank in early June (IS 6/7/2011). The PLoS One paper published last month details their initial work and subsequent analysis of the 2001 strain using the Ion Torrent PGM and Argus optical mapping technology from OpGen.
Based on the sequencing of the 2001 EHEC isolate, the Münster team believes that the outbreak strain is EHEC and that it and the EAEC strain both evolved from a progenitor of the 2001 EHEC outbreak strain.
Specifically, they propose that a hypothetical Shiga toxin-producing E. coli O104:H4 "with an EAEC genetic background" gave rise to the outbreak EHEC isolate, the 2001 EHEC isolate, and the EAEC strain. The pathogenic outbreak clone emerged as an EHEC/EAEC hybrid through "stepwise gain and loss of chromosomal and plasmid-encoded virulence factors," they wrote in the paper.
By having access to the historic EHEC isolate from 2001, "we could demonstrate that there must have existed a [Shiga-toxin producing] EHEC O104:H4 progenitor," Dag Harmsen, head of research at the University of Münster, told In Sequence in an e-mail. "Furthermore, we have good indications, but no proof, that this progenitor gave rise to the EAEC O104:H4" strain.
In the paper, Harmsen and colleagues argue that it's likely that the three strains evolved from a common Shiga toxin-producing ancestor, and that EAEC lost its Shiga toxin-producing genes while both the outbreak EHEC strain and the 2001 EHEC strain retained them. While acquisition of such genes is not unknown, they note that "loss of several genes and genomic islands is more likely and occurs frequently."
By contrast, the PacBio team believes that the outbreak strain is EAEC and gained its Shiga toxin through gene acquisition. They sequenced and de novo assembled the outbreak strain, and sequenced 11 other isolates, seven of which had the same serotype as the outbreak.
Using three PacBio RS instruments in parallel, they obtained approximately 75-fold coverage for each of the isolates in about five hours per isolate. The mean read length was 2,067 bases.
PacBio's chief scientific officer Eric Schadt told In Sequence that the de novo assembly of the outbreak strain enabled the team to look more comprehensively at structural variations.
While the Münster group was "looking mainly at SNPs," Schadt said, PacBio and its colleagues "were able to incorporate larger structural changes."
Additionally, the data from the seven strains of the same serotype "helped pinpoint that this strain was an EAEC, not an EHEC, but that it picked up the Shiga-toxin-producing region from the EHEC strain," he added.
In addition to comparing the outbreak strain to the 11 other strains, the PacBio team used data from 53 E. coli and Shigella genomes to generate a phylogenetic tree outlining the evolution of the outbreak strain. They found that while the EAEC strains in general were extremely divergent, the O104:H4 serotypes were very similar, and that the outbreak strain fell into that cluster.
The similarity between the Shiga-toxin-encoding strains from the German outbreak and the EAEC O104:H4 strains lacking the Shiga-toxin-encoding phage signals that the incorporation of the phage into the EAEC genome was a "relatively recent event," the authors wrote.
Furthermore, the fact that the outbreak strain lies within the EAEC O104:H4 clade "confirms that the outbreak strain is not a prototypical enterohemorrhagic E. coli strain that has acquired the virulence features of enteroaggregative E. coli," they wrote.
David Rasko, assistant professor of microbiology at the University of Maryland's Institute for Genome Sciences and a co-author on the paper, said that comparing the outbreak strain to the other O104:H4 strains showed that the "cluster is very tight, whether they have the toxin or don't have the toxin."
Additionally, Rasko said, the "Shiga toxin is encoded on a mobile element, and that mobile element is known to move, and it inserts in a common insertion spot in the E. coli genome, and we found that common insertion spot [in the outbreak strain]."
As a result, "the most likely path of evolution is that it is an EAEC that has acquired the Shiga toxin," he said.
Rasko said that PacBio's long read lengths were particularly helpful in locating the Shiga toxin in the outbreak genome. "Mobile elements and phages are especially difficult to place," he said, so having longer read lengths was especially useful. "We could span the junctions where the Shiga toxin inserted because of the long reads," he added.
More E. coli Strains
As of now, neither group can prove that its theory is the correct one, although each believes that its respective evolutionary model is the most likely scenario. At this point, there's "no way to accurately delineate" that with 100 percent certainty, said Rasko.
Peter Gerner-Smidt, chief of the Enteric Diseases Laboratory Branch at the Centers for Disease Control, told In Sequence that more strains would need to be analyzed, and that the strains sequenced by the two groups should be analyzed together, to give a more comprehensive picture.
"I don't think that we at the moment can give you a clear-cut answer" about the origin of the outbreak, he wrote in an e-mail.
Going forward, he said good candidates for further sequencing would be all the historic O104 strains associated with bloody diarrhea and hemolytic uremic syndrome.
"We may also need to look for EAEC strains with a virulence profile [similar to] the German O104 strain except for [Shiga-toxin] production, and some more O104 EAEC strains irrespective of their virulence profile," he said.
While they may disagree on their interpretation of the data, the Münster and PacBio teams agree that next-gen sequencing proved indispensable in characterizing the outbreak pathogen.
Harmsen said that the sequencing and analysis of E. coli during the outbreak demonstrates a new field, "prospective genomic epidemiology," in which the sequencing and analysis of the outbreak occurred in real time. This could impact surveillance, diagnostics, and therapeutics not only for this outbreak, but for future outbreaks as well, he added.
Likewise, the PacBio team noted in the NEJM paper that "the worldwide efforts to sequence and analyze the genome of the German enteroaggregative E. coli outbreak strain illustrate the power of emerging high-throughput DNA-sequencing technologies," adding that "the rapid sequencing of isolates from this outbreak (and from related strains) has yielded critical insight into its causative agent."