By Monica Heger
This story, originally published June 3, has been updated with the latest analysis and assembly results from BGI.
Two independent teams have sequenced the genome of the Escherichia coli strain that has been wreaking havoc across Europe, resulting in the deaths of 17 people.
The teams, one from BGI and the other a collaboration between the University of Münster and Life Technologies, used the Ion Torrent PGM to do the sequencing. Both groups found that the strain is a hybrid that contains antibiotic resistance genes, which may help explain why it has been especially virulent.
According to Dag Harmsen, director of research at the University of Münster, the quick turnaround time of the Ion Torrent instrument made it possible to obtain results within three days. "The biggest advantage [of the PGM] from my point of view as a public health official is that it's speedy, and speed is what is needed at the moment," he told In Sequence.
Researchers from the university and Life Tech's German headquarters collaborated to do the sequencing. The company received the samples on Monday, began sequencing that evening, and began analyzing the data on Wednesday, said Simone Guenther, who led the sequencing effort at Life Tech.
Using a combination of the 314 and 316 chips, they sequenced the strain to about 28-fold coverage with 100-base-pair reads. The team is continuing to analyze the data and has completed a draft assembly, which is available via the GenBank database.
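As a rough illustration of what 28-fold coverage means, the short Python sketch below applies the standard average-coverage relation (coverage = number of reads × read length / genome size). It assumes the roughly 5.2-megabase genome size reported for this strain and uniform 100-base reads; the resulting read count is a back-of-the-envelope estimate, not a figure reported by either team, and the function names are hypothetical.

```python
def average_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average fold coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Smallest whole number of reads that reaches the target coverage."""
    total_bases = genome_size_bp * coverage
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total_bases + read_length_bp - 1) // read_length_bp


if __name__ == "__main__":
    genome_size = 5_200_000   # assumed: ~5.2 megabases, as reported for this strain
    read_length = 100         # 100-base-pair reads
    target = 28               # about 28-fold coverage

    n = reads_needed(genome_size, target, read_length)
    print(f"Reads needed: {n:,}")
    print(f"Coverage check: {average_coverage(n, read_length, genome_size):.1f}x")
```

Under these assumptions, the run would need on the order of 1.5 million usable reads. Real runs vary in read length and usable yield, so the true number would differ.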
Additionally, the Life Tech team is now sequencing the strain on the SOLiD, and the University of Münster team is sequencing it on Illumina's HiScan SQ in order to generate a reference genome for the strain. Harmsen added that sequencing the strain on the Pacific Biosciences machine would have been "interesting," but that the machine was not available to them. PacBio researchers in December sequenced the Haitian cholera strain to pinpoint the origin of that disease outbreak (IS 12/14/2010).
Independently, researchers at BGI have also sequenced the E. coli strain on the PGM. Junjie Qin, who is leading the sequencing efforts at BGI, said that the team there is also sequencing the strain on multiple platforms, including the Illumina HiSeq, but that the PGM "takes the shortest time to generate genomic data." He did not disclose which other platforms were used.
"For combating such a deadly pathogen, time is the top thing to be considered," he explained via e-mail.
The BGI team has now published a draft assembly of the genome, based on sequence data from the Illumina HiSeq, which is available on its website. BGI is also providing the PCR primer sequences it used to develop diagnostic kits. The latest draft is a de novo assembly, in contrast to a previous assembly by BGI and the assembly by the Life Tech team, both of which were based on aligning the sequences to a reference genome.
Additionally, a BGI spokesperson said that the team would also use Roche's 454 GS FLX to make a more complete assembly.
The two teams found very similar results. The strain appears to be a hybrid of two E. coli strains — enteroaggregative E. coli and enterohemorrhagic E. coli — which may help explain why it has been particularly pathogenic. The researchers also found genes in the bacteria that confer antibiotic resistance.
The German team is now sequencing a historic isolate of E. coli dating back more than 10 years that is the same serotype as the identified strain. Sequencing the older strain could help the team "see the changes that have occurred within 10 years and hopefully infer what has made this new one so pathogenic," Harmsen said. Sequencing of this strain is also being done on the PGM.
Justin Johnson, director of bioinformatics at EdgeBio, used CLC Bio's software to assemble and analyze the raw reads that BGI made publicly available. He said the quality of data produced by the Ion Torrent was comparable to other short-read platforms, but its quick run time set it apart.
Johnson said his analysis took just a couple of hours, although he added that the goal was not to produce the best assembly possible, but to just do a quick, preliminary analysis.
The N50s were smaller than what would have been produced from Roche's 454 GS FLX or the PacBio RS, but they are of "long-enough size to do genic content exploration," Johnson said.
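For readers unfamiliar with the metric, N50 is the contig length at which contigs of that length or longer account for at least half of the total assembled bases. The Python sketch below shows one common way to compute it. The contig lengths are made-up illustrative values, not data from Johnson's assembly, and the function name is hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given a list of contig lengths.

    Contigs are sorted from longest to shortest and summed until the
    running total reaches at least half of all assembled bases; the
    length of the contig that crosses that threshold is the N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")

    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare doubled running sum to the total to stay in integers.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in bases, for illustration only.
    example = [100, 80, 50, 30, 20, 10]
    print(n50(example))
```

A higher N50 generally indicates a more contiguous assembly, which is why longer-read platforms tend to score better on this measure.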
Like the BGI and German teams, he identified pathogenic genes and genes that confer resistance to antibiotics.
Johnson also found the genome to be about 5.2 megabases, slightly larger than the typical E. coli genome of about 4.6 megabases, a finding that BGI and Life Tech also confirmed.
Clinical Implications
Being able to identify a bacterial strain causing an outbreak in real time could have clinical implications.
"We have this large ongoing outbreak in Germany and we'd like to solve the problem as fast as possible," said Alexander Mellmann, scientist at the German National Consulting Laboratory for Hemolytic Uremic Syndrome at the Institute of Hygiene, University Hospital Münster.
"Speed is very important in infection control," he added. Clinicians need answers not in four weeks but in 48 hours, so that the results can be "used immediately in patient care."
Having the genome of this particular strain should have some impact on the outbreak in Europe by enabling researchers to design better tests, he said.
The genome could also yield insight into how the pathogen spreads and infects patients, which could be important in stemming the outbreak.
As for therapeutics, Mellmann said it was too early to know whether the genome sequence would impact treatment. "It's difficult to say whether we would find a new therapeutic target," Mellmann said. Typically, for this type of strain, just the symptoms are treated, he added.
Harmsen added that the sequence could also help trace the source of the outbreak, which has still not been identified.