By Monica Heger
This article has been updated from a version posted July 1 to clarify that the 454 assembly was the first to determine that the outbreak strain had acquired a phage genome that produces the Shiga toxin.
The Escherichia coli outbreak in Europe has not only served as a proof of concept for the potential role of next-gen sequencing in public health but is also providing an opportunity to compare competing platforms in this rapidly evolving market.
While the outbreak has prompted a host of sequencing efforts, with nearly every next-gen sequencing platform being employed to sequence and assemble the strain, the spotlight has been on the newest entries to the marketplace: desktop systems like Life Technologies' Ion Torrent PGM and Illumina's MiSeq.
Two teams, one from Life Technologies and another from BGI, were the first to sequence the outbreak strain on the PGM. Further sequencing and assembly on Roche's 454 GS Junior confirmed the strain as enteroaggregative E. coli with an acquired phage genome that produces the Shiga toxin (IS 6/7/2011). Since then, Illumina has also sequenced the genome on its MiSeq desktop platform, which is not scheduled to launch until later this year. The MiSeq E. coli data are posted on the website of the UK's Health Protection Agency, giving researchers a first look at data from the instrument.
Illumina also offered its take on the two desktop platforms in a presentation on its website, in which it highlighted the supposed benefits of the MiSeq. The presentation compared the publicly available Ion Torrent E. coli data with data Illumina generated internally on a laboratory strain of the bacteria. Although Illumina sequenced the outbreak strain for the UK's Health Protection Agency, Geoff Smith, senior director of DNA sequencing at the company, said there were restrictions on how those data could be used.
In its online presentation, Illumina emphasized the higher throughput and accuracy of its system compared to Ion Torrent's 314 chip. Life Technologies, meantime, contested some of Illumina's claims, touting the Ion Torrent platform as being a "truly disruptive technology."
Throughput vs. Flexibility
In one run on the MiSeq, Illumina generated 1.7 gigabases of data with an average coverage of 393-fold. It compared this with three sets of Ion Torrent data generated by EdgeBio, BGI, and Life Tech, which completed six, seven, and eight runs and generated 24, 11, and 15 megabases of data, respectively.
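Average (fold) coverage is the total number of sequenced bases divided by the genome size. The short Python sketch below shows that arithmetic; the function names are illustrative. Because the article does not say which genome size Illumina used, the example only back-calculates the genome size implied by the reported 1.7 gigabases and 393-fold coverage.

```python
def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average fold coverage: total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Genome length implied by a reported yield and average coverage."""
    return total_bases / coverage


yield_bases = 1.7e9   # MiSeq run yield reported in the article
reported_cov = 393    # average coverage reported in the article

size = implied_genome_size(yield_bases, reported_cov)
print(f"implied genome size: {size / 1e6:.2f} Mb")
print(f"check: {mean_coverage(yield_bases, size):.0f}-fold")
```

Run as written, this prints an implied genome size of about 4.33 Mb, and feeding that size back in recovers 393-fold.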
Additionally, Smith said, the company sequenced seven isolates of the outbreak strain in a single MiSeq run, whereas the PGM needed multiple runs to sequence one isolate. Data for five outbreak strains are available through HPA's website.
Justin Johnson, bioinformatics director at EdgeBio, said that the PGM 314 chip was not the most apt point of comparison for the MiSeq. For one thing, he said, Ion Torrent's 316 chip generates about 7-fold more data than the 314, and subsequent versions will have even higher throughput.
Life Technologies plans to launch the 316 chip later this month (IS 6/28/2011).
Additionally, Johnson said, the two machines have different advantages. While the MiSeq does indeed have a greater throughput, the PGM offers flexibility in experimental design.
For instance, he said, on a 316 chip it would be possible to run five libraries in parallel and generate 1.3 gigabases of data in 24 hours, very close to the MiSeq's output in the same time frame, although with more hands-on time in the library preparation steps. The lower throughput of a single run also allows flexibility in experimental design: aside from sequencing one E. coli strain, a researcher could add metagenomic samples or amplicons, he said.
These experiments could be done in parallel on the PGM, but would be more difficult to do all at once on the MiSeq, because it would require barcoding. According to Johnson, using barcodes for different types of experiments typically does not produce the best results because of the inherent biases in the barcodes.
On the other hand, the MiSeq can barcode and run multiple samples of the same type of experiment. Whether that is a plus or a minus depends on how you look at the platform, Johnson said. "MiSeq has an integrated library construction," but the Ion Torrent is "more flexible," in part because it is "more manual."
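Barcoding attaches a short, sample-specific index sequence to each library so that reads from several samples can be pooled in one run and sorted afterward. The Python sketch below is a simplified illustration only. It assumes, hypothetically, that an equal-length barcode sits at the start of each read and requires an exact match, whereas real pipelines typically tolerate mismatches and may read the index separately. The barcode sequences, sample names, and reads are invented.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample by exact match on its leading barcode.

    reads: iterable of read sequences (str)
    barcodes: dict mapping barcode sequence -> sample name
              (assumed here to be equal-length and distinct)
    Returns a dict mapping sample name -> list of reads with the barcode
    trimmed off; reads with no matching barcode go to "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for bc, sample in barcodes.items():
            if read.startswith(bc):
                bins[sample].append(read[len(bc):])  # strip the barcode
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "isolate_1", "TGCA": "isolate_2"}
reads = ["ACGTTTGACC", "TGCAGGCATT", "CCCCAAAAGG"]
print(demultiplex(reads, barcodes))
```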
Johnson added that Ion Torrent has been increasing its throughput with each upgrade, and the 318 chip, scheduled for launch in the fourth quarter, is expected to be able to generate one gigabase per run.
In its presentation, Illumina claims much higher raw read accuracy than the PGM: its internally generated data had an average Q-score of 31, while the public PGM data sets averaged a Q-score of 19.
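Q-scores are on the Phred scale, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The Python sketch below converts between the two so the averages quoted above can be read as approximate error rates; the function names are illustrative.

```python
import math


def phred_to_error(q: float) -> float:
    """Probability that a base call is wrong, given its Phred quality Q."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality corresponding to a base-call error probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


for q in (31, 19):  # average Q-scores cited in Illumina's presentation
    p = phred_to_error(q)
    print(f"Q{q}: error probability {p:.2e}, about 1 in {1 / p:.0f} bases")
```

By this measure Q31 corresponds to roughly one error in 1,260 bases and Q19 to roughly one in 80. Converting an average Q-score is only a rough guide, because averaging on the logarithmic Phred scale is not the same as averaging error probabilities.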
However, Mike Lelivelt, director of bioinformatics and software products at Ion Torrent, said that when the company tried to replicate Illumina's values, it could not generate quality scores as high as Illumina claimed. While Illumina said its read quality peaks at around Q38 between bases 15 and 19, Ion Torrent could not achieve a quality score above about Q31, Lelivelt said.
Meantime, he added, the company is continuously improving the PGM's performance, even on the 314 chip. This week, it released on its Torrent Dev community site a data set from sequencing a laboratory strain of E. coli with the 314 chip. At base 100, the data had a Q-score of around 17, an improvement over the chip's performance in January, when the Q-score at base 100 hovered around 10. In both cases, the Q-score started at around 25 at base position zero.
Lelivelt said that the company is aiming to improve on the accuracy even more, and noted that it is focusing on reducing the amount by which accuracy drops throughout the read, which is "setting us up for longer read lengths," he said.
EdgeBio's Johnson evaluated the accuracy of the platforms by comparing assemblies. First, he tried to normalize the two data sets from the outbreak strain. He used only the forward reads from the MiSeq (because the PGM does not produce paired-end reads) and performed de novo assemblies of both the MiSeq data and the PGM 316 data at 35-fold coverage.
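The article does not say how Johnson brought both data sets to 35-fold coverage. One common approach is random subsampling, keeping each read with probability equal to the target coverage divided by the current coverage. The Python sketch below assumes that approach; the function name, the fixed random seed, and the toy read set are illustrative assumptions, not EdgeBio's method.

```python
import random


def subsample_to_coverage(reads, genome_size, target_cov, seed=0):
    """Randomly keep reads so that expected coverage is about target_cov.

    Each read is kept independently with probability
    target_cov / current_cov, where current_cov = total bases / genome_size.
    """
    total_bases = sum(len(r) for r in reads)
    current_cov = total_bases / genome_size
    if current_cov <= target_cov:
        return list(reads)  # already at or below target: keep everything
    keep_prob = target_cov / current_cov
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    return [r for r in reads if rng.random() < keep_prob]


# Toy example: 10,000 reads of 100 bp on a 10 kb "genome" is 100-fold.
reads = ["A" * 100] * 10_000
sub = subsample_to_coverage(reads, genome_size=10_000, target_cov=35)
print(len(sub) * 100 / 10_000)  # close to 35, varies slightly with the seed
```

Subsampling randomly, rather than taking the first reads in a file, avoids biasing the retained set toward any particular part of a run.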
For the PGM data, he used the Newbler assembler, achieving an N50 of 50,000 base pairs and 173 contigs, the longest of which was 211,000 base pairs. For the MiSeq data, he used Velvet, generating an N50 of around 95,000 base pairs and 117 contigs, the longest of which was 236,000 base pairs.
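N50, the statistic Johnson used to compare the two assemblies, is the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows; the contig lengths in the example are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    Sort contigs from longest to shortest and walk down the list,
    summing lengths; the N50 is the length of the contig at which
    the running total first reaches half of the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length


print(n50([100, 80, 60, 40, 20]))  # 80: 100 + 80 = 180 reaches half of 300
```

A higher N50 with fewer contigs generally indicates a more contiguous assembly.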
Johnson concluded that the differences in assembly quality were likely due to read quality. While the PGM is still a bit behind in terms of read quality, he said, the fact that the company has made such significant gains with each updated chip is promising.
Additionally, he said that even though the MiSeq itself is a new platform, the chemistry is not new. The PGM has only "been in the wild for three months or so," he said. "Every single one of the platforms when they first came out went through this phase." It's all part of "working out the kinks of a new platform."
Tried and True vs. New
While MiSeq may currently have an advantage in terms of accuracy, whether that advantage will persist as Ion Torrent's technology matures is another question entirely. Johnson said that the vast improvements between the 314 chip and 316 chip were very encouraging.
He also questioned whether the MiSeq would remain limited to read lengths of under 200 base pairs due to its chemistry, and whether, as a result, the MiSeq technology had already matured.
Depending on how you look at it, the fact that the PGM uses a completely different chemistry could be an advantage or a disadvantage. "I see a lot of potential in the Ion," he said. "What's intriguing is the rapid turnaround time."
If the company can make good on its promises of longer read lengths and can start generating throughput and quality comparable to the MiSeq, it could have the advantage, he said.
Additionally, because the PGM relies on a different chemistry, the platform can be used for "rapid validation" of other sequencing studies. For instance, Johnson said, researchers conducting large studies on the HiSeq could use the PGM to validate their findings.
The MiSeq, on the other hand, uses the same chemistry as the HiSeq. This could be appealing for a number of reasons, but primarily because the technology is already proven. Researchers familiar with the HiSeq essentially already know what they are buying, and can also use the same reagents and protocols that they use for the HiSeq. However, because it is based on the same chemistry, the platform's potential could be limited to being just a "mini-HiSeq."
Johnson said it is still unclear which platform will perform better in the long run, and added that while comparing the two technologies now is worthwhile, the more interesting comparison will be in a year or two, after the platforms have been on the market, undergone additional upgrades, and generated more data in the production setting.
Have topics you'd like to see covered by In Sequence? Contact the editor at mheger [at] genomeweb [.] com.