A recent study has found that the Illumina MiSeq instrument edges out the Life Technologies Ion Torrent PGM platform for de novo genome sequencing and assembly on a specific microbial species, though the PGM offered advantages for sequence typing the microbe.
As they reported in PLoS One, Australian researchers compared benchtop sequencers, focusing on platform performance as it relates to Helicobacter pylori genome sequencing. Using sequence data for a pair of H. pylori reference strains resequenced with PGM and MiSeq platforms and protocols, the group teased apart read error, genome assembly, and sequence typing profiles that are expected to be important when doing more routine sequencing of the genetically variable bacterial species, best known for causing stomach ulcers and related conditions.
"What we tried to achieve and publish is an understanding of the accuracy of de novo assemblies as a whole, for H. pylori," corresponding author Timothy Perkins, a pathology and laboratory medicine researcher at the University of Western Australia, told In Sequence in an email message.
"Not a lot is discussed in the literature about the overall accuracy of an assembly," he added, "nor is the level of confidence assigned to a particular reference sequence."
In the team's analysis, Illumina's MiSeq instrument — used together with the Nextera library preparation protocol — topped the heap for accuracy when doing de novo H. pylori genome sequencing and assembly. With that instrument and sample prep combo, coding sequences generally show 100 percent accuracy over 95 percent of the assembly, Perkins noted.
Even so, results of the head-to-head comparison indicated that there are chunks of the H. pylori genome missed by MiSeq sequencing, particularly when done in conjunction with the Nextera XT sample preparation method, a somewhat cheaper kit that requires less input DNA. Among them: sequences from an H. pylori housekeeping gene called atpA that's included in conventional multi-locus sequence typing, or MLST, analyses.
"With the Ion Torrent PGM, we found an inherently high error rate in the raw sequence data," Perkins and his co-authors wrote. "Using the Illumina MiSeq, we found significantly more non-covered nucleotides when using the less expensive Illumina Nextera XT compared with the Illumina Nextera library creation method."
Those involved in the new study hope the work will spark increased interest in characterizing the accuracy of genome assemblies in general. For H. pylori specifically, the group is keen to see a move away from MLST-based H. pylori classifications and into more routine whole-genome sequencing-based classification of the bug.
"We would like to replace MLST typing with whole-genome sequencing and we have developed scripts to do this," Perkins said. "This is a cheaper method and more information is available for further analyses."
Two high-quality reference genomes, both generated by Sanger sequencing, have been available for H. pylori for more than a decade, Perkins and his co-authors noted. One represents a European strain called 26695 while the other was assembled with sequences from the US isolate J99.
But while existing technologies make it possible to routinely sequence new bacterial genomes in just a few days, the researchers had questions about which high-throughput sequencing methods might offer the most accurate representation of the variable H. pylori genome.
The bug is prone to high mutation rates, Perkins noted, perhaps contributing to the pronounced variation that exists between H. pylori organisms with different MLST profiles.
And while the guanine and cytosine nucleotides that cause problems for some sequencing platforms are in the minority within the H. pylori sequence at large, the genome does house a slew of homopolymeric runs — tracts of sequence in which the same base appears as many as 13 times in a row.
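For readers who want to see what such homopolymeric tracts look like in practice, the following is a minimal Python sketch, not taken from the study, that scans a nucleotide string for runs of a repeated base. The function names `homopolymer_runs` and `longest_homopolymer` and the `min_len` threshold are illustrative assumptions.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=2):
    """Return (start, base, length) for every run of one repeated base.

    Hypothetical helper for illustration; min_len is an arbitrary
    threshold, not a value reported by the researchers.
    """
    runs = []
    pos = 0
    # groupby yields consecutive stretches of identical characters
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def longest_homopolymer(seq):
    """Return the first longest run as (start, base, length), or None."""
    runs = homopolymer_runs(seq, min_len=1)
    return max(runs, key=lambda run: run[2]) if runs else None
```

Run against a reference genome sequence loaded as a plain string, a scan like this would list the tracts where homopolymer-related indels are most likely to show up.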
For their sequencer comparison, the researchers focused on two relatively low-cost platforms that they had on hand: the Ion Torrent PGM, a semiconductor sequencing method, and Illumina MiSeq, which uses sequencing-by-synthesis.
The researchers used the PGM platform plus Life Tech's Ion Xpress Fragment library kit to sequence H. pylori from the J99 and 26695 strains.
The same strains were subjected to paired-end sequencing with 150-base pair protocols using the MiSeq in combination with two different library preparation protocols, Nextera and Nextera XT.
The sequences were assembled, mapped, and analyzed using freely available software as well as algorithms recommended by each instrument manufacturer.
As part of their overall comparison, meanwhile, the investigators looked not only at the accuracy of the resulting de novo H. pylori assemblies, but also at the MLST profiles and clinically informative pathogenicity island sequences that could be gleaned from them.
The team's comparisons with the H. pylori reference genomes, as well as a k-mer analysis that considered the number of 31-mer sequences present at a given genome coverage depth, indicated that the error rate was higher for the PGM platform than it was for MiSeq.
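The general idea behind a k-mer analysis of this kind can be sketched in a few lines of Python. This is a simplified illustration rather than the software the researchers used: it counts every 31-mer in a set of reads and then tallies how many distinct k-mers occur at each multiplicity. Sequencing errors tend to produce k-mers seen only once or twice, so a large low-multiplicity peak hints at a higher raw error rate. The function names are assumptions, and the sketch does not merge reverse-complement k-mers as dedicated tools typically do.

```python
from collections import Counter


def kmer_counts(reads, k=31):
    """Count every k-mer in an iterable of read strings."""
    counts = Counter()
    valid = set("ACGT")
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())
```

Plotting the output of `kmer_spectrum` for each platform's reads would give a rough, reference-free view of how many k-mers sit in the error-prone low-multiplicity range.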
Likewise, the accuracy of MiSeq Nextera reads exceeded that of MiSeq reads generated in combination with the Nextera XT sample preparation protocol (the Nextera XT protocol was linked to lower genome coverage than the Nextera method).
Data generated with all three platform and library preparation combinations predicted similar numbers of single nucleotide variants in the H. pylori genomes, the group reported. But genomes sequenced on the Ion Torrent instrument appeared to contain around 10 times as many insertions and deletions, with many of the PGM-specific indels falling in homopolymeric parts of the H. pylori genome.
Although the MiSeq Nextera protocol came out ahead on read accuracy and de novo assembly, the researchers found that parts of the H. pylori genome were missed in assemblies composed of MiSeq reads. PGM reads, on the other hand, did capture many of those sequences, which tended to be especially GC-rich.
Consequently, the PGM assemblies generally offered more accurate MLST profiles, highlighting the notion that the per-base accuracy may be less important when classifying H. pylori strains or looking at the relationships between them than it is when assembling and assessing complete H. pylori genomes.
"When comparing whole genomes and inferring strain-to-strain relationships, the errors would most likely be redundant and the correct relationship inferred," Perkins noted.
"[H]owever, for more detailed analysis of coding regions and potential pseudogenes, genes of biological interest, these errors are unlikely to be discovered by mass sequencing," he said.
The cost associated with each platform can also vary, depending on the experiments researchers have in mind.
At the time the experiments were performed, for instance, the cost of generating a million bases of sequence was lower using the MiSeq platform, despite higher per-run prices. For the H. pylori genome sequencing application, that meant that dozens of genomes had to be multiplexed on each MiSeq run to see a price dip relative to the cost of PGM sequencing.
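The break-even reasoning here is simple arithmetic, sketched below in Python. The article does not give actual prices, so every figure a reader plugs in is their own assumption, and the function name and parameters are hypothetical. The sketch treats one platform as having a fixed run cost shared across multiplexed samples, plus an optional per-sample library cost, and asks how many samples are needed before the per-sample cost falls below a flat per-sample cost on the other platform.

```python
import math


def min_samples_to_break_even(run_cost_a, per_sample_cost_b, library_cost_a=0.0):
    """Smallest number of samples multiplexed on one run of platform A
    so that A's per-sample cost is strictly below platform B's.

    Per-sample cost on A is modelled as run_cost_a / n + library_cost_a.
    Returns None if A can never be cheaper under this model.
    """
    margin = per_sample_cost_b - library_cost_a
    if margin <= 0:
        return None
    # Need run_cost_a / n < margin, i.e. n > run_cost_a / margin
    return math.floor(run_cost_a / margin) + 1
```

With placeholder inputs of 1,000 for the run, 50 per sample on the other platform, and 10 per library, the function returns 26, which shows how a pricier run can still require dozens of multiplexed genomes before it pays off.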
While it's tough to know whether the results of the H. pylori sequencing showdown will directly carry over to sequencing studies of other bacterial species, the researchers are optimistic that other groups will take advantage of some of the analytical approaches used in the current comparison when gauging the read quality and/or limitations associated with whatever sequencing platforms they select.
"[W]e hope our analyses could be utilized to study the accuracy of assemblies for other bacteria," Perkins told In Sequence, adding, "Many groups are sequencing genome after genome with little thought of the accuracy of assemblies or missing regions. We certainly had inaccuracies and non-covered regions in our controls."
Going forward, the University of Western Australia team is eager to try to discern the extent of the variation present in H. pylori by using genome sequencing to look at everything from shared core genome sequences present across different H. pylori strains to variable regions and sequences contributing to host adaptation.
"We have a large collection of strains," Perkins noted, "and we intend to start looking for the limit of diversity of this fascinating pathogen."
The team doesn't currently plan to do additional comparisons of new sequencing technologies as they become available, since it is not based at a genome sequencing center. Nevertheless, Perkins noted that the group has taken a crack at using a MiSeq 250 base pair Nextera XT protocol, which is generating promising data so far.
For most of their genome assemblies, the researchers are currently using the CLC assembler from CLC Bio, which made it possible to pick up H. pylori atpA gene sequences missed in assemblies generated from MiSeq reads using Velvet assembly software.
Even so, Perkins noted that the group is "constantly on the lookout for better algorithms, particularly now [that] read lengths are increasing."