Researchers from BGI have analyzed the advantages and disadvantages of each major sequencing platform, assessing features such as cost, accuracy, throughput, read length, and the applications each is suited to.
The institution, which boasts the largest next-generation sequencing capacity in the world, employs 137 Illumina HiSeq 2000 instruments, 27 Life Technologies SOLiD 4, one Ion Torrent PGM, one MiSeq, and one Roche 454 GS FLX sequencer.
The BGI researchers analyzed different features of each of these systems and published their findings last month in the Journal of Biomedicine and Biotechnology.
The comparison is one of several independent platform comparisons that have been published recently as researchers attempt to clarify the differences between systems in the quickly evolving next-generation sequencing field. Last month, for instance, the Wellcome Trust Sanger Institute published a comparison of the Ion Torrent PGM, Illumina MiSeq, and Pacific Biosciences RS machines (IS 7/31/2012). And in April, researchers from the University of Birmingham and elsewhere published a comparison of the GS Junior, PGM, and MiSeq (IS 4/24/2012).
The BGI study is the first to look at all commercially available sequencers, both high-throughput systems and desktop machines. The researchers first compared the HiSeq, SOLiD, and 454 systems, then turned their attention to the PGM and MiSeq, and also took a look at data from a Pacific Biosciences RS.
Among high-throughput systems, the BGI team concluded that the HiSeq features the biggest output and lowest reagent cost; the SOLiD system has the highest accuracy; and the 454 the longest read lengths. The 454 also has the quickest run time of 10 hours.
While the 454 system has the longest reads, at 700 base pairs using the Titanium chemistry, and a run time of only 10 hours, "the high cost of reagents remains a challenge," the authors wrote. Counting reagent use only, they estimated that sequencing with the 454 cost $10 per megabase, compared to $0.13 per megabase with the SOLiD system and $0.07 per megabase on the HiSeq.
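To put the per-megabase figures on a common scale, the arithmetic can be sketched in a few lines of Python. The reagent costs are the ones the BGI authors reported; the 3,100-megabase genome size and 30-fold coverage used in the example are illustrative assumptions, not figures from the study, and the function names are hypothetical.

```python
# Reagent-only cost per megabase, in US dollars, as reported by the BGI authors.
REAGENT_COST_PER_MB = {"454": 10.00, "SOLiD": 0.13, "HiSeq": 0.07}


def reagent_cost(platform, genome_mb, coverage):
    """Reagent-only cost of sequencing `genome_mb` megabases at `coverage`-fold depth."""
    return REAGENT_COST_PER_MB[platform] * genome_mb * coverage


def cost_ratio(platform, baseline="HiSeq"):
    """How many times more expensive `platform` is per megabase than `baseline`."""
    return REAGENT_COST_PER_MB[platform] / REAGENT_COST_PER_MB[baseline]


if __name__ == "__main__":
    # Assumed for illustration only: a 3,100 Mb genome sequenced to 30x.
    genome_mb, coverage = 3100, 30
    for platform in REAGENT_COST_PER_MB:
        cost = reagent_cost(platform, genome_mb, coverage)
        ratio = cost_ratio(platform)
        print(f"{platform}: ${cost:,.0f} in reagents, {ratio:.1f}x the HiSeq rate")
```

Under those assumptions, the gap between $10 and $0.07 per megabase works out to a difference of more than two orders of magnitude in reagent spending for the same amount of sequence.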
Additionally, the 454 has a "relatively high error rate in terms of poly-bases longer than 6 bp," the authors wrote.
The team found that the SOLiD system had the highest raw data accuracy, at 99.94 percent, compared to 98 percent on the HiSeq and 99.9 percent on the 454.
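Raw accuracy percentages like these are often restated as Phred quality scores, where Q = -10 * log10(error rate). The BGI authors reported percentages, not Q scores, so the conversion below is only a sketch of that standard convention applied to their numbers; the function name is hypothetical.

```python
import math

# Raw data accuracy, in percent, as reported by the BGI authors.
RAW_ACCURACY_PERCENT = {"SOLiD": 99.94, "HiSeq": 98.0, "454": 99.9}


def phred_from_accuracy(accuracy_percent):
    """Convert a percent accuracy into a Phred-scaled quality score."""
    error_rate = 1.0 - accuracy_percent / 100.0
    if error_rate <= 0.0:
        raise ValueError("accuracy must be below 100 percent")
    return -10.0 * math.log10(error_rate)


if __name__ == "__main__":
    for platform, accuracy in RAW_ACCURACY_PERCENT.items():
        print(f"{platform}: {accuracy} percent is about Q{phred_from_accuracy(accuracy):.1f}")
```

On this scale the reported figures correspond to roughly Q32 for the SOLiD, Q30 for the 454, and Q17 for the HiSeq.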
However, they noted that the SOLiD is not suited for de novo whole-genome assembly, and that it has shorter read lengths and produces less data than the HiSeq: about 120 gigabases compared to 600 gigabases. Additionally, one run on the SOLiD takes seven days for single-end sequencing and 14 days for paired-end sequencing, while the HiSeq finishes a run in three to 10 days, depending on read length and whether the run is single- or paired-end. A run on the 454 takes around 24 hours.
Of the three, the BGI team found that the HiSeq was the most flexible because users can design runs with varying read lengths, choose paired-end or single-end sequencing, and multiplex many samples.
Lin Liu, the lead author of the BGI paper, told In Sequence that BGI continues to use all the different platforms. While the majority of its sequencing throughput is produced on the HiSeq, it uses the SOLiD for human resequencing projects and the 454 to "enhance our long read capacity" and for applications such as ribosomal RNA identification.
BGI also plans to upgrade some of its HiSeq 2000s to the HiSeq 2500, which will enable a whole human genome to be sequenced in about one day.
The BGI team also looked at both the PGM and the MiSeq platforms. At the time the study was completed, BGI was operating just one each of the PGM and MiSeq, but since publishing, BGI has purchased "several benchtop sequencers" and is "planning to purchase more in the near future," said Liu. "Each platform has its own advantages, so both of them have been adopted by BGI."
In the study, the researchers first compared the PGM to the HiSeq in terms of sequencing quality and mapping rate.
To do this, they sequenced a Rhodobacter sample with high GC content on both machines, as well as an Escherichia coli sample.
Using the 314 chip and 200-base-pair reads, the team compared mapping alignment characteristics between the PGM and HiSeq for the Rhodobacter sample. It found that the PGM had an average mismatch rate of 0.338 percent, lower than the HiSeq's 1.004 percent, but higher insertion and deletion rates: 0.693 percent and 0.965 percent, respectively, compared to 0.009 percent and 0.003 percent on the HiSeq.
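One way to read these numbers side by side is to add up the three error types for each machine and see what share comes from insertions and deletions. The sketch below does that with the reported rates; it assumes the mismatch, insertion, and deletion rates share the same denominator and can be summed, which the study summary does not state, and the function names are hypothetical.

```python
# Alignment error rates for the Rhodobacter sample, in percent, as reported.
ERROR_RATES_PERCENT = {
    "PGM": {"mismatch": 0.338, "insertion": 0.693, "deletion": 0.965},
    "HiSeq": {"mismatch": 1.004, "insertion": 0.009, "deletion": 0.003},
}


def total_error_rate(platform):
    """Sum of mismatch, insertion, and deletion rates (assumes a shared denominator)."""
    return sum(ERROR_RATES_PERCENT[platform].values())


def indel_share(platform):
    """Fraction of the combined error rate that comes from insertions and deletions."""
    rates = ERROR_RATES_PERCENT[platform]
    return (rates["insertion"] + rates["deletion"]) / total_error_rate(platform)


if __name__ == "__main__":
    for platform in ERROR_RATES_PERCENT:
        total = total_error_rate(platform)
        share = indel_share(platform)
        print(f"{platform}: {total:.3f} percent combined, {share:.0%} from indels")
```

Summed this way, the PGM comes to 1.996 percent and the HiSeq to 1.016 percent, with indels making up most of the PGM's errors and only a small fraction of the HiSeq's.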
Additionally, the team noted that read quality on the PGM is "more stable," while the quality tends to decrease on the HiSeq after about 50 cycles, "which may be caused by the decay of fluorescent signal with increasing the read length," the authors wrote.
Mike Lelivelt, director of bioinformatics at Ion Torrent, said that read stability is one of the advantages of PGM over the Illumina systems and others that rely on fluorescent detection systems.
The fact that BGI, "the world's largest runner of the HiSeq," has also demonstrated this difference in read stability, said Lelivelt, just adds to the credibility of the claim. "No one can argue that they don't know how the HiSeq runs," he said.
He also noted that as with all peer-reviewed studies that compare sequencing platforms, because of the lag time to publication, the comparisons are not made using the latest technology. For instance, he said, PGM kits and reagents that were launched in May would enable a mean raw accuracy of 99.6 percent with read lengths of 255 base pairs.
The paper is a "first glimpse into us being competitive from a data quality level," said Lelivelt.
The BGI team added that many metrics of the systems were "incomparable" due to their "different sequencing principles."
Lelivelt agreed, and said that trying to compare the different systems will not always provide a clear answer.
Illumina declined to comment on the study.
Evaluating the MiSeq against the HiSeq, both of which use the same sequencing-by-synthesis chemistry, the BGI team found that due to the longer read lengths of the MiSeq compared to the HiSeq, the sequence data was "better in contig assembly compared with HiSeq."
Compared to the PGM, one of the MiSeq's advantages is that it requires less sample input: nanograms rather than micrograms. The authors also cite the MiSeq's flexibility as an advantage: a user can vary read lengths from 36 base pairs to 150 base pairs and do either single- or paired-end sequencing, so runs can be completed in three to 27 hours. A sequencing run on the PGM, meanwhile, lasts around two hours for a 200-base run.
BGI does not currently own a PacBio RS machine, and Liu said that the institute is still considering whether to purchase one, but the authors discuss a sequencing run they completed on the system.
The team ran a de novo assembly of a DNA fosmid sample using chemistry prior to full commercial release. The team created a 7,500-base-pair insert library and achieved average read lengths of 2,566 base pairs. The researchers sequenced to 324-fold coverage, obtaining a mean read score of 0.861 and accuracy of 99.95 percent.
PacBio's vice president of strategic marketing, Sejal Sheth, told In Sequence via e-mail that the review "highlights how useful our technology is for microbiology research."
Since the study was done, the company has launched two enhancements to its chemistry and software, including the C2 launch, which "typically provides average read lengths greater than 3,000 base pairs, a consensus accuracy of Q50 with 25x coverage, and a 10x increase in throughput per SMRT cell," Sheth said.
According to the BGI authors, the advantages of the machine are its fast sample prep of four to six hours, the lack of an amplification step, long reads, and fast turnaround time.
The authors note that the technology could be "useful for clinical laboratories, especially for microbiology research," as well as epigenetic studies.