By Monica Heger
Life Technologies' Ion Torrent has released a paired-end sequencing protocol for its benchtop PGM sequencer, which it says helps to improve accuracy, but only when used in conjunction with an assembly algorithm developed by SoftGenetics.
Ion Torrent has released the protocol, along with several internally generated datasets and the SoftGenetics algorithm, called Floton, on its website.
According to Maneesh Jain, Ion Torrent's vice president of marketing and business development, the protocol is relatively simple, although it does require the user to remove the chip following the forward sequencing run, perform several enzymatic steps, and then put the chip back on the machine to run in the reverse direction.
The company has released datasets from two internal studies that used the new protocol to sequence an Escherichia coli strain. In the first dataset, the company used its 314 chip and 2x100 paired-end reads, generating a total of 69.7 megabase pairs of data with a quality score of 20 or above. About 90.5 percent of the reads were paired. For a 2x100 run on the 316 chip, the company produced significantly more data, generating 440 megabases on the forward run and 386 megabases on the reverse run.
The company then used the SoftGenetics Floton algorithm to compute raw read accuracy from the merged runs, which it found to be 99.83 percent on the 314 chip and 99.86 percent on the 316 chip.
Using single-end reads, Ion Torrent typically generates raw read accuracy rates between 99 percent and 99.5 percent, said Mike Lelivelt, director of bioinformatics. Accuracy improvements with paired-end sequencing were only realized when the company used SoftGenetics' algorithm, he added.
However, the company plans to release its own analysis tool in the second quarter of 2012.
The SoftGenetics algorithm also helps push the data quality, which had been hovering around Q25 for single-end sequencing, to around Q30 for paired-end sequencing, Lelivelt said.
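To relate the Q scores above to the percent-accuracy figures quoted elsewhere in this article, the standard Phred definition can be applied directly: a quality score Q corresponds to an error probability of 10^(-Q/10). The short Python sketch below is only an illustration of that conversion; the function names are made up for this example and are not part of any Ion Torrent or SoftGenetics software.

```python
import math


def phred_to_accuracy(q):
    """Convert a Phred quality score to per-base accuracy (as a fraction)."""
    return 1.0 - 10 ** (-q / 10.0)


def accuracy_to_phred(accuracy):
    """Convert per-base accuracy (a fraction below 1.0) to a Phred score."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    for q in (20, 25, 30):
        # Prints 99.00% for Q20, 99.68% for Q25, and 99.90% for Q30.
        print(f"Q{q}: {phred_to_accuracy(q) * 100:.2f}% accurate")
```

By this definition, a move from roughly Q25 to roughly Q30 means the per-base error rate falls from about 0.32 percent to 0.1 percent.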
Additionally, paired-end sequencing enables users to use inserts and makes mapping and assembly easier, he said.
The paired-end approach is only slightly different from Ion's standard sequencing run. First, a modified primer is used in the library preparation step, which enables the user to prepare the template for the reverse read following the forward sequencing run.
After the initial sequencing run, the user removes the chip from the machine and performs a series of enzymatic steps to prepare the template for the reverse run. First, the forward primer is extended fully to the bead. Then, an enzyme cleaves the original template and degrades it to produce the primer for the reverse read. The chip is then loaded back onto the machine, and the read extends away from the bead, in the reverse direction, toward where the forward read originally started.
The chip is off the machine for about one hour, although hands-on time is only several minutes, Jain said.
After the forward and reverse runs are complete, a plugin on the Ion Torrent server will merge the runs and also generate three files — one from the forward run, one from the reverse run, and one that combines the two, said Lelivelt. Those files then "flow right into third-party software" for analysis, variant calling, and assembly, he said.
The key to SoftGenetics' Floton assembly algorithm is its ability to improve raw read accuracy at the 5' end of the read, said John Fosnacht, vice president and co-founder of the company.
With all sequencing technologies, error rates increase toward the end of a read. Paired-end sequencing helps improve accuracy at the 5' end because reads are sequenced in both the forward and reverse directions. Not only is there more data, and consequently more coverage of each base, but when the chip is run in the reverse direction, the 5' end is sequenced first.
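The general idea that two reads of the same fragment, sequenced from opposite ends, can compensate for each other's low-quality tails can be shown with a toy consensus merge. The Python sketch below is not the Floton algorithm, whose internals are not described here. It assumes, purely for illustration, that the two reads cover exactly the same bases with no indels, and it simply keeps the higher-quality call at each position. All names and example values are hypothetical.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd_seq, fwd_qual, rev_seq, rev_qual):
    """Merge a forward read with a reverse read of the same fragment.

    Assumption for this sketch: both reads span the same bases and
    contain no insertions or deletions. Qualities are Phred integers.
    """
    # Flip the reverse read into forward orientation.
    rev_seq = reverse_complement(rev_seq)
    rev_qual = rev_qual[::-1]
    if len(fwd_seq) != len(rev_seq):
        raise ValueError("this toy merge needs reads of equal length")

    merged_seq, merged_qual = [], []
    for fb, fq, rb, rq in zip(fwd_seq, fwd_qual, rev_seq, rev_qual):
        if fb == rb:
            # Agreement: keep the base with the better of the two scores.
            merged_seq.append(fb)
            merged_qual.append(max(fq, rq))
        elif fq >= rq:
            # Disagreement: trust the forward call, with reduced confidence.
            merged_seq.append(fb)
            merged_qual.append(fq - rq)
        else:
            # Disagreement: trust the reverse call, with reduced confidence.
            merged_seq.append(rb)
            merged_qual.append(rq - fq)
    return "".join(merged_seq), merged_qual


if __name__ == "__main__":
    # Made-up fragment ACGTAC; the forward read miscalls its last base.
    seq, qual = merge_pair(
        "ACGTAA", [30, 30, 28, 20, 12, 8],
        "GTACGT", [30, 30, 28, 20, 12, 8],
    )
    print(seq, qual)
```

In this made-up example the forward read's final base is wrong and low quality, but the same position is the first base of the reverse read, so the merged sequence recovers the correct call.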
Fosnacht said the Floton algorithm also helps correct for homopolymer errors. He noted that indel errors, which are more common on the PGM, are more problematic for assembly than substitution errors, but the Floton assembler is able to treat the indels as substitution errors, which helps correct most of the homopolymer errors.
SoftGenetics has released an application note on the Ion Torrent website demonstrating the method on E. coli sequence data from the 314, 316, and 318 chips.
On the 316 chip, for instance, a 2x100 base paired-end run generated over 1.8 million assembled reads, with a raw read accuracy of 99.8 percent. Those reads were assembled into 321 contigs with an N50 of 57,096 bases.
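For readers unfamiliar with the N50 statistic cited above, it is the contig length at which contigs of that length or longer account for at least half of the total assembled bases. A minimal Python sketch of the usual calculation follows; the contig lengths in the example are invented and are not taken from the application note.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are taken from longest to shortest until their combined
    length reaches at least half of the assembly total; the length of
    the contig that crosses that threshold is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig with nonzero length")


if __name__ == "__main__":
    # Invented contig lengths: total 290, so the half-way point is 145.
    print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```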
The contigs were then aligned to the E. coli reference, which demonstrated that 86.6 percent of the reference was covered by at least one contig. Repetitive regions are still difficult to assemble, however, and the company found that around 7 percent of the genome, including a large tandem duplication, was not covered.
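A coverage figure of this kind is typically derived by collapsing overlapping contig alignments and dividing the covered bases by the reference length. The Python sketch below shows one simple way to do that, assuming alignments are given as 0-based, half-open (start, end) coordinates on the reference; the function name and the example intervals are hypothetical and do not reflect the company's data or tooling.

```python
def covered_fraction(intervals, reference_length):
    """Fraction of the reference covered by at least one interval.

    intervals: iterable of (start, end) pairs, 0-based and half-open,
    with non-negative starts. Overlapping intervals count only once.
    """
    covered = 0
    current_end = 0  # right edge of the region already counted
    for start, end in sorted(intervals):
        start = max(start, current_end)      # skip bases already counted
        end = min(end, reference_length)     # clip to the reference
        if end > start:
            covered += end - start
            current_end = end
    return covered / reference_length


if __name__ == "__main__":
    # Three invented alignments on a 100-base reference.
    print(covered_fraction([(0, 40), (30, 60), (80, 100)], 100))  # prints 0.8
```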
Have topics you'd like to see covered on In Sequence? Contact the editor at mheger [at] genomeweb [.] com.