Both Keith Robison at Omics! Omics! and Dan Koboldt at Mass Genomics have been looking over data produced using Ion Torrent's 316 chip. Robison notes that the data set, a sequencing run of E. coli DH10B, contains 1.69 million reads, 1.53 million of which were 50 base pairs or longer. "No enormous gains in read length in this dataset, though the curve might be shifted towards longer," he writes, adding that the Broad Institute has been getting more than 2 million reads from its 316 chips. In addition, Koboldt calculated the average base quality of the first 1,600 reads. "You'll notice that, like early Illumina/Solexa data, average base quality declines along the length of the read," he says, noting that these are variable-length reads. "Thus, it's possible that only the last few bases in each read are low-quality, which reduces the average score as you reach the end of the reads." He also found that substitutions were more likely to occur near the ends of the reads. "The 454-like homopolymer issue raises some concerns about using this platform for variant discovery," Koboldt adds. "Yet despite the errors, I'm impressed at how rapidly the technology has matured."
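Koboldt's point about variable-length reads can be illustrated with a small Python sketch. This is not his analysis or code: the function name `mean_quality_by_position` and the toy quality scores are made up for illustration. The sketch assumes each read is high quality except for its last two bases, then averages quality per position twice, once counting from the start of each read and once counting back from its end.

```python
from statistics import mean


def mean_quality_by_position(reads, from_end=False):
    """Mean quality score at each position across variable-length reads.

    reads: list of lists of integer quality scores (hypothetical input).
    from_end: if True, position 0 is each read's last base.
    """
    longest = max(len(r) for r in reads)
    columns = [[] for _ in range(longest)]
    for quals in reads:
        ordered = quals[::-1] if from_end else quals
        for i, q in enumerate(ordered):
            columns[i].append(q)
    # Each column only averages the reads long enough to reach that position.
    return [mean(col) for col in columns]


# Toy data (invented): reads of length 6, 8 and 10, all Q30
# except for the final two bases, which are Q10.
reads = [[30] * (n - 2) + [10, 10] for n in (6, 8, 10)]

by_start = mean_quality_by_position(reads)
by_end = mean_quality_by_position(reads, from_end=True)

print("from read start:", [round(q, 1) for q in by_start])
print("from read end:  ", [round(q, 1) for q in by_end])
```

Counted from the start, the average drifts downward over the second half of the longest read because the low-quality tails of shorter reads land at different positions. Counted from the end, the same data show the drop confined to the last two bases, which is the scenario Koboldt describes as possible.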
Some Data to Play With
Jun 23, 2011