Researchers at the Broad Institute have developed new methods designed to more accurately define quality scores for next-generation sequencers, and to detect SNPs using next-gen data.
The scientists applied their quality score determination method, which could be used for several new sequencing platforms, to 454 Life Sciences' sequencing system and improved the quality scores provided by the vendor. Better quality values, they and others argue, could improve the accuracy of results gained from even low-coverage data, and might eventually help users decide which next-gen platform to use for a certain application.
Quality values, or quality scores, state the uncertainty of the data: the likelihood that a base call is incorrect. The phred algorithm, for example, assigns a quality value to each base in a Sanger read, with larger numbers designating smaller error probabilities. A Q20 value corresponds to a 1 in 100 error probability, and a Q30 value to a 1 in 1,000 error probability.
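The relationship between a phred-scaled quality value and its error probability is the standard logarithmic one, Q = -10 log10(P). The short Python sketch below works through that arithmetic for a few values. The function names are illustrative only and are not taken from phred or from any vendor's software.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Return the error probability implied by a phred-scaled quality value."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Return the phred-scaled quality value for an error probability p."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Each step of 10 in Q lowers the error probability tenfold.
    for q in (10, 20, 30, 40):
        odds = round(1 / phred_to_error_prob(q))
        print(f"Q{q}: about 1 error in {odds:,} base calls")
```

Because the scale is logarithmic, a small shift in Q near the top of the range represents a large change in the expected number of miscalled bases.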
The Broad researchers used the phred algorithm to combine different error predictors for the 454 platform into a single quality score, and applied it to large training data sets for which the true DNA sequence was known. They then compared the predicted base qualities with the actual ones. Their work was published in Genome Research.
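Comparing predicted qualities with actual ones can be pictured as a simple calibration check: group base calls by their predicted quality, count how often each group is wrong against the known sequence, and convert the observed error rate back to the phred scale (Q = -10 log10 of the error rate). The Python sketch below is a minimal illustration under stated assumptions, not the Broad's published method. The input format, the function name, and the add-one smoothing are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def empirical_quality(calls):
    """Map each predicted quality to the quality actually observed.

    `calls` is an iterable of (predicted_q, is_correct) pairs, one per
    base call, where is_correct says whether the call matched the known
    reference sequence. (Hypothetical input format for this sketch.)
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for predicted_q, is_correct in calls:
        totals[predicted_q] += 1
        if not is_correct:
            errors[predicted_q] += 1

    observed = {}
    for q, n in totals.items():
        # Assumption: add-one smoothing, so that a bin with zero observed
        # errors still yields a finite quality value.
        rate = (errors[q] + 1) / (n + 2)
        observed[q] = -10 * math.log10(rate)
    return observed


if __name__ == "__main__":
    # Toy data: 198 calls predicted at Q20, one of which is wrong.
    toy = [(20, True)] * 197 + [(20, False)]
    for q, q_obs in sorted(empirical_quality(toy).items()):
        print(f"predicted Q{q} -> observed Q{q_obs:.1f}")
```

In a well-calibrated scoring scheme the observed value for each bin sits close to the predicted one. A predicted Q30 bin that behaves like Q20 in practice would mean the scores overstate the accuracy of the data.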
Compared with quality values provided by 454's own software, the Broad's scores are more accurate and yield more high-quality bases, according to Jared Maguire, a computational biologist who leads the subgroup for new sequencing technologies within the Broad's group for computational R&D, which published the method.
454 incorporated the Broad's quality scores as its default scores in its latest software update, which debuted in early February, according to a company spokesman. 454 also uses the quality scores in its mapping and assembly software, which is included with the instrument, resulting in "better assemblies," he says.
— Julia Karow
An international team of researchers led by Stephen Richards at Baylor College of Medicine has sequenced the genome of the red flour beetle, an agricultural pest that lives in and snacks on stored grains and dried foods. More than 100 scientists from 14 countries pitched in to sequence Tribolium castaneum. The researchers generated about 1.5 million sequence reads at 7.3-fold coverage. They assembled these into contigs totaling 152 megabases, coding for 16,404 genes, thousands of which seem to be species-specific.
Scientists from Helicos BioSciences, Ohio University, and Stanford University published a paper in Science describing the first single-molecule sequencing of a whole genome — in particular, the roughly 7,000-nucleotide genome of the M13 virus.
The Beijing Genomics Institute will add 14 next-gen sequencers: 11 Illumina Genome Analyzers and three Roche FLX instruments.
3'-O-Modified Nucleotide Reversible Terminators for Pyrosequencing
Grantee: Jingyue Ju, Columbia University
Began: Aug. 1, 2007; Ends: Jul. 31, 2009
The National Human Genome Research Institute funded Ju to design and synthesize a library of reversible nucleotide terminators to help tackle the homopolymer challenge inherent in pyrosequencing. Ju also aims to optimize the enzymatic conditions to increase the read length and accuracy of pyrosequencing with these labeled nucleotides.
Large-Scale Selection of Genomic Loci
Grantee: Thomas Albert, NimbleGen Systems
Began: Sep. 24, 2007; Ends: Aug. 31, 2009
Albert and his team are looking to develop a flexible means of genomic DNA sample preparation that will select and enrich 100,000 specific loci from the human genome, which can then readily be sequenced using random library sequencing pipelines. Funding comes from NHGRI. The work is specifically targeted at the Cancer Genome Atlas and will be done in collaboration with scientists at Baylor's genome center.