Phred, the base-calling algorithm developed by Phil Green and colleagues at the University of Washington in the late 1990s, is the de facto standard for assessing the quality of Sanger-based sequence data. But as next-generation sequencing technologies enter the market, researchers are finding that the analogous methods used to assign confidence to the new platforms' base calls could use some validation of their own.
"We don't have real quality scores," said Chad Nusbaum, co-director of the Genome Sequencing and Analysis program at the Broad Institute. "People talk about having generated quality scores for these data, but none has done it in a reliable way."
Complicating the issue is the fact that each firm developing new technology to tackle the $1,000 genome will have to develop its own quality standards for its instruments. "I don't anticipate that a quality standard we develop for 454 [Life Sciences] will be directly applicable to Solexa, most especially because 454 and Solexa chemically ask different questions," Nusbaum said.
Most observers agree, however, that any quality standards developed for these sequencing platforms will have to be compatible with Phred scores. "Each new instrument will need to develop a standard that users can understand in the context of Phred quality scores," said Jeff Schloss, program director of technology development at the National Human Genome Research Institute.
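For readers less familiar with the scale the companies are targeting: Phred expresses confidence as Q = -10 log10(P), where P is the estimated probability that a base call is wrong. Q20 therefore means a 1-in-100 chance of error and Q30 a 1-in-1,000 chance. The short Python sketch below shows only that standard mapping; the function names are illustrative and are not part of Phred itself.

```python
import math


def phred_from_error(p_error: float) -> float:
    """Phred quality score: Q = -10 * log10(P_error)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_from_phred(q: float) -> float:
    """Inverse mapping: P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


# Print how often a base at each quality level is expected to be wrong.
for q in (10, 20, 30, 40):
    print(f"Q{q}: about 1 error in {round(1.0 / error_from_phred(q)):,} bases")
```

Because the scale is logarithmic, every 10-point gain in Q corresponds to a tenfold drop in the expected error rate.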
454 and Solexa are both striving to remain Phred-friendly as they develop their own base-calling and quality assessment software.
Marcel Margulies, vice president of engineering at 454, said the company considered it "imperative that we give researchers a way of assessing the quality on the same scale" as Phred.
Likewise, Clive Brown, director of computational biology and IT at Solexa, said that the company has devised a scoring method that can be "easily converted into exactly the same quality scoring system that Phred uses."
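The article does not say how Solexa's scores convert to Phred. One odds-based definition that has since been documented for Solexa/Illumina pipelines is Q_sol = -10 log10(P/(1 - P)). Assuming that definition (an assumption about the scheme Brown describes, not a statement about Solexa's software), the conversion to Phred is exact algebra, as this hypothetical sketch shows.

```python
import math


def solexa_from_error(p_error: float) -> float:
    """Odds-based score (assumed definition): Q_sol = -10 * log10(P / (1 - P))."""
    if not 0.0 < p_error < 1.0:
        raise ValueError("error probability must be in (0, 1)")
    return -10.0 * math.log10(p_error / (1.0 - p_error))


def phred_from_solexa(q_sol: float) -> float:
    """Exact conversion under that definition: Q_phred = 10 * log10(10**(Q_sol/10) + 1)."""
    return 10.0 * math.log10(10.0 ** (q_sol / 10.0) + 1.0)


for p in (0.5, 0.1, 0.01, 0.001):
    q_sol = solexa_from_error(p)
    print(f"P_error={p}: Solexa-style {q_sol:6.2f} -> Phred {phred_from_solexa(q_sol):6.2f}")
```

Under this definition the two scales nearly coincide at high quality and diverge at low quality, where the odds-based score can even go negative (any call with an error probability above 0.5).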
Some in the field question how well these scoring systems map to Phred, however. Helmy Eltoukhy, a research assistant at the Stanford Genome Technology Center who has developed a base-calling method for sequencing by synthesis, noted that "some useful information from the read may be lost in the translation to a base-by-base format." [Eltoukhy's method is scheduled to be published in the Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 2006.]
The "principal challenge" in any scoring method, he said, is not assigning it to a Phred-like scale, but "finding indicators in the model that nicely correlate with what we typically consider as a confidence measure."
Quality Score and Bustard
454 wrapped up development of its software, called Quality Score, a year ago. Like Phred, the method is based on a logarithmic scale, but the similarities end there. While each base in Sanger sequencing has its own peak in the electropherogram, the confidence of a particular base call from a 454 instrument is based on whether it is part of a homopolymer.
454's platform is based on sequencing by synthesis: it flows one nucleotide species at a time over the template and detects the pyrophosphate released, as a flash of light, whenever that base is incorporated. If, for example, there are two adenines in a row, researchers will see twice the amount of signal as they would if they had one adenine alone.
According to Gene Myers, a group leader at Janelia Farm Research Campus who is on 454's scientific advisory board, the longer the homopolymer, the more likely the error. "If there are 10 A's in the sequence in the next extension step, it is hard for the instrument to tell whether there are 10 A's or 9 A's," Myers said. "There is an internal model built on the probability of seeing a signal of a certain level based on there being 9 A's versus 10 A's, and then the Phred-like number expresses that in a natural mathematical way."
Myers added that "these numbers were developed in a way that I vouch are really mathematically reasonable, albeit significantly different from the way Phred does it for Sanger sequencing."
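Myers's description, in which the signal grows with homopolymer length and a model weighs the probability of a given signal under 9 versus 10 A's, can be illustrated with a toy model. Everything numeric here is a hypothetical assumption, not 454's Quality Score: the flow signal is taken to be normally distributed around the true length n with a spread that grows with n (the parameters `base_sd` and `per_base_sd` are invented), and all lengths from 0 to 12 are treated as equally likely before the signal is seen. The posterior probability of the called length is then turned into a Phred-like score.

```python
import math


def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    """Natural-log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def call_homopolymer(signal: float, max_len: int = 12, base_sd: float = 0.05,
                     per_base_sd: float = 0.04, max_q: float = 60.0):
    """Call a homopolymer length from one flow signal and attach a Phred-like score.

    Hypothetical model (not 454's): signal ~ Normal(n, base_sd + per_base_sd * n),
    with a uniform prior over n = 0..max_len.
    """
    lengths = range(max_len + 1)
    logs = [log_normal_pdf(signal, n, base_sd + per_base_sd * n) for n in lengths]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # subtract max to avoid underflow
    total = sum(weights)
    posteriors = [w / total for w in weights]
    n_call = max(lengths, key=lambda n: posteriors[n])
    p_error = 1.0 - posteriors[n_call]
    # Cap the score when the error probability is vanishingly small.
    q = max_q if p_error <= 10.0 ** (-max_q / 10.0) else -10.0 * math.log10(p_error)
    return n_call, q


for s in (1.0, 2.05, 9.5, 10.0):
    n, q = call_homopolymer(s)
    print(f"signal {s:5.2f} -> {n} bases, Q = {q:.1f}")
```

With these made-up parameters, a clean signal of 1.0 is called as a single base at the capped score, while a clean signal of 10.0 still favors 10 bases but scores below Q10, echoing Myers's point that longer homopolymers are harder to call.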
Solexa's sequencer works in a completely different way from 454's, so its program for assessing quality, which it calls Bustard, is also different. Solexa developed a Phred-like base-call scoring scheme for its raw read data, "except that for each base we assign a score to each of the four possibilities ... rather than one," said Brown. "This gives more information for error rate estimation, correct consensus calling, etc."
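Brown's point about scoring all four possible bases can be sketched as follows. The article does not give Bustard's formula, so this assumes each candidate base b receives a log-odds score 10 log10(P_b/(1 - P_b)) computed from hypothetical per-base probabilities. A naive consensus function (assuming independent reads and no prior preference among bases) shows why keeping the runner-up probabilities can help with consensus calling.

```python
import math

BASES = "ACGT"


def four_scores(probs):
    """Score every candidate base, not just the call.

    Assumed form: 10 * log10(P_b / (1 - P_b)) for each of A, C, G, T, where
    probs are hypothetical per-base probabilities summing to 1.
    """
    if len(probs) != 4 or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("expected four probabilities summing to 1")
    scores = {}
    for base, p in zip(BASES, probs):
        p = min(max(p, 1e-10), 1.0 - 1e-10)  # clamp so the log stays finite
        scores[base] = 10.0 * math.log10(p / (1.0 - p))
    return scores


def consensus(prob_rows):
    """Naive consensus: maximize the summed log10 P(base) across reads.

    Assumes reads are independent and all four bases are equally likely a priori.
    """
    totals = {base: 0.0 for base in BASES}
    for probs in prob_rows:
        for base, p in zip(BASES, probs):
            totals[base] += math.log10(max(p, 1e-10))
    return max(totals, key=totals.get)


# Three reads covering one position. A majority vote over top calls picks A
# (two reads to one), but the summed evidence from all four scores favors G.
reads = [
    [0.55, 0.0, 0.45, 0.0],
    [0.55, 0.0, 0.45, 0.0],
    [0.02, 0.0, 0.98, 0.0],
]
print(four_scores(reads[0]))
print("consensus:", consensus(reads))
```

A single score per base would record only the weakly favored A in the first two reads; the four-score form keeps the 0.45 assigned to G, which is what lets the third, confident read tip the consensus.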
Placing Trust in the Vendor
While everyone agrees that quality standards for next-generation sequencing data are essential for scientists to embrace the new instruments, not everyone agrees these assessment tools should come from the vendors. Nusbaum noted that one reason Phred caught on was that it didn't emerge from ABI, but from the research community.
"My hope is that [quality measures] are going to come out of the community the same way [they] did with Phred," Nusbaum said. "It's best if these things grow out of the user community. I think for any kind of quality scores to have the confidence of the community, they have to be an academic enterprise."
But Green noted that times have changed since ABI first launched its automated sequencer in the 1980s. The primary impetus for developing Phred, he noted, was that ABI's base-calling algorithm didn't offer quality scoring, and the company "didn't accept the idea that quality measures were useful." However, he added, "I sort of doubt that same scenario will play itself over again with these new machines. I think the companies know that the quality values are useful and so they are highly motivated to produce those."
Green stressed, however, that "the research community should have the opportunity to develop quality measures, and to do that, they need to get access to the raw data."
Stanford's Eltoukhy agreed that openness will be the key for vendors in this field. "I typically will trust quality scores assigned by vendors if they supply/publish significant quantities of empirical data comparing their assignments of quality scores versus actual errors in ground truth data, as is often done with the Sanger base-calling methods," he said.
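The kind of evidence Eltoukhy describes, comparing assigned quality scores with actual errors in ground-truth data, amounts to a calibration table: group bases by their reported Q, count how many were actually wrong against a known reference, and convert each observed error rate back to an empirical Q. The minimal sketch below uses synthetic counts, not real instrument data, and its add-one smoothing is an arbitrary choice to keep error-free bins finite.

```python
import math
from collections import defaultdict


def calibration_table(reported_q, is_error):
    """Compare reported quality scores with errors observed against a known reference.

    Returns {reported Q: (bases, errors, empirical Q)}. Add-one smoothing keeps
    bins with zero observed errors finite (an arbitrary choice for this sketch).
    """
    counts = defaultdict(lambda: [0, 0])
    for q, wrong in zip(reported_q, is_error):
        counts[q][0] += 1
        counts[q][1] += int(wrong)
    table = {}
    for q in sorted(counts):
        n, errors = counts[q]
        p_obs = (errors + 1) / (n + 2)
        table[q] = (n, errors, -10.0 * math.log10(p_obs))
    return table


# Synthetic example: 1,000 bases reported at Q20 with 10 real errors, and
# 1,000 bases reported at Q30 with 9 real errors (overconfident at Q30).
reported = [20] * 1000 + [30] * 1000
errors = [True] * 10 + [False] * 990 + [True] * 9 + [False] * 991
for q, (n, e, emp_q) in calibration_table(reported, errors).items():
    print(f"reported Q{q}: {e}/{n} wrong -> empirical Q{emp_q:.1f}")
```

A well-calibrated scorer yields empirical values close to the reported ones in every bin; here the Q20 bin checks out while the Q30 bin behaves like roughly Q20, the sort of gap published ground-truth comparisons would expose.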
So far, it seems emerging companies are trying to be as open as possible. 454 said this week that data from its instruments can now be submitted to the NCBI's Trace Archive (see briefs, this issue), and Solexa said it consulted with the research community when it developed Bustard.
"We are developing this in conjunction with public domain and academic experts in the field in order to ensure that we have the best, most openly and widely accepted data possible," said Solexa's Brown. "In many ways we seek to encourage the processes that gave rise to the Phred phenomenon."