Over the last six months, exome sequencing has gained ground as a relatively low-cost method to identify causative mutations for rare diseases, but some in the scientific community are questioning the staying power of the approach as the cost of whole-genome sequencing plunges ever closer to the $1,000 mark.
At the Future of Genomic Medicine Conference hosted by the Scripps Translational Science Institute earlier this month, there was some debate among participants and speakers over the relative value of the two approaches.
As several speakers noted, while exome sequencing allows researchers to analyze 20 to 30 human exomes for the price of one whole genome, the approach risks missing key variations in non-coding regions. Most agreed that the cost of whole-genome sequencing will eventually fall to a point that makes exome sequencing obsolete, but there was little agreement over what that crossover price might be, or when it will be reached.
Lee Hood, president of the Institute for Systems Biology, came out strongly in favor of whole-genome sequencing in his discussion of an ISB-led study published last week that identified the causative gene for Miller syndrome by sequencing the whole genomes of four family members (see story, this issue).
Hood noted that the approach allowed the ISB team to detect crossover sites and haplotypes, "which you can't do with exome sequencing." In addition, he said, the family-based whole-genome study enabled his team to identify intergenerational mutation rates.
"Exome sequencing is very powerful, but a lot of diseases are not encoded in coding regions of the genome," he said. He added that he has "always been skeptical" of large-scale genomics efforts like the Cancer Genome Atlas, which initially took a targeted approach to genome analysis.
While the data from TCGA pilot studies published to date are "useful," Hood noted that the TCGA glioblastoma study, published in late 2008 (see In Sequence 9/9/2008), "didn't tell us anything we didn't already know."
On the other hand, several participants pointed out that the ISB study implicated the very same gene that a group at the University of Washington recently identified via whole-exome sequencing (see In Sequence 9/29/2009).
"They both got the same answer," Eric Topol, director of the Scripps Translational Science Institute and a conference organizer, told In Sequence. "The exome study picked up the cause of Miller syndrome at a fraction of the cost" of whole-genome sequencing.
Topol added, however, that the ISB paper "certainly had lots of other contributions besides the Miller syndrome gene. It talked about the de novo mutation rate, which is a nice contribution. It provides a beautiful recombination map of the kids, and the family. So it had a lot of other things that you can't get out of exomes. But the exomes got the answer for what was causing the rare disease."
Topol noted that this scenario underscored "the major debate at the meeting, which was, 'Is exome sequencing going to work?' We know it's much more cost effective, but is it going to turn out to be useful, compared with whole-genome sequencing, if the price continues to come down?"
The Value of 1 Percent
Sam Levy, director of genome sciences at the Scripps Translational Science Institute, described a head-to-head comparison of exome and whole-genome sequencing of the same sample in an effort to see how easy it would be to "recapitulate" the results of the two approaches.
Levy was director of human genomics at the J. Craig Venter Institute before joining Scripps late last year, and was on the team that sequenced Venter's genome using Sanger technology in 2007. He said it was fairly easy to convince Venter to contribute another sample in order to see how well exome sequencing (in this case, Roche NimbleGen exome capture combined with 454 sequencing) stacked up against the gold standard.
After analyzing the data, Levy said that there was only about 90 percent concordance between the coding regions of the two data sets. While there were some cases where exome sequencing missed variants found in the whole-genome data, exome sequencing also identified many variants that Sanger sequencing missed. In fact, the team found 2,000 new protein-coding variants via exome sequencing that were predicted to alter protein function in 192 genes involved in 11 different diseases.
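The article does not say how concordance was calculated. As a minimal sketch, assuming each call set is a collection of (chromosome, position, ref, alt) tuples and that concordance means the calls shared by both methods divided by all calls made by either, the comparison could look like this. The variant records are hypothetical.

```python
# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def concordance(calls_a: set[Variant], calls_b: set[Variant]) -> float:
    """Assumed definition: shared calls divided by calls made by either method."""
    union = calls_a | calls_b
    if not union:
        return 1.0  # two empty call sets agree trivially
    return len(calls_a & calls_b) / len(union)


# Hypothetical call sets from the same DNA sample.
wgs = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
exome = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr3", 75, "T", "C")}

score = concordance(wgs, exome)
exome_only = exome - wgs  # calls the whole-genome data lacked
wgs_only = wgs - exome    # calls the capture step may have missed
```

The set differences isolate the method-specific calls, which is where questions such as weak probe capture or overly stringent filtering would be investigated.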
Levy told In Sequence last week that the cases where exome sequencing overlooked variants were generally due to "the probes not being as effective in capturing these regions, so therefore the depth of coverage in these regions wasn't sufficient."
As for the variants that Sanger sequencing missed, "we were not surprised by that because we knew that the criteria we applied in detecting variants for the whole-genome shotgun Sanger approach were quite stringent, so we knew we would miss some number because we wanted to keep our false positives low," Levy said.
In his talk, Levy said that about 35 percent of the variants that were detected only by exome sequencing were actually in the original Sanger data set, but were filtered out due to lack of evidence.
"When you target 1 percent of the genome, you end up increasing overall your coverage of those regions relative to everything else," which ultimately increases the number of variants detected, Levy said.
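Levy's point is straightforward arithmetic: for a fixed amount of sequence output, mean depth is the number of bases sequenced divided by the size of the target. The sketch below assumes a genome of roughly 3 billion bases, an exome of 1 percent of that, a hypothetical 30 gigabases of output, and that every read lands on target, which real capture does not achieve.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size (assumption)
EXOME_FRACTION = 100            # exome taken as 1 percent of the genome


def mean_depth(total_bases_sequenced: int, target_size_bp: int) -> float:
    """Average coverage, assuming every sequenced base falls inside the target."""
    return total_bases_sequenced / target_size_bp


budget = 30_000_000_000  # hypothetical sequencing output in bases

whole_genome = mean_depth(budget, GENOME_SIZE_BP)
exome_target = mean_depth(budget, GENOME_SIZE_BP // EXOME_FRACTION)
```

Under these assumptions the same output gives about 10x across the whole genome but about 1,000x across the exome, which is why targeted data surface more variants in the regions they cover.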
On the other hand, he said, "you're relying on your probe set to be accurate and complete, to behave in a consistent manner across all the regions in all the genes in the genome, and that doesn't always work, so you miss stuff."
Levy said that one benefit of running the comparison on the same DNA sample was that it enabled his team to gain insight into the inherent biases in the two approaches and modify their bioinformatics methods accordingly "so that they can optimize the ability to call variants, and essentially reduce the false positive rate while trying to ensure that the false negative rate is not too high."
In addition, he said the study provided insight into the minimal coverage requirement for exome sequencing. "That helps us determine what the cost/benefit is for a particular experiment — how many samples can we do at a given cost? And when we target a certain level of coverage, what kind of accuracy can we expect to achieve, and what can we expect to miss?"
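A common first approximation for relating average coverage to the fraction of bases that reach a minimum depth is a Poisson model of read depth. The sketch below assumes that idealized model. It is not the method Levy's team used, and because capture efficiency varies from probe to probe, real exome data will fall short of these estimates.

```python
import math


def fraction_covered_at_least(mean_depth: float, min_depth: int) -> float:
    """Idealized Poisson estimate of the fraction of target bases with depth >= min_depth."""
    if mean_depth <= 0:
        return 1.0 if min_depth <= 0 else 0.0
    log_lam = math.log(mean_depth)
    # P(X < k) = sum over i < k of e^(-lambda) * lambda^i / i!, computed in log space
    # so that large mean depths do not overflow.
    below = sum(
        math.exp(-mean_depth + i * log_lam - math.lgamma(i + 1))
        for i in range(min_depth)
    )
    return 1.0 - below


# Example: share of bases reaching a hypothetical 10x calling threshold at 20x mean depth.
share_at_10x = fraction_covered_at_least(20.0, 10)
```

Sweeping the mean depth against a chosen threshold gives a rough feel for the cost/benefit trade-off Levy describes, before accounting for capture bias.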
Topol said that cost is a "driving feature" in the decision to opt for exome sequencing over whole-genome sequencing, but he noted that "the cost is much more than just the cost of the sequencing, but also the analytical side. There's also storage of all this data, which, as you're getting billions of bytes for each person, can really add up."
Levy agreed that "it's more than just the issue of cost. It's the issue of how you analyze the data," but acknowledged that "there is a price point below which people will stop doing exome sequencing."
Without pegging a number on what that price may be, Levy said that the "limiting factor" is the cost of the sequence-capture probes. "If whole-genome sequencing becomes cheaper than the combined cost to sequence an exome plus the probes that capture it, then people will do whole-genome sequencing," he said.
Target-capture pricing can vary, but it is still considerably lower than sequencing costs. An Agilent spokesman said that pricing for its SureSelect sequence-capture technology can range from $1,500 per exome for five reactions to as low as $300 per exome for 10,000 reactions.
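Levy's crossover condition can be written as a one-line comparison. In the sketch below, the capture prices are the two Agilent SureSelect figures quoted above, while the per-sample cost of sequencing an exome is a hypothetical placeholder. Analysis and storage, which Topol flagged as significant, are deliberately left out.

```python
def crossover_wgs_price(exome_seq_cost: float, capture_cost: float) -> float:
    """Whole-genome price below which WGS beats exome sequencing plus capture."""
    return exome_seq_cost + capture_cost


def prefer_whole_genome(wgs_cost: float, exome_seq_cost: float, capture_cost: float) -> bool:
    """Levy's rule of thumb: switch once WGS costs less than exome sequencing plus probes."""
    return wgs_cost < crossover_wgs_price(exome_seq_cost, capture_cost)


EXOME_SEQ_COST = 500.0  # hypothetical sequencing cost per exome, in dollars

low_volume = crossover_wgs_price(EXOME_SEQ_COST, 1_500.0)  # capture at $1,500 per exome
high_volume = crossover_wgs_price(EXOME_SEQ_COST, 300.0)   # capture at $300 per exome
```

With these inputs, cheaper high-volume capture lowers the price whole-genome sequencing must reach before it wins, from $2,000 to $800.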
"I can see in the foreseeable future that the cost to sequence an exome is going to be a fraction of the cost to actually buy the probes to capture the regions to sequence," Levy said. "So once that point is passed, and once sequencing costs are driven down further, people, I think, will switch over … but it's not clear how long it's going to take."
Levy added that exome sequencing isn't foolproof. For example, he questioned whether the probe sets that vendors provide for exome sequence capture are as complete as they should be. "We've found when looking at cancer samples that certain genes are actually missing that you might identify as an important cancer gene from what one defines as an exome," he said.
In addition, "we've only seen instances where exome sequencing has succeeded. We haven't seen any examples of where one didn't learn anything," he said. "Certainly people don't publish negative results in general, but it would certainly benefit us if we did know that."
Furthermore, while exome sequencing has been a valuable tool for identifying causative mutations in rare, Mendelian disorders, the jury is still out on whether the approach will be as effective for complex disorders like heart disease.
Efforts are underway to gather that data. Debbie Nickerson of the University of Washington is leading a project funded by the National Heart, Lung, and Blood Institute to sequence 7,000 exomes to study heart, blood, and lung diseases in "well phenotyped" cohorts. "We're looking to Mendelianize complex diseases," Nickerson said at the Scripps meeting.
She noted that while this work is only now getting underway, it appears that these diseases will require many more samples than the handful that were sufficient to identify causative mutations in Miller syndrome and Freeman-Sheldon syndrome.
Nickerson added that the exome sequencing studies published to date were able to use dbSNP as a "filter" to narrow down the set of potentially disease-related variants to a manageable number of novel variants. As more sequence data, including deleterious variants, is added to that database, however, it will start to lose value as a filter. In response, her lab is developing an approach based on evolutionary conservation in order to pinpoint causal variants.
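A minimal sketch of the dbSNP-style filter and of the weakness Nickerson describes, assuming variants are matched exactly on (chromosome, position, ref, alt) and using hypothetical records in place of a real dbSNP release:

```python
# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def novel_variants(patient_calls: set[Variant], known_variants: set[Variant]) -> set[Variant]:
    """Drop every call already catalogued; what remains are the 'novel' candidates."""
    return patient_calls - known_variants


causal = ("chr7", 1_000, "G", "A")  # hypothetical disease-causing variant
common = {("chr1", 500, "C", "T"), ("chr2", 750, "A", "G")}  # hypothetical common variants
patient = common | {causal}

# Early catalog holds only common variation, so the causal variant survives the filter.
early_catalog = set(common)
# Later catalog: the same causal variant has since been deposited, so it is filtered out.
later_catalog = common | {causal}

candidates_then = novel_variants(patient, early_catalog)
candidates_now = novel_variants(patient, later_catalog)
```

Once a deleterious variant is deposited, a presence-based filter can no longer tell it apart from benign catalogued variation, which is the gap a conservation-based score is meant to fill.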
The method assumes that genomic regions that are highly conserved play a larger functional role than sites that are not as highly conserved, and calculates a "rejected substitution" score for each position in the genome to rank the "impact" of genes. She said that her group has used the method to analyze the data sets for the Miller syndrome and Freeman-Sheldon syndrome exome-sequencing studies, and it arrived at the same results as the dbSNP-based approach.
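Nickerson did not detail the scoring. As a sketch only, assuming a GERP-style definition in which a site's rejected-substitution score is the number of substitutions expected under neutral evolution minus the number observed across species, candidate variants could be ranked as follows. Gene names and per-site values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    gene: str
    position: int
    expected_subs: float  # substitutions expected under neutrality (hypothetical value)
    observed_subs: float  # substitutions observed across species (hypothetical value)

    @property
    def rejected_substitutions(self) -> float:
        # Assumed GERP-style score: a larger value means stronger conservation.
        return self.expected_subs - self.observed_subs


def rank_by_conservation(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates from most to least conserved site."""
    return sorted(candidates, key=lambda c: c.rejected_substitutions, reverse=True)


candidates = [
    Candidate("GENE_A", 101, expected_subs=4.0, observed_subs=3.5),
    Candidate("GENE_B", 202, expected_subs=5.0, observed_subs=0.2),
    Candidate("GENE_C", 303, expected_subs=3.0, observed_subs=2.9),
]
ranked = rank_by_conservation(candidates)
```

Unlike the dbSNP filter, this ranking does not depend on what other people have already deposited, so it keeps working as public databases grow.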
In addition, Nickerson said that it will be helpful to acquire more "wildtype" exomes that can serve as "a critical filter for future variation studies."
Topol said that while the debate at the conference highlighted many challenges that lie ahead, "what's nice to see is that there are choices right now."
Exome sequencing "is much more practical because of the cost and the analytical side, and it is getting the answers, so it's a very reasonable thing," he said. "Whether that's going to hold up or if it's going to be a way station, that remains to be seen, but I think exome sequencing is going to be around for a few years."