NEW YORK (GenomeWeb) – A recent analysis bolsters the notion that whole-genome sequencing has an edge over whole-exome sequencing in coverage uniformity and depth needed to accurately call variants, making it possible to pick up on the presence of heterozygous SNPs with fewer reads per site.
Authors of the study argued that the whole-genome approach to finding genetic variants in coding sequences could be close to cost comparability with exome sequencing using the newest Illumina sequencing instrument — consistent with a growing migration towards large-scale sequencing efforts with a full genome focus.
As they reported in BMC Bioinformatics in July, researchers from the University of Edinburgh MRC Institute for Genetics and Molecular Medicine generated new exome sequences for more than a dozen human samples. They also used existing exome and genome sequence data from the Cancer Genome Atlas and 1000 Genomes Project to compare the sequence biases and depth of coverage needed to identify variants at different sensitivity levels with each approach.
The work was done as part of a follow-up to an analysis of SNP detection in exome sequence data by members of the same team, which was described in another BMC Bioinformatics paper last summer.
"We did a basic quick and dirty analysis on a couple whole-genome samples we had access to and found that there was a fairly significant difference [compared to exomes]," the study's first author Alison Meynert, a researcher based at the University of Edinburgh MRC Institute for Genetics and Molecular Medicine, told In Sequence.
"So we developed that further to look at more samples, specifically with the Cancer Genome Atlas samples, where we are able to have both exome and genome samples from individuals," she added.
The capture kits used for the analysis were updated compared to those considered in the team's earlier exome sequencing analysis.
Though the newer kits offered an improvement over those tested in the past, Meynert noted that the researchers detected the same general patterns described previously, including more uniform sequence coverage in the whole-genome sequences and a need for greater coverage depth when calling heterozygous variants in exome sequence data.
"The coverage is better," she said, "but there's still this issue of some targets being very highly covered and others being rarely covered or not covered at all."
For their analysis, she and her colleagues looked at 10 TCGA samples assessed by both whole-genome sequencing and deep whole-exome sequencing. Ten more TCGA samples had been tested using exome sequencing alone.
The group also had access to whole-genome sequence data for six samples tested for the 1000 Genomes Project, along with 13 exomes sequenced at the University of Edinburgh.
For the latter samples, the investigators grabbed protein-coding portions of the genome with NimbleGen's SeqCap EZ Exome v3.0 before sequencing the exomes with 98-base-pair, paired-end reads on Illumina's HiSeq 2000 platform.
The exome sequences obtained from TCGA had been captured using an Agilent exome capture kit developed for the TCGA project, they noted, and sequenced using shorter, paired-end Illumina reads.
As in the prior study, the researchers focused on a "gold standard" set of SNPs described in version 3.3 of the International HapMap project database. They also scrutinized SNPs in a well-characterized "Genome in a Bottle" sample from the 1000 Genomes Project, for which high-confidence genotype calls had been described.
Consistent with previous findings, the team saw that lower coverage was needed to call heterozygous variants in whole-genome sequence data than in whole exome sequences, apparently due to biases introduced during exome capture and amplification steps.
Such biases can lead to better representation of some bases than others in exome sequences, they explained, and tend to decrease the likelihood that a non-reference allele will be accurately identified at heterozygous SNP sites.
In contrast, the team demonstrated that it's possible to dial down sequence coverage in whole-genome sequence experiments and still detect variants as reliably as in exome-sequencing experiments done at greater coverage depths.
"Because the variability in coverage for whole genomes is very, very small … you don't have to do as much average sequencing depth to get the same level of accuracy," Meynert said.
When they considered whole-genome and whole-exome sequences for the matched TCGA samples, for example, the researchers saw roughly similar results from both methods. When differences did arise, they typically involved sites called as homozygous in the exome sequences and heterozygous in the genome sequences.
In the TCGA samples, the team calculated that a minimum per-site depth of 34-fold would be needed to detect 95 percent of the coding variants sought in the analysis — a feat that could be accomplished with 12-fold whole-genome sequence coverage.
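The link between coverage uniformity and required depth can be illustrated with a toy model. The sketch below is not the study's method: it assumes a heterozygous SNP is called once at least three reads carry the non-reference allele, that each read samples either allele with equal probability, and that per-site depth follows a Poisson distribution for genome-like uniform coverage or an overdispersed negative binomial for capture-like uneven coverage. The function names, the three-read threshold, and the dispersion value are all hypothetical choices made for illustration, so the numbers it produces should not be read as the paper's 34-fold and 12-fold figures.

```python
import math


def p_detect_het(depth, min_alt_reads=3, alt_fraction=0.5):
    """Chance that at least min_alt_reads of `depth` reads carry the alt allele."""
    if depth < min_alt_reads:
        return 0.0
    p_too_few = sum(
        math.comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


def poisson_pmf(mean):
    """Per-site depth distribution for relatively uniform coverage (assumption)."""
    def pmf(d):
        return math.exp(d * math.log(mean) - mean - math.lgamma(d + 1))
    return pmf


def negbin_pmf(mean, dispersion=2.0):
    """Overdispersed depth: variance = mean + mean**2 / dispersion (assumption)."""
    p = dispersion / (dispersion + mean)
    def pmf(d):
        log_pmf = (
            math.lgamma(d + dispersion) - math.lgamma(dispersion) - math.lgamma(d + 1)
            + dispersion * math.log(p) + d * math.log(1 - p)
        )
        return math.exp(log_pmf)
    return pmf


def expected_sensitivity(pmf, max_depth=2000):
    """Average detection probability over the per-site depth distribution."""
    return sum(pmf(d) * p_detect_het(d) for d in range(max_depth + 1))


def min_mean_depth(target, make_pmf, max_mean=200):
    """Smallest integer mean depth whose expected sensitivity reaches target."""
    for mean in range(1, max_mean + 1):
        if expected_sensitivity(make_pmf(mean)) >= target:
            return mean
    return None


if __name__ == "__main__":
    print("uniform coverage, mean depth for 95%:", min_mean_depth(0.95, poisson_pmf))
    print("uneven coverage, mean depth for 95%: ", min_mean_depth(0.95, negbin_pmf))
```

Under these assumptions, the uneven distribution needs a noticeably higher mean depth to reach the same sensitivity, because a larger share of sites falls below the depth at which a heterozygous call is possible.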
Given such differences in read depth requirements, the team estimated that it should be possible to detect SNPs across protein-coding sequences for roughly the same cost using either whole-genome or whole-exome sequences in the near future, based on projected sequencing costs on the Illumina HiSeq X Ten.
With the HiSeq 2000, whole-exomes are still four to five times cheaper than whole-genome sequences, Meynert and her colleagues reported.
Whole-genome sequencing also carries steeper bioinformatics and storage costs, given the massive amount of data it generates.
"We often forget to budget in the price of bioinformatics and data storage," Emily Farrow, a pediatrics researcher at the University of Missouri-Kansas City School of Medicine who was not involved in the current study, told IS in an email message. "So, while a genome is getting cheaper, it is still more expensive than an exome."
"There are certainly still downstream costs," Meynert agreed. "It's 10 to 15 times the amount of storage space that you're using, but you're getting 50 to 80 times the amount of information … depending on which exome kit you're comparing."
Along with variants in protein-coding portions of the genome, for example, whole genome sequences offer a more complete look at structural variation in the genome as well as alterations in non-coding sequences that may regulate genome function.
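Meynert's storage-versus-information comparison can be put in rough numbers. The short sketch below takes only the two ranges she quoted (10 to 15 times the storage, 50 to 80 times the information) and, as an assumption, treats them as varying independently to bound how much more information a genome yields per unit of storage than an exome. The function name is hypothetical.

```python
def info_per_storage_bounds(info_range, storage_range):
    """Bound the information-to-storage ratio from two independent ranges."""
    info_lo, info_hi = info_range
    store_lo, store_hi = storage_range
    # Worst case: least information with most storage; best case: the reverse.
    return info_lo / store_hi, info_hi / store_lo


low, high = info_per_storage_bounds(info_range=(50, 80), storage_range=(10, 15))
print(f"Information per unit of storage, genome vs. exome: {low:.1f}x to {high:.1f}x")
```

Which end of the range applies would depend on the exome kit being compared, as the quote notes.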
Moreover, Meynert noted that the relatively uniform coverage available in genome sequences provides an advantage when comparing tumor genomes with matched normal samples, making whole-genome sequencing especially attractive as a cancer genomics tool.
"For things like big cancer projects, it's very forward-looking to go for genomes from the start," she said.
Indeed, large-scale sequencing efforts in both research and clinical settings are increasingly turning to whole genomes.
A Department of Health-inspired initiative in the UK known as Genomics England aims to sequence 100,000 genomes in an effort to tackle everything from cancer to rare conditions and infectious disease, for example.
Last week, UK Prime Minister David Cameron announced $523 million in funding for the project, which includes support specifically earmarked for genome sequencing services, infrastructure, and know-how from Illumina, a partner on the project.
Whole-genome sequencing also edges out exome sequencing when rapid results are needed, according to Farrow, who manages operations at Children's Mercy Hospital's Center for Pediatric Genome Medicine.
That center has implemented a clinical sequencing pipeline known as STAT-seq, which aims to provide information on undiagnosed genetic conditions in infants in the center's neonatal intensive care unit in 50 hours or less. The tight timeline leaves little time for sample preparation, Farrow noted, making it unrealistic to consider an exome capture step.
"Currently the only way to do rapid sequencing with results in 50 hours is to do whole genome sequencing," she said.
"From a sample preparation standpoint, genomes are preferable," Farrow continued. "They are quicker, and we use a PCR-free sample prep, which helps to decrease any bias you might see from PCR amplification."
Even so, there are applications where exome sequencing clearly wins out, Meynert said. She noted that the method "does still have a role to play for some time," particularly amongst those based at sequencing centers without access to the HiSeq X Ten.
Moreover, because the X Ten is specifically earmarked for human genome sequencing at the moment, those working with model organisms are limited to other, pricier platforms, she added.
"Given that the cost of exome kits is going to continue to come down as well, I think it's going to be a few years before we see whole genome [sequencing] entirely replace whole exome [sequencing]," Meynert said.
For her part, Farrow pointed out that the same price declines associated with genome sequencing with the X Ten should also diminish the price tag for whole-exome sequencing.
She noted that ongoing improvements in exome capture kit probe design and optimization should continue to bolster the coverage uniformity that can be achieved by whole-exome sequencing, bringing it closer to the evenness achieved by whole-genome sequencing.
Indeed, Meynert cautioned that the current analysis "does not take into account new, disruptive technologies" that may arise down the road, either in the exome capture or sequencing platform arena.