SAN FRANCISCO – Over the last several years, genome-wide association studies have become the primary method for identifying variations associated with human disease, but the approach has shortcomings that are leading some in the genomics community to push more aggressively into the post-GWAS era.
At Cambridge Healthtech Institute's Genomic Tools and Technologies Summit held here this week, many speakers noted that even though GWA studies have linked hundreds of common SNPs to disease, these variants account for only a very small portion of disease heritability, which has raised doubts over their clinical value. A number of talks focused on two key alternatives to GWAS: the discovery of rare variants, as opposed to common variants, with a role in disease; and an increasing focus on copy number variants rather than SNPs.
Allen Roses, director of the Deane Drug Discovery Institute at Duke University, noted that GWAS has "largely disappointed its most enthusiastic proponents" because it has not been able to identify genes responsible for complex diseases. This disappointment, he said, is because "GWAS was never meant to substitute for fine genomic sequencing," but rather to identify regions of linkage disequilibrium in the genome that warrant further study.
GWAS data on its own represents a "statistical average" of populations, and is therefore of limited value in treating individual patients, he explained.
Roses encouraged researchers to carry out more "post-GWAS experiments" to home in on genes that may have small effects in the overall population, but interact in complex ways to affect disease risk in certain individuals. As an example, he outlined a project in which his group carried out deep sequencing of the region around the APOE gene to identify other genetic players at work in Alzheimer's disease. His team used sequencing and phylogenetic analysis to determine that variants of APOE, in combination with certain variants of the TOMM40 gene, split into two distinct risk profiles that would not be evident from looking at APOE genotypes alone.
Jay Shendure of the University of Washington described a different approach to identify rare variants based on exome sequencing. A key challenge in the wake of GWAS, he said, is that the loci implicated in these studies are often not connected to functional regions of the genome. Since the protein-coding regions of the genome are much better understood than other regions, his group is exploring the exome as a likely source for rare variants tied to disease.
Using a combination of Agilent target capture arrays and Illumina sequencing, Shendure said that his group can sequence the protein-coding regions of the genome for around 5 percent of what it would cost to sequence a whole genome. As a proof of concept, his group sequenced 12 samples: 8 HapMap individuals and 4 patients with Freeman-Sheldon syndrome, a monogenic disease caused by variation in the MYH3 gene. The goal of the study was to determine whether sequencing could implicate MYH3 as the gene responsible for the disease without prior knowledge, Shendure said.
The study successfully pinpointed MYH3 as the causal gene and also identified 13,000 novel coding variants and 400 novel coding insertions and deletions, Shendure said. He noted, however, that the approach is probably most suitable for rare diseases rather than common diseases.
Su Yeon Kim, a researcher at the University of California, Berkeley, described another way of applying next-generation sequencing to overcome the limitations of SNP arrays in association studies. Since it would be too expensive to perform whole-genome sequencing on every individual in a genome-wide association study, her group is looking at pooling DNA samples in order to reduce the overall cost of such experiments.
Likewise, Nicholas Schork of the Scripps Research Institute noted that whole-genome sequencing would be the best method for identifying rare variants that collectively explain complex diseases in the population, but he warned that correlating entire genomes with phenotypes in large-scale studies is a computational challenge. He described new statistical techniques his lab is developing to address this problem, such as Generalized Analysis of Molecular Variance, or GAMOVA, which relies on sequence similarity to cluster individuals by phenotype; and an approach that uses regression to "collapse" multiple rare variants into sets based on functional features they have in common.
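To give a rough sense of what "collapsing" can mean in practice, the following is a minimal Python sketch, not Schork's method: it assumes a hypothetical set of rare variants grouped by a shared functional feature, reduces each individual to a single carrier indicator for that set, and regresses case/control status on the indicator with ordinary least squares. The variant names, the toy data, and the helper functions `collapse` and `ols_slope` are all invented for illustration.

```python
# Illustrative sketch only: a simple "collapsing" (burden-style) analysis.
# Variant IDs, data, and function names are hypothetical assumptions.

def collapse(genotypes, variant_set):
    """Return 1 if the individual carries any variant in the set, else 0."""
    return int(any(genotypes.get(v, 0) > 0 for v in variant_set))

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical set of rare variants that share a functional feature
variant_set = {"rv1", "rv2", "rv3"}

# Toy data: (minor-allele counts per variant, case=1 / control=0)
individuals = [
    ({"rv1": 1}, 1),
    ({"rv2": 1}, 1),
    ({}, 0),
    ({"rv3": 1}, 1),
    ({}, 0),
    ({}, 1),
    ({"rv9": 1}, 0),  # carries a variant outside the set, so not counted
    ({}, 0),
]

# One collapsed indicator per individual instead of one test per variant
burden = [collapse(g, variant_set) for g, _ in individuals]
status = [s for _, s in individuals]

# With a 0/1 predictor, the slope equals the difference in case
# proportion between carriers and non-carriers of the variant set.
slope = ols_slope(burden, status)
print(burden, slope)
```

The point of the design is that no single rare variant in the set appears often enough to test on its own, whereas the collapsed indicator pools their signal into one predictor. A real analysis would use a proper regression model with covariates and significance testing.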
But sequencing wasn't the only post-GWAS technology that speakers touted at the conference. While SNP arrays may have their shortcomings, there is plenty of promise for chips that can identify copy number variants and other structural variants in the genome, speakers said.
While some have proposed next-generation sequencing as a promising approach for detecting structural variation, arrays are still the "most cost-effective method" for detecting copy number alterations in cancer samples, said Jonathan Pollack of the Stanford University School of Medicine.
Likewise, Stephen Kingsmore, president and CEO of the National Center for Genome Resources, said that arrays are still the "gold standard" for detecting CNVs. He described a human genome sequencing project underway at NCGR that was unable to identify structural variants with sequencing alone and instead relied on a combination of arrays and analysis of paired ends and BACs used for the assembly.
One reason that sequencing is limited in its ability to detect CNVs is that the reference genomes used for alignment are inadequate: they do not contain structural information, they have too many gaps, and they are not diploid. Because of this, several speakers suggested that short-read sequencing will not be an appropriate method for CNV analysis until routine de novo assembly is possible for a human genome.
The reference genome "is a mess," and the fact that it is haploid "reflects how naive we were with regard to copy number variation" during the Human Genome Project, said James Lupski of Baylor College of Medicine.
In response to a question from an attendee, Lupski said that next-generation sequencing is "coming around" when it comes to copy number variation, but noted that when Baylor sequenced the Watson genome with the Roche/454 platform, it used an array to identify copy number variants that sequencing did not detect.
Sequencing doesn't provide "the same gamut of the range of rearrangements" that arrays do, he said, and mapping alignments to the reference genome "masks out data that we want to see." In addition, he noted that arrays are still much cheaper for clinical use, costing on the order of $200 per patient as opposed to tens of thousands of dollars for sequencing.
Nevertheless, Lupski said that efforts like the 1000 Genomes Project will likely produce valuable information that will drive improvements in the use of sequencing for CNV detection. "It's coming along," he said. "I think this will be solved."