A long-standing tenet of genome-wide association studies is that common variants are at the root of common diseases. Earlier this year, Children's Hospital of Philadelphia's Hakon Hakonarson and Duke University's David Goldstein showed through computer simulations that synthetic associations could arise, and that some common-variant signals detected in GWAS data are actually due to the effects of rare variants.
In a new paper in the American Journal of Human Genetics, Hakonarson and his colleagues describe an approach to finding those rare disease-causing variants. "Common variants from GWAS data could be tracking rare variants that may actually reside quite a bit away from the actual common SNP," Hakonarson says. "We present an approach to generate long-range haplotypes that go beyond the conventional haplotype structures."
Many searches for causal variants, such as through re-sequencing efforts, have come up empty, the researchers note. They say that's because rare variants come along for the ride on the same SNP tags. "The common variant is basically capturing the sum of multiple rare, heterogeneous variants at the loci and some of them fit on the same haplotypes, [while] others fit on different haplotypes," Hakonarson says. "Some of them may be risk and others may be protective, and so therefore it's the balance between the whole thing that gives you some sort of a signal."
Instead of focusing on re-sequencing, Hakonarson and his colleagues took another approach. They reasoned that by enriching their GWAS samples with cases that are more likely to carry causal alleles, they might be able to home in on the causal variants. To do this, they looked at long-range haplotypes, which would contain more recently emerged variants. "With this, we are trying to enrich for cases that are notably older regions that are descended from the controls, so we have unique haplotypes on the basis that they have a mutation that the control individuals [don't] have," he says.
Using GWAS data on hearing loss, for which many causal variants are known, Hakonarson and his colleagues found that the most common causal variant, which has a minor allele frequency of 1.3 percent in the general population and 8.2 percent in some cases, had a frequency of 88 percent in their enriched cases. This, they suggest, indicates that sequencing selected cases could pinpoint causative alleles. "Because the individual variants are rare (most of them are going to be less than 1 percent frequency, and most of the sequencing efforts have focused on sequencing 50 to 100 people), you may miss it entirely," Hakonarson says. "You have to sequence hundreds of individuals and you have to go beyond the LD block."
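Hakonarson's point about sample size can be illustrated with a back-of-the-envelope calculation. The sketch below is not from the paper; it assumes individuals are sampled at random from the general population, that each person's two chromosomes carry the variant independently, and that sequencing detects every copy present. Under those assumptions, the chance that a variant with minor allele frequency `maf` goes entirely unseen among `n_individuals` people is `(1 - maf) ** (2 * n_individuals)`. The function name and the example frequencies are illustrative choices.

```python
def prob_variant_missed(maf: float, n_individuals: int) -> float:
    """Probability that no sampled chromosome carries the variant.

    Illustrative assumptions (not from the paper): random sampling,
    two independent chromosomes per person, and perfect detection.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must be between 0 and 1")
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    # 2 * n_individuals chromosomes, each lacking the variant with
    # probability (1 - maf).
    return (1.0 - maf) ** (2 * n_individuals)


if __name__ == "__main__":
    # Compare small cohorts (50 to 100 people) with a larger one.
    for maf in (0.01, 0.001):
        for n in (50, 100, 500):
            p = prob_variant_missed(maf, n)
            print(f"MAF {maf:.3f}, {n:>3} people: P(missed) = {p:.3f}")
```

Under these simplifying assumptions, a variant at 1 percent frequency is missed in roughly a third of 50-person cohorts, and a variant at 0.1 percent frequency is missed far more often, which is consistent with the call to sequence hundreds of individuals or to enrich the sample for likely carriers.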