NEW YORK (GenomeWeb News) – Rare, as-yet-undetected variants may underlie associations with common variants picked up in genome-wide association studies, according to a paper appearing online today in PLoS Biology.
Through a set of simulation experiments and a pair of GWAS, a Pennsylvania and North Carolina research team examined so-called "synthetic associations" between rare and common variants, generating data that supports the presence of these associations in the genome. Based on these results, they propose that common variants identified through some GWAS actually point to parts of the genome containing rare variants involved in disease.
"[F]or most common diseases, the common variants implicated account for only a small proportion of the genetic component," senior author David Goldstein, director of the Institute for Genome Sciences and Policy's Center for Human Genome Variation at Duke University, said in a statement. "We are now pretty sure that much of the so called 'missing heritability' lies within the huge class of relatively or very rare genetic variants which were not represented in previous studies."
Goldstein has favored a rare variant-based explanation of disease for some time, arguing that sequencing studies can help to find rare but more highly penetrant variants influencing disease risk, including HIV susceptibility.
In the new paper, he and his co-workers ran simulations involving between 1,000 and 3,000 cases and controls and 10,000 haplotypes (with or without recombination), looking for evidence of synthetic associations, which they define as indirect associations between common variants and one or more rarer variants.
In 30 percent of the simulations, the researchers found that the presence of one or more rare variants can lead to signals of association with genome-wide significance for common variants. And when more rare variants were present, the power to detect these associated common variants also increased.
The team noted that tossing recombination into their simulations didn't negate such synthetic associations and actually seemed to boost them in some cases.
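The logic behind a synthetic association can be illustrated with a small expected-value calculation. The sketch below is not the authors' simulation: it is a toy model that assumes every rare causal variant sits on a haplotype carrying the same common allele, that risk acts per haplotype, and that recombination is absent. The function names, allele frequencies, relative risk, and baseline risk are all hypothetical values chosen for illustration.

```python
import math


def case_control_freqs(p_common, r_rare, rel_risk, base_risk):
    """Expected frequency of common allele A on case and control haplotypes.

    Toy assumptions (not from the paper): all rare causal variants lie on
    haplotypes carrying A, and a haplotype's risk is base_risk without a
    rare variant and rel_risk * base_risk with one.
    """
    risk_rare = rel_risk * base_risk
    # Three haplotype classes: A with a rare variant, A alone, and non-A.
    f_a_rare, f_a_only, f_other = r_rare, p_common - r_rare, 1.0 - p_common
    case_a = f_a_rare * risk_rare + f_a_only * base_risk
    case_all = case_a + f_other * base_risk
    ctrl_a = f_a_rare * (1 - risk_rare) + f_a_only * (1 - base_risk)
    ctrl_all = ctrl_a + f_other * (1 - base_risk)
    return case_a / case_all, ctrl_a / ctrl_all


def allelic_chi2(freq_cases, freq_controls, n_cases, n_controls):
    """One-degree-of-freedom chi-square for the expected 2x2 allele table."""
    a_cases, a_ctrls = 2 * n_cases, 2 * n_controls  # two alleles per person
    pooled = (freq_cases * a_cases + freq_controls * a_ctrls) / (a_cases + a_ctrls)
    variance = pooled * (1 - pooled) * (1 / a_cases + 1 / a_ctrls)
    return (freq_cases - freq_controls) ** 2 / variance


def chi2_pvalue_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical settings: common allele at 20%, rare carriers at 4x risk.
    for r_rare in (0.0, 0.01, 0.02, 0.04):
        f_case, f_ctrl = case_control_freqs(0.2, r_rare, 4.0, 0.01)
        chi2 = allelic_chi2(f_case, f_ctrl, 3000, 3000)
        print(f"rare freq {r_rare:.2f}: chi2 = {chi2:.2f}, "
              f"p = {chi2_pvalue_1df(chi2):.2e}")
```

In this toy model the common allele has no effect of its own, yet it is enriched among cases because it travels with the rare risk variants. The expected test statistic grows as the combined frequency of the rare variants rises and as the sample gets larger.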
"Basically we showed that not only is it possible that rare variants are behind many of the results of recent findings, but that there are likely to be many more to be found as researchers shift their focus to methods that will find rare variants," lead author Samuel Dickson, a bioinformatician affiliated with Duke University and North Carolina State University, said in a statement.
Such patterns do not appear to be limited to simulated data. The researchers conducted a GWAS involving 194 individuals with sickle cell disease and more than 7,400 controls, all genotyped with the Illumina HumanHap 550 BeadChip.
Although sickle cell anemia is known to be caused by autosomal recessive mutations in a single gene called HBB, the team detected 179 common SNPs that reached genome-wide significance in the sickle cell GWAS.
In a GWAS of hearing loss, a more complex genetic condition involving multiple rare mutations, the team found three significantly associated SNPs in or near a locus previously tied to the condition. In that case, they say, "rare variants at the locus create multiple independent association signals captured by common tagging SNPs."
"Ultimately, the proportion of GWAS signals that [are] due to common versus rare variants is a question that can only be resolved empirically," they wrote. "Our analyses simply illustrate that in following up GWAS signals, the possibility of synthetic associations must be taken into account."
And, Goldstein and his colleagues say, causative rare variants may not always fall close to common variants with which they share synthetic associations. In the case of sickle cell disease, for instance, they found synthetically associated common variants as far as 2.5 million bases from the causal mutation. In addition, the team's simulations suggest synthetic associations can occur over even larger distances.
Based on these findings, the researchers propose using sequencing studies to search for rare variants. But, they caution, it will likely be necessary to sequence as many as 10 million bases around GWAS signals — or even whole genomes — rather than focusing only on areas near associated SNPs.
"This tells us that we will surely need to turn to more comprehensive whole genome sequencing studies of more carefully selected subjects if we want to discover more meaningful relationships between genetic variation and disease," Goldstein said in a statement. "While such studies are undoubtedly more complex, expensive and time-consuming, we really have no choice if we want to deepen our knowledge about the genetic underpinnings of human disease."