Performing whole-genome sequencing on a smaller number of individuals, as opposed to performing larger genome-wide association studies on genotyping arrays, will help to identify the rare variants that cause common diseases, according to researchers at Duke University Medical Center.
While they acknowledge that genome-wide association studies have been effective in finding common variants associated with human disease, the researchers caution in a new PLoS Biology paper that rare, undetected variants may underlie those associations.
Specifically, through a set of simulation experiments and a pair of GWA studies, the Duke researchers, together with colleagues at the Children's Hospital of Philadelphia, examined so-called "synthetic associations" between rare and common variants and generated data that supports the presence of these associations in the genome.
Based on these results, they propose that common variants identified through some GWA studies actually point to parts of the genome containing rare variants involved in disease, and show that those rare variants may be located far from the common variants.
Finding the missing heritability of diseases, therefore, will require smaller, sequencing-based studies, rather than using a GWAS to locate a particular genomic region of interest and then applying targeted sequencing to that region to find the rare variants.
"GWA studies are great and effective on the first pass, but after that you are getting all the low-hanging fruit of any value with the first couple rounds of GWAS for any disease," Sam Dickson, the lead author on the new paper and a bioinformatician at the Institute for Genome Sciences and Policy's Center for Human Genome Variation at Duke University, told BioArray News this week.
"We have never done so well, we are explaining more of diseases than have ever been explained before," Dickson said of GWAS. "But that means that we can now explain 5 percent instead of 1 percent when we would like to explain 100 percent of what is going on. We are working, we are progressing, but everyone knows GWAS will hit a wall."
The approach "will not be able to explain everything," and centers like IGSP are "gearing up for exploring the next uncharted territory that will help us explain more," he added.
According to David Goldstein, director of the IGSP and a senior author on the paper, he and fellow researchers are "pretty sure that much of the so called 'missing heritability' lies within the huge class of relatively or very rare genetic variants which were not represented in previous [GWAS] studies."
Goldstein favors a rare variant-based explanation of disease, arguing that sequencing studies can help to find rare but more highly penetrant variants influencing disease risk, including HIV susceptibility.
In a New England Journal of Medicine article published last year, he made a similar case for conducting smaller, sequencing-based studies, as opposed to large, array-based studies, to identify the rare variants that cause common human diseases (see BAN 4/21/2009).
In the new paper, Goldstein, Dickson, and their colleagues ran simulations involving between 1,000 and 3,000 cases and controls and 10,000 haplotypes with or without recombination, looking for evidence of synthetic associations, which they define as indirect associations between common variants and one or more rarer variants.
"A synthetic association has different properties," Dickson told BioArray News. "It is association with a common variant because of a rare variant and an association that might cover a much wider region. We don't expect it's a single rare variant that might create this rare association with a common variant.
"It would be several rare variants that would work in tandem to create an association among common variants," he added. "Because there are multiple rare variants, they would create an association with a common variant."
In 30 percent of the simulations in the PLoS Biology study, the researchers found that the presence of one or more rare variants can lead to signals of association with genome-wide significance for common variants. And when more rare variants were present, the power to detect these associated common variants also increased.
The team noted that tossing recombination into their simulations didn't negate such synthetic associations and actually seemed to boost them in some cases.
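To make the idea concrete, here is a minimal Python sketch of how a synthetic association can arise. It is not the simulation the Duke team ran: it does not model genealogies or recombination, and every name and parameter value in it (`simulate_study`, five rare variants at 1 percent frequency, a fourfold relative risk per rare allele, a 30 percent common allele) is an assumption chosen for illustration. The key assumption is that all the rare risk variants arose on haplotypes carrying one allele of a common SNP that has no effect on disease itself. A simple allelic chi-square test then compares that common allele's frequency in cases and controls.

```python
import random

# Approximate 1-df chi-square value corresponding to p = 5e-8,
# a commonly used genome-wide significance threshold.
GENOME_WIDE_CHI2 = 29.72


def draw_haplotype(rng, common_freq, n_rare, rare_freq):
    """Return (has_common_allele, number_of_rare_risk_alleles) for one haplotype."""
    has_common = rng.random() < common_freq
    n_risk = 0
    if has_common:
        # Assumption: every rare variant arose on the common-allele background,
        # so rare alleles are only ever seen together with the common allele.
        p = rare_freq / common_freq  # keeps the population frequency at rare_freq
        n_risk = sum(rng.random() < p for _ in range(n_rare))
    return has_common, n_risk


def simulate_study(n_per_group=2000, n_rare=5, rare_freq=0.01, common_freq=0.3,
                   baseline_risk=0.05, relative_risk=4.0, seed=1):
    """Sample cases and controls; return common-allele counts per group.

    Disease risk depends only on the rare alleles. The common allele is neutral.
    """
    rng = random.Random(seed)
    # For each group: [haplotypes with common allele, haplotypes without it]
    counts = {"case": [0, 0], "control": [0, 0]}
    sampled = {"case": 0, "control": 0}
    while min(sampled.values()) < n_per_group:
        haps = [draw_haplotype(rng, common_freq, n_rare, rare_freq) for _ in range(2)]
        n_risk = sum(h[1] for h in haps)
        risk = min(1.0, baseline_risk * relative_risk ** n_risk)
        group = "case" if rng.random() < risk else "control"
        if sampled[group] >= n_per_group:
            continue  # this group is already full; discard the individual
        sampled[group] += 1
        for has_common, _ in haps:
            counts[group][0 if has_common else 1] += 1
    return counts


def allelic_chi2(counts):
    """Pearson chi-square (1 df) on the 2x2 table of allele counts."""
    a, b = counts["case"]
    c, d = counts["control"]
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    for n_rare in (0, 5):
        chi2 = allelic_chi2(simulate_study(n_rare=n_rare))
        print(f"rare variants: {n_rare}  chi-square at common SNP: {chi2:.1f}")
```

Under these assumptions the neutral common SNP should show a strong signal only when rare risk variants are present, because cases are enriched for the haplotypes on which those variants sit. The toy parameters are deliberately favorable, so it should not be read as reproducing the 30 percent figure reported in the paper.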
"Basically we showed that not only is it possible that rare variants are behind many of the results of recent findings, but that there are likely to be many more to be found as researchers shift their focus to methods that will find rare variants," Dickson said.
Such patterns do not appear to be limited to simulated data alone. The researchers did a GWAS involving 194 individuals with sickle cell disease and more than 7,400 controls, all genotyped with the Illumina HumanHap550 BeadChip. Although sickle cell anemia is known to be caused by autosomal recessive mutations in a single gene called HBB, the team detected 179 common SNPs that reached genome-wide significance in the sickle cell GWAS.
In a GWAS of hearing loss, a more complex genetic condition involving multiple rare mutations, the team found three significantly associated SNPs in or near a locus previously tied to the condition. In that case, the authors wrote in the paper that "rare variants at the locus create multiple independent association signals captured by common tagging SNPs."
"Ultimately, the proportion of GWAS signals that [are] due to common versus rare variants is a question that can only be resolved empirically," the authors wrote. "Our analyses simply illustrate that in following up GWAS signals, the possibility of synthetic associations must be taken into account."
"The point of the paper is not to say, 'This is how nature works,'" Dickson said. "We wanted to show how it is possible that rare variants work together to create associations with common variants. It's likely that you have some form of this going on."
Additionally, causative rare variants may not always fall close to common variants with which they share synthetic associations, according to the authors. In the case of sickle cell disease, for instance, they found synthetically associated common variants as far as 2.5 million bases from the causal mutation. In addition, the team's simulations suggest synthetic associations can occur over even larger distances.
Based on these findings, the researchers proposed using sequencing studies to search for rare variants. Still, they warned it will likely be necessary to sequence as many as 10 million bases around GWAS signals — or even whole genomes — rather than focusing only on areas near associated SNPs.
"This tells us that we will surely need to turn to more comprehensive whole-genome sequencing studies of more carefully selected subjects if we want to discover more meaningful relationships between genetic variation and disease," Goldstein said in a statement. "While such studies are undoubtedly more complex, expensive and time-consuming, we really have no choice if we want to deepen our knowledge about the genetic underpinnings of human disease."
Companies that make whole-genome genotyping arrays for GWAS, though, are currently preparing new generations of chips that contain rare variants selected from sources like the 1000 Genomes Project. Illumina CEO Jay Flatley has said several times in recent weeks that the firm believes that a new round of large, array-based "rich" GWAS will commence as those new products come on line (see BAN 1/19/2010).
"I expect they’ll find many novel associations with those chips," Dickson said of the next-generation arrays. "Adding more markers will get you more results, but you'll also miss a lot of what you are looking for. It’s a tough problem and there is no solution for it.
"My personal recommendation is that studies with larger sample sizes will have a limited ability to locate novel associations and that more emphasis should be placed on things like sequencing that will help us locate potentially rare variants," he added.