In the early days of genome-wide association studies, it was enough to identify a few SNPs associated with a disease in a particular population. These days, though, researchers are doing more analysis and more studies after they identify SNPs associated with disease. They are going beyond the SNP and drilling down to find which variants are actually associated with the disease. "We don't quite know what sorts of things are causing the signals causing the association results we've got so far, and I don't think it's going to be straightforward to find many of them," says the Wellcome Trust Centre for Human Genetics' Peter Donnelly.
Though it may not be easy, researchers are trying to find the actual variants because, if they do, the payoff will be big. "Imagine if we understand the mechanisms of those true associations," says Wake Forest University's Jianfeng Xu. "Our understanding of the etiology of prostate cancer and complex diseases will be much, much improved."
Zeroing in
One way to home in on causal variants is through replication, particularly in diverse populations. Cancer Research UK's Douglas Easton, a genetic epidemiologist, did just that to follow up on SNPs associated with breast cancer. These SNPs in the FGFR2 gene had been identified in women of Asian and European descent. To get a better view of the etiology of the disease, Easton and his colleagues looked to see the role of variants in this gene in breast cancer in African-American women, a population with a different history and risk of disease and, Easton says, one in which you might not expect to see the same effect. "If you can identify the variants that come up in every population, that gives you a better handle on what's likely to be the causal one," Easton says.
To identify variants near a susceptibility SNP, as Easton also did in his study, fine-mapping often comes in handy. Easton and his team evaluated eight candidate causal SNPs in African-American women. After combining the data with that from the Asian and European women and studying the chromatin conformation near those regions, they narrowed the field down to one SNP that was already known to up-regulate FGFR2 expression.
Not every SNP needs to be fine-mapped, particularly when resources are tight. The University of Western Australia's David Burgner chose 16 SNPs to fine-map in his study of Kawasaki disease, which has an unknown etiology but is thought to be caused by an infection that strikes genetically susceptible children, causing heart disease. "We chose the loci that either have a SNP in, or relatively close to, with a decent minor allele frequency that make it worth pursuing," Burgner says. Kawasaki disease is rare and, at the time, there was only a small international consortium to study it.
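The "decent minor allele frequency" criterion Burgner describes can be pictured as a simple filter. The sketch below is illustrative only: the genotype coding (0, 1, or 2 copies of one allele per person), the SNP names, and the 0.05 cutoff are assumptions made for the example, not details from his study.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one SNP.

    genotypes: list of 0/1/2 values, the number of copies of one
    (arbitrarily chosen) allele carried by each individual.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    # Each individual contributes two chromosomes.
    freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of the two alleles is rarer.
    return min(freq, 1 - freq)


def worth_fine_mapping(snp_genotypes, min_maf=0.05):
    """Keep SNP names whose MAF meets a hypothetical threshold."""
    return [
        name
        for name, genotypes in snp_genotypes.items()
        if minor_allele_frequency(genotypes) >= min_maf
    ]


# Toy data, invented for illustration.
toy_snps = {
    "snp_common": [0, 1, 1, 2, 0, 1, 2, 1, 0, 1],
    "snp_rare": [0] * 19 + [1],
}
selected = worth_fine_mapping(toy_snps)
print(selected)
```

A rare variant needs many more samples to show a reliable association, which is one reason a small consortium might set such a threshold before committing resources.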
Wake Forest's Xu says that he had two goals for fine-mapping during his genome-wide association study of prostate cancer, namely to find which region near the SNP has the strongest association and to find additional, independent risk variants.
"The idea is that you have a signal in a particular region of the genome in an association study and you press forward to find more of the variants or ideally all of the variants that are in that region," says Oxford's Donnelly. Then you "type them or use methods like imputation to try and learn about the genotypes in the samples so you're drilling down in a much finer way to see where the signals are."
After fine-mapping 16 SNPs in a region associated with prostate cancer, Xu also used an imputation method to find other variants in the region. Many genetic variants are correlated through linkage disequilibrium, says Xu, and when they are, you don't necessarily have to measure each of those related variants directly. In his study, he was able to impute 29 genes found in the vicinity of his prostate cancer-related SNP of interest. "It's a very powerful approach," Xu says. He and his colleagues identified both the SNP with the strongest association with prostate cancer, in the microseminoprotein beta gene, or MSMB, and an independent variant for prostate cancer in the gene ARA70, which is involved in androgen receptor transcription.
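To make the linkage disequilibrium idea concrete, here is a minimal sketch of one common summary of how strongly two SNPs track each other: the squared correlation between their genotypes. This is not Xu's method. Real imputation relies on reference panels and dedicated software, and the genotype-based value below only approximates the haplotype-based r² that geneticists usually report. The 0/1/2 genotype coding and the toy data are assumptions.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two SNPs' genotypes.

    x, y: equal-length lists of 0/1/2 allele counts for the same
    individuals. Values near 1 mean one SNP largely predicts the other.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists of at least 2 genotypes")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a monomorphic SNP has no defined correlation")
    return (cov * cov) / (var_x * var_y)


# Toy genotypes, invented for illustration.
typed_snp = [0, 1, 2, 1, 0, 2, 1, 0]
tightly_linked = [0, 1, 2, 1, 0, 2, 1, 0]   # mirrors the typed SNP
r2_linked = genotype_r2(typed_snp, tightly_linked)
r2_unlinked = genotype_r2([0, 0, 2, 2], [0, 2, 0, 2])
print(r2_linked, r2_unlinked)
```

When the value is high, genotyping one SNP tells you most of what you would learn from the other, which is why correlated variants need not each be measured directly.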
If fine-mapping doesn't whittle the number of markers down to one, it at least narrows the field. "Typically we've found that fine-mapping can get you down to a small handful of markers, maybe four or five, but further population studies may not be able to take you much further because basically in every population these markers are too closely correlated with one another. Then you've got to use other sort of arguments," Easton adds.
Functional analysis
To build those further arguments, functional analyses can often help explain a variant's potential for causing disease. In parallel with his fine-mapping and imputation studies, Xu also did a promoter assay in a prostate cancer cell line to see how the variants he identified affected MSMB promoter activity. One particular risk allele had much lower promoter activity than the wild type. "We really cannot say, 'This is the causal one.' But I think the data so far [show] this is consistent with a causal variant at this region associated with prostate cancer risk," Xu says.
For his study of Kawasaki disease, Western Australia's Burgner followed up his fine-mapping study by looking at the expression levels of the five genes he uncovered, in patients at both the acute and the convalescent stages of the disease. "Indeed, a lot of those came up as showing differential expression," Burgner says.
He also plugged his loci into a software package from Ingenuity to determine whether there were real or putative functional relationships between them. "A number of our associated loci showed a functional relationship," Burgner says. Out of that, he adds, a single pathway emerged that could plausibly be affected in Kawasaki disease, since it is involved in inflammation and cardiovascular homeostasis.
"My reading of this field is that … rather than looking at genes in isolation, which I think biologically doesn't make a huge amount of sense — [it's] counterintuitive — you can actually pick pathways. There are increasingly sophisticated analytical tools to look at pathways which you wouldn't necessarily have picked as a candidate pathway," Burgner says.
Naturally, though, there are problems moving into more functional assays. One such issue is knowing what systems will work, says Cancer Research UK's Easton. "We work on breast cancer, so it's maybe that the changes you are looking for are only relevant in normal breast tissue and that's not the easiest thing to work with," he says.
The GWAS future
Although next-generation sequencing is breathing down their necks, array-based GWA studies still have several years of relevance left in them, experts say. According to the Wellcome Trust's Donnelly, it'll be several years before the cost of sequencing can compete with the cost of chips. In that time, scientists just may be able to incorporate even more complementary data.
Donnelly says the next big challenge for GWAS is learning how to incorporate even more variation data. Epigenetic variation, CNVs, and rare variants, he says, all play a role in disease. "I think it's clear that genome-wide association studies will need to be complemented by studies that look at … what's come to be called the missing heritability," he says.
"It's going to be an integrative approach," Burgner adds. "People are going to get clever at integrating gene expression, genome variation, copy number variation, [and] epigenetic variation."
GWA Studies on the Cheap
If you have an idea for a genome-wide association study but don't quite have the funds to do a traditional GWAS, don't despair. Université Laval's Yohan Bossé has shown that if you can survive a loss of power, you can make do with pooling your samples. Bossé and his colleague Martin Derosiers set out to compare traditional GWA studies with a pooling-based method because they wanted to study the genetics of chronic rhinosinusitis but weren't able to afford a large study. At the same time, Bossé was finishing up his postdoc in Tom Hudson's lab at McGill University and knew that his fledgling lab wouldn't be as well-funded. "I went looking to be able to apply genomic research but cheaper," Bossé says.
For his study, Bossé looked at more than 550,000 SNPs from his cohorts of patients with type 2 diabetes and chronic rhinosinusitis. Instead of genotyping each individual on a single array — which he says can get very expensive at about $1,000 per chip — Bossé pooled all the cases together and all the controls together and put those onto the arrays to try to estimate allele frequency. "The nature of your raw data [is] totally different," Bossé says. Using various statistical methods, he then ranked the SNPs identified. With this method, Bossé was able to identify SNPs associated with type 2 diabetes and chronic rhinosinusitis. For the type 2 diabetes cohort, Bossé used the same DNA from the 2007 Nature study from Sladek et al. that identified risk alleles through a traditional genome-wide association study and compared the pooling-based method to the 2007 results. "We were really, really excited about the results. First of all, the major gene associated with type 2 diabetes in this paper was TCF7L2, which is a well-known gene in type 2 diabetes," Bossé says. "The SNP with the strongest association signal in the paper in Nature was ranked number one in our pooling experiment."
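The pooling idea can be sketched in a few lines. What follows is not Bossé's pipeline, whose statistical methods the article does not detail. It is a simplified illustration under stated assumptions: each array reports one intensity per allele, the allele frequency in a pool is estimated as the share of total intensity, and SNPs are ranked by a plain two-proportion z-score. The function names, SNP names, intensities, and pool sizes are all hypothetical, and the sketch ignores the extra measurement error that pooling introduces.

```python
from math import sqrt


def pooled_allele_frequency(intensity_a, intensity_b):
    """Estimate allele A's frequency in a pool from two probe intensities."""
    total = intensity_a + intensity_b
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return intensity_a / total


def pool_z_score(p_case, p_control, n_cases, n_controls):
    """Two-proportion z-score for a case-pool vs. control-pool difference."""
    chrom_case = 2 * n_cases        # two chromosomes per person
    chrom_control = 2 * n_controls
    p_all = (p_case * chrom_case + p_control * chrom_control) / (
        chrom_case + chrom_control
    )
    se = sqrt(p_all * (1 - p_all) * (1 / chrom_case + 1 / chrom_control))
    if se == 0:
        return 0.0  # no variation at this SNP, so no evidence either way
    return (p_case - p_control) / se


def rank_snps(pool_data, n_cases, n_controls):
    """Return SNP names ordered from strongest to weakest signal."""
    scored = []
    for name, (case_ab, control_ab) in pool_data.items():
        p_case = pooled_allele_frequency(*case_ab)
        p_control = pooled_allele_frequency(*control_ab)
        z = pool_z_score(p_case, p_control, n_cases, n_controls)
        scored.append((abs(z), name))
    return [name for _, name in sorted(scored, reverse=True)]


# Toy intensities (allele A, allele B) for the case and control pools.
toy_pools = {
    "snp_1": ((600, 400), (500, 500)),
    "snp_2": ((505, 495), (500, 500)),
    "snp_3": ((300, 700), (310, 690)),
}
ranking = rank_snps(toy_pools, n_cases=500, n_controls=500)
print(ranking)
```

Because the raw data are pool-level intensities rather than per-person genotypes, the output is a ranked list of candidates to confirm by individual genotyping, not a final set of associations.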
For the chronic rhinosinusitis cohort, Bossé validated the top hits from the pooling study, which were within 10 kilobases of known genes, by genotyping 1,536 individual SNPs. He found that 41 percent of those SNPs were associated with chronic rhinosinusitis. "You will only do the individual genotyping on your best SNP," he says.
Bossé adds that pooling data will be useful as long as the cost of GWA studies remains high. He estimates that instead of costing millions of dollars, this method will only cost tens of thousands of dollars.
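A back-of-the-envelope calculation shows how the gap between millions and tens of thousands of dollars can arise. Only the roughly $1,000-per-chip figure comes from Bossé. The cohort sizes and the number of replicate arrays per pool below are assumptions chosen for illustration, and the sketch leaves out the cost of follow-up individual genotyping.

```python
CHIP_COST_USD = 1_000  # approximate per-chip cost quoted by Bossé


def individual_genotyping_cost(n_cases, n_controls, chip_cost=CHIP_COST_USD):
    """One array per person, as in a traditional GWAS."""
    return (n_cases + n_controls) * chip_cost


def pooled_genotyping_cost(replicates_per_pool, n_pools=2, chip_cost=CHIP_COST_USD):
    """A handful of replicate arrays for each pool (cases, controls)."""
    return replicates_per_pool * n_pools * chip_cost


# Hypothetical study: 1,000 cases, 1,000 controls, 10 replicate arrays per pool.
individual_cost = individual_genotyping_cost(1_000, 1_000)
pooled_cost = pooled_genotyping_cost(replicates_per_pool=10)
print(individual_cost, pooled_cost)
```

Under these assumed numbers, the traditional design costs $2 million in chips and the pooled design $20,000.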