In an era of rapidly increasing throughput, the path to genome-wide association studies was obvious. After all, once you have full-genome capabilities, what's next but to apply them to larger and larger populations? What wasn't so clear was how quickly these studies would become the de facto standard for how disease linkage should be done. In the past year alone, genome-wide association studies have been published with unprecedented population sizes (the Wellcome Trust's consortium approach looked at seven diseases across 14,000 people, with a shared set of 3,000 controls) and for any number of diseases, from schizophrenia to diabetes to Parkinson's.
The reason for the soaring interest is simple: these studies have provided the first clues to the triggers of common disease, which have proven intractable with the kind of family-based linkage approach that has found success in rare disease. "What you're getting from the genome-wide association study is a correlation between changes in DNA and changes in a disease state like obesity," says Eric Schadt, executive scientific director of research genetics for Merck's Rosetta Inpharmatics subsidiary. "You get the signpost from the DNA that says there's something here that's causing disease."
Even though it's still early days for these projects, scientists are already evaluating ways to improve the GWAS concept by adding different types of data or rethinking analytical procedures. The Microarray Quality Control (MAQC) project, for one, has launched a working group devoted specifically to these kinds of association studies, and members are already knee-deep in sample data sets. A number of longer-term challenges remain, however: making these studies more predictive and getting from GWAS results to a functional variant will take more time to solve.
But in the meantime, experts say that the excitement surrounding these projects is well deserved. Teri Manolio, director of the Office of Population Genomics at NHGRI, says that GWAS results are revealing completely new ideas for how genomes function. "In previous studies, reviewers or even investigators would think, 'Oh, that has to be wrong' and never even publish it," she says. These surprising findings, she adds, are becoming much more accepted thanks to association studies.
Range of approaches
Not long after the first genome-wide association study was published, scientists were already tinkering with the basic concept of scanning all SNPs (or at least a very large group of SNPs) across hundreds or thousands of people. Last year, the largest known GWAS was published by the Wellcome Trust Case Control Consortium, a collection of some 50 groups that pooled resources to study seven diseases across 14,000 people and a shared group of 3,000 controls. The new concept of looking at several diseases at the same time led to a number of interesting findings. For instance, even distinct autoimmune diseases like Crohn's disease and type 1 diabetes share risk variants, says Peter Donnelly, who chaired the consortium. Next for the consortium: scientists will follow up the initial study with another project that will incorporate copy number variation for diseases ranging from bipolar disorder to hypertension.
John Sninsky, vice president of research discovery at Celera, says his group collaborated with scientists at Leiden University Medical Center in the Netherlands in an effort that took a slightly different approach by limiting the SNPs being interrogated. The project, which was launched to find novel gene variants for susceptibility to deep vein thrombosis, looked at a set of 20,000 SNPs restricted to those in coding regions known to change an amino acid, Sninsky says. "We felt they were more likely to be functional," he adds.
Going forward, he says he would look at subphenotypes and covariates sooner in the association process. His team found in this study that gene variants with relatively small effect predicted much higher risk when paired with a lifestyle factor, such as the use of oral contraceptives. "It may be that this phenotype information will allow us to identify those disease subsets that those [variants] are most applicable to," Sninsky says.
At Genizon, CEO John Hooper says his team's studies of the Quebec founder population have been more powerful because of the stratification approach in place. For instance, he says, a focused look at patients with paranoid schizophrenia produced more genes of interest than a look at patients with various forms of schizophrenia. Performing GWAS in more targeted populations, he believes, is a promising way to find genes more tightly linked to disease.
Meanwhile, at Merck, Eric Schadt's group just wrapped up a large-scale study of obesity that went beyond SNP variation to include gene expression data as well. That combination of data allowed his team to find not just variants but networks of genes responsible for obesity, he says. A subnetwork his team particularly homed in on comprised 1,200 genes. "The entire network was causing disease — hundreds of tightly interacting genes," Schadt says.
That kind of "SNPs-plus" approach to association studies is generating interest across the field. At MAQC, where a genome-wide association working group is looking at various data sets to help determine best practices for quality assurance, data analysis, and so forth, the view is that combining data types will yield high-quality returns that will be more applicable to human health, says Leming Shi at FDA, who heads up the project. "We want to combine genotyping and gene expression, and hopefully that will offer additional benefits in terms of more accurately predicting patient outcome."
Still a challenge
Despite the hoopla, there are a number of hurdles these association studies will have to clear before their results can make a difference in patient care. For one, results of a GWAS approach don't necessarily provide a functional variant. What scientists find, says Donnelly, is "here's a smallish region of the genome where there's something interesting happening. We can't normally tell [the causative variant] just from the genome-wide association study."
Another inherent limitation of these kinds of studies, says Teri Manolio at NHGRI, is that "false positives are a real problem." She also believes that "we may be underestimating the false negative problem. … A little bit of error is OK, but when you do it a million times you get a lot of errors," she says.
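Manolio's point about repeating a small error a million times can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes one million independent SNP tests and a conventional per-test significance level of 0.05, neither of which comes from the studies described here, and the function names are hypothetical. It shows how many false positives would be expected by chance alone, and how a Bonferroni-style correction (one common remedy, not necessarily the one any of these groups used) tightens the per-test threshold.

```python
import math


def expected_false_positives(n_tests, alpha):
    """Expected number of chance 'hits' if no SNP is truly associated."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha):
    """Probability of at least one false positive across n_tests.

    Assumes the tests are independent, which real SNPs in linkage
    disequilibrium are not; treat this as a rough illustration.
    Computed as 1 - (1 - alpha)^n in a numerically stable way.
    """
    return -math.expm1(n_tests * math.log1p(-alpha))


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test p-value cutoff that caps the family-wise error rate."""
    return family_alpha / n_tests


if __name__ == "__main__":
    n_snps = 1_000_000  # assumed number of SNPs tested
    alpha = 0.05        # assumed uncorrected per-test threshold

    print(f"Expected false positives: {expected_false_positives(n_snps, alpha):,.0f}")
    print(f"Chance of at least one:   {familywise_error_rate(n_snps, alpha):.4f}")

    cutoff = bonferroni_threshold(n_snps)
    print(f"Bonferroni cutoff:        {cutoff:.1e}")
    print(f"Family-wise rate after:   {familywise_error_rate(n_snps, cutoff):.4f}")
```

The same arithmetic hints at the false negative problem she raises: a cutoff strict enough to suppress chance hits across a million tests is also strict enough that variants with modest effects may fail to reach it.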
A related difficulty is in getting truly meaningful susceptibility data from the studies. "When you [interrogate] the genome, it's like rolling dice," says Celera's Sninsky. "The more dice you throw, the more likely you are to get a seven. [A variant] may be a coincidental association which doesn't hold up under further investigation." His team is beginning to combine risk-increasing SNPs, which he says ultimately may help "potentially stratify patients by pathway" rather than by a gene variant or two.
To make these studies more predictive, more attention will have to be paid to the particular populations under review, says Hooper at Genizon. While the conventional wisdom for finding high-quality variants is to expand the sample size, Hooper says that "doubling the number of patients you have doesn't double the number of things you find." He believes many scientists are "throwing money at these very large studies" and getting less out of them than they might with smaller studies and more effective analysis or project design. He says Genizon researchers have "gone deeper into the data using much more sophisticated methods [that] allow you to reduce the number of subjects you look at but find more genes."
FDA's Shi says scientists will need to "build more predictive models" of association data before the potential for clinical impact can be reached. "We will need more well-characterized data sets for which the endpoints for the phenotype are well characterized," Shi says. "We really need to be more accurate in prediction."