NEW YORK (GenomeWeb News) – The National Human Genome Research Institute has established a database for keeping tabs on SNP-trait associations coming out of published genome-wide association studies.
The Catalog of Published Genome-Wide Association Studies is a manually curated database that's periodically updated to reflect GWAS findings. A paper describing the catalog's development and an analysis of published GWAS results is scheduled to appear online this week in the Proceedings of the National Academy of Sciences.
"We have developed an online catalog of SNP-trait associations from published genome-wide association studies for use in investigating genomic characteristics of trait/disease-associated SNPs," senior author Teri Manolio, director of NHGRI's Office of Population Genomics, and her co-authors wrote.
"The new online resource, together with bioinformatics predictions of the underlying functionality at trait/disease-associated loci, is well-suited to guide future investigations of the role of common variants in complex disease etiology," they added.
Over the past few years, an explosion of GWA studies has implicated hundreds of SNPs in a wide range of traits and diseases. But these SNPs often have small overall effects on disease risk, and the meaning of such associations is often far from clear. In addition, GWAS-detected trait- or disease-associated SNPs, which the team dubbed TAS, may or may not actually be causal variants.
Nevertheless, Manolio and her colleagues noted, such data provides an opportunity to learn more about the genetics of common diseases.
"The rapid increase in the number of GWAS provides an unprecedented opportunity to examine the potential impact of common genetic variants on complex diseases by systematically cataloguing and summarizing key characteristics of the observed associations and the trait/disease associated SNPs underlying them," they wrote.
To bring together TAS for the new database, the team first found GWAS results through PubMed, news and media reports, and an online genomic epidemiology database. They also developed an approach for finding blocks of TAS that are over-represented in different categories in order to account for linked variants.
The researchers compiled information on 531 SNP-trait associations after evaluating 151 of the 237 GWA studies published before the end of December 2008. These corresponded to 465 different TAS.
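For readers who want the ratios behind those counts, here is a minimal Python sketch. The four input numbers come from the paragraph above; the variable names and the derived ratios are illustrative and are not figures reported by the authors.

```python
# Counts as reported in the article.
studies_published = 237   # GWA studies published before the end of December 2008
studies_evaluated = 151   # studies the team evaluated for the catalog
associations = 531        # SNP-trait associations compiled
distinct_tas = 465        # distinct trait/disease-associated SNPs (TAS)

# Derived ratios (illustrative; not reported in the article).
share_evaluated = studies_evaluated / studies_published
associations_per_tas = associations / distinct_tas
associations_per_study = associations / studies_evaluated

print(f"Share of published studies evaluated: {share_evaluated:.1%}")
print(f"Associations per distinct TAS: {associations_per_tas:.2f}")
print(f"Associations per evaluated study: {associations_per_study:.2f}")
```

The associations-per-TAS ratio is slightly above one because some SNPs are linked to more than one trait.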
Consistent with the fact that most of these were identified using approaches that detect common variants, they noted that most of the SNPs in the catalog are present in far more than five percent of the populations evaluated.
By cross-referencing with the University of California at Santa Cruz Genome Browser, the team found that 43 percent of these TAS fall within intergenic regions in the genome. Meanwhile, 45 percent were intronic, nine percent non-synonymous, two percent synonymous, and two percent located in the 5' or 3' untranslated regions of genes.
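The breakdown can be tallied in a few lines of Python. This is only a sketch: the percentages are the rounded values quoted above, and the dictionary keys and variable names are illustrative labels rather than the catalog's own field names.

```python
# Rounded percentages of TAS by genomic annotation, as quoted in the article.
tas_share_percent = {
    "intergenic": 43,
    "intronic": 45,
    "non-synonymous": 9,
    "synonymous": 2,
    "5' or 3' UTR": 2,
}

# Sum of the quoted shares; rounded values need not add up to exactly 100.
total_percent = sum(tas_share_percent.values())

# Combined share of TAS lying between genes or within introns.
intergenic_or_intronic = (
    tas_share_percent["intergenic"] + tas_share_percent["intronic"]
)

print(f"Sum of quoted shares: {total_percent} percent")
print(f"Intergenic or intronic: {intergenic_or_intronic} percent")
```

The quoted shares sum to 101 percent, presumably an effect of rounding each category separately, and intergenic and intronic sites together account for roughly 88 percent of the TAS.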
Even so, when they incorporated information on the odds ratios conferred by TAS blocks, the researchers found that TAS blocks in intergenic and intronic regions had the lowest odds ratios for disease or trait risk while those at non-synonymous sites and promoters had the highest.
There were exceptions, though. The team noted that there was a slight enrichment for TAS blocks in regions corresponding to regulatory elements in the Open Regulatory Annotation Database.
Several elements in that collection don't seem to be well conserved, leading the authors to argue the need for "more experimental investigation into the architecture of non-coding regulatory elements ... to decrease the reliance on conservation and guide more integrative computational prediction methodologies."
The authors noted that their approach has limitations, since it incorporates results from published studies that report different levels of SNP associations. Still, they argued that continuing to do such analyses will likely offer new insights into how genetic variation influences disease risk.
"As the power of the GWAS approach increases with access to more samples, and as the types of methods to test for genetic associations expand to include copy number variants and rarer alleles, more associations will likely be identified and timely analyses similar to those presented here will continue to update our knowledge of the influence of genomic structure and function on complex diseases," the authors wrote.