NEW YORK (GenomeWeb News) – Genetic variants influencing disease risk are more likely to occur in differentially expressed genes, according to a team of researchers who have developed a tool, dubbed functionally interpolating SNPs, or fitSNPs, for prioritizing candidate loci discovered during genome-wide association studies.
By examining nearly 500 gene expression datasets and integrating results from several genome-wide association studies, a research group led by Stanford University medical informatics and pediatrics researcher Atul Butte found that differentially expressed genes were about 12 times more likely to contain disease-related variants than other genes. After testing their approach to rediscover known type 1 and 2 diabetes-related variants, the team used the method to come up with a list of candidate genes for other genetic conditions. The research appears online today in Genome Biology.
“[F]itSNPs successfully distinguished true disease genes from false positives in genome-wide association studies looking at multiple diseases,” Butte said in a statement, “and can serve as a powerful and convenient tool to prioritize disease genes from this type of study.”
The team attempted to integrate gene expression data in an effort to come up with a new method for determining which GWAS candidate loci should be targeted in follow-up validation studies. They noted that sequence information, protein-protein interaction networks, published literature, gene ontology, and even gene expression are already used to classify candidate genes and SNPs. But, they argued, new and better methods are necessary for prioritizing such SNPs.
“[W]e hypothesize that a more general (and therefore more systematic) link exists between a gene’s expression and the likelihood that it is associated with disease,” the authors wrote. “[W]e propose an integrative genomics method to systematically prioritize DNA markers that aim to accelerate the identification of novel causative genes and variants.”
Butte and his team downloaded all of the microarray expression data available in the NCBI’s Gene Expression Omnibus — representing 476 curated human gene expression datasets — and did nearly 4,900 group versus group comparisons before coming up with a list of 19,879 genes that were differentially expressed in one or more of the experiments.
When they compared the nearly 20,000 differentially expressed genes with the 3,221 disease-associated variants they found in the Genetic Association Database and the Human Gene Mutation Database, the team found that 99 percent of the disease-associated genes were also differentially expressed in at least one GEO dataset. Conversely, differentially expressed genes were reportedly 12 times more likely to contain disease-related variants than other genes.
When they focused on genes whose expression was measured in at least five percent of the GEO datasets, the team found that they could calculate so-called differential expression ratios that provided information about disease-risk associations for variants in those genes. Specifically, genes that were differentially expressed more often were also more likely to contain validated disease-associated variants.
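As a rough illustration of the idea, a differential expression ratio can be sketched as the fraction of datasets measuring a gene in which that gene was differentially expressed. That definition, the function name, the input layout, and the toy gene and dataset labels below are assumptions made for illustration; the article does not spell out the team's exact formula.

```python
from typing import Dict, Set


def differential_expression_ratios(
    measured: Dict[str, Set[str]],
    differential: Dict[str, Set[str]],
    total_datasets: int,
    min_coverage: float = 0.05,
) -> Dict[str, float]:
    """Return a ratio per gene, skipping genes measured in too few datasets.

    Assumed definition (not confirmed by the article):
    ratio = datasets where the gene is differentially expressed
            / datasets where the gene was measured.
    """
    ratios: Dict[str, float] = {}
    for gene, measured_in in measured.items():
        # Keep only genes measured in at least min_coverage of all datasets.
        if len(measured_in) < min_coverage * total_datasets:
            continue
        # Count differential calls only in datasets that measured the gene.
        hits = differential.get(gene, set()) & measured_in
        ratios[gene] = len(hits) / len(measured_in)
    return ratios


# Toy data: 40 hypothetical datasets, so the 5 percent cutoff is 2 datasets.
measured = {
    "GENE_A": {f"d{i}" for i in range(10)},  # measured in 10 datasets
    "GENE_B": {"d0"},                        # measured in 1 dataset, filtered out
}
differential = {
    "GENE_A": {"d0", "d1", "d2", "d3"},
    "GENE_B": {"d0"},
}
ratios = differential_expression_ratios(measured, differential, total_datasets=40)
print(ratios)
```

In this toy case GENE_A is differentially expressed in 4 of the 10 datasets that measured it, while GENE_B falls below the coverage cutoff and receives no ratio.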
Reasoning that this information could help to prioritize candidate SNPs from GWASs, the researchers set out to test their theory on data from type 1 and type 2 diabetes genome-wide association studies.
They demonstrated that the top seven loci identified in the Wellcome Trust Case Control Consortium’s type 1 diabetes mellitus study had significantly higher differential expression ratios than genes that weren’t linked to the disease. And their differential expression ratio could predict positive genes with 89 percent specificity and 75 percent sensitivity.
Similarly, the team examined the top 15 type 2 diabetes mellitus genes identified from six large-scale GWAS and several smaller association studies. They found that the disease-associated genes had higher differential expression ratios. Again, differential expression ratios could predict positive genes with 85 percent specificity and 60 percent sensitivity.
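For readers unfamiliar with the two measures, the sketch below shows how sensitivity and specificity could be computed when a differential expression ratio cutoff is used to call genes positive. The cutoff, the gene labels, and the ratio values are invented for illustration and are not taken from the study.

```python
from typing import Dict, Set, Tuple


def sensitivity_specificity(
    ratios: Dict[str, float],
    disease_genes: Set[str],
    threshold: float,
) -> Tuple[float, float]:
    """Call a gene positive when its ratio is at or above the threshold.

    Assumes at least one disease gene and one non-disease gene are present.
    """
    tp = fn = tn = fp = 0
    for gene, ratio in ratios.items():
        predicted_positive = ratio >= threshold
        is_disease_gene = gene in disease_genes
        if is_disease_gene and predicted_positive:
            tp += 1
        elif is_disease_gene:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # share of true disease genes recovered
    specificity = tn / (tn + fp)  # share of other genes correctly rejected
    return sensitivity, specificity


# Hypothetical ratios for six genes, three of them known disease genes.
toy_ratios = {"G1": 0.6, "G2": 0.5, "G3": 0.2, "G4": 0.1, "G5": 0.45, "G6": 0.05}
known_disease_genes = {"G1", "G2", "G3"}
sens, spec = sensitivity_specificity(toy_ratios, known_disease_genes, threshold=0.4)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Raising the cutoff generally trades sensitivity for specificity, which is one reason the two figures are reported together.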
Then the team went a step further, attempting to predict disease-related genes based on differential expression ratios using the fitSNPs tool, which lists human SNPs based on the differential expression ratios of their associated genes. By loading the information onto the University of California at Santa Cruz’s genome graph and visualizing data along the genome, the team reported that they were able to obtain a wealth of data about the genome that could aid GWASs.
“We called the tool ‘functionally interpolating SNPs’ because it not only infers the likelihood of disease association for all human SNPs but also suggests potential diseases to guide functional studies,” the authors explained.
For instance, when they used fitSNPs to try to predict type 1 diabetes-related loci, the researchers turned up all seven of the top loci from the Wellcome Trust Case Control Consortium as well as a new variant at the 4q27 locus in a predicted gene called KIAA1109.
Finally, the researchers used the fitSNPs approach to compile a list of 2,586 candidate genes that they believe could be worth investigating for potential roles in hundreds of diseases or syndromes for which no molecular basis has been discovered. And, the authors noted, that could complement GWASs and lower costs by allowing researchers to home in on genes that are more likely to be disease-related.