Genome-wide association studies narrow the search for disease genes to part of a chromosome, but more work is then needed to identify the causative gene. "When you look at how people find disease SNPs now, it's usually a pretty tedious process," says Stanford University's Atul Butte. By mashing together publicly available microarray data, Butte and his lab have found a way to speed up that search by prioritizing SNPs for further study.
When Butte and his colleague Rong Chen looked at microarray data for a disease from a repository, they saw that the same genes kept cropping up. During his doctoral work, Butte had noted that differentially expressed genes are positively associated with disease. In their current work, they exploited that connection to find disease genes. "Our suggestion now is to go after those genes that are repeatedly showing up in these microarray experiments and go after those first, instead of picking something randomly," Butte says.
To identify these functionally interpolating SNPs, which they dubbed 'fitSNPs,' Chen devised a differential expression ratio. Simply put, it is the proportion of the microarray experiments they looked at in which the gene associated with a SNP is differentially expressed. "This differential expression ratio is positively associated with the likelihood of having a mutation with a significant association with disease," Chen says.
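As a rough illustration of the ratio described above, the sketch below computes, for each gene, the fraction of examined experiments in which it was differentially expressed, then ranks genes by that fraction. The function names, gene labels, and counts are hypothetical and chosen only for illustration; how differential expression is called in each experiment is not specified here and is left out of the sketch.

```python
def differential_expression_ratio(n_differential: int, n_measured: int) -> float:
    """Fraction of examined experiments in which a gene was differentially expressed."""
    if n_measured <= 0:
        raise ValueError("gene must be measured in at least one experiment")
    if not 0 <= n_differential <= n_measured:
        raise ValueError("n_differential must lie between 0 and n_measured")
    return n_differential / n_measured


def rank_genes(counts: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Sort genes from highest to lowest differential expression ratio.

    counts maps a gene name to (experiments with differential expression,
    experiments in which the gene was measured).
    """
    scored = {
        gene: differential_expression_ratio(n_diff, n_meas)
        for gene, (n_diff, n_meas) in counts.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts, not taken from any real repository.
toy_counts = {
    "GENE_A": (30, 100),
    "GENE_B": (5, 100),
    "GENE_C": (12, 60),
}

ranking = rank_genes(toy_counts)
for gene, ratio in ranking:
    print(f"{gene}: {ratio:.2f}")
```

Under this reading, genes at the top of the list would be the first candidates to follow up within an associated region.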
They then put their fitSNP method to the test using data from the Wellcome Trust Case Control Consortium. "What we could do then is go back and say, 'Well, if you had fitSNPs before you had actually went to sequence those loci, how successful would you be?'" Butte says. As they report in Genome Biology, fitSNPs could distinguish type 1 diabetes genes with 89 percent specificity and 75 percent sensitivity. The method also suggested a putative diabetes gene, KIAA1109, at a locus that the Wellcome Trust paper in Nature had not linked to a gene.
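For readers less familiar with the two figures quoted above, the following sketch shows how sensitivity and specificity are conventionally computed when a score cutoff is used to label genes as candidates. It assumes a simple threshold on the ratio, which is a guess at the setup and not a description of the published analysis; the gene names, scores, and threshold are all hypothetical.

```python
def sensitivity_specificity(
    scores: dict[str, float],
    known_disease_genes: set[str],
    threshold: float,
) -> tuple[float, float]:
    """Return (sensitivity, specificity) for calling genes with score >= threshold."""
    tp = fn = tn = fp = 0
    for gene, score in scores.items():
        predicted = score >= threshold
        actual = gene in known_disease_genes
        if predicted and actual:
            tp += 1
        elif not predicted and actual:
            fn += 1
        elif predicted and not actual:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one disease gene and one non-disease gene")
    sensitivity = tp / (tp + fn)  # share of true disease genes that were flagged
    specificity = tn / (tn + fp)  # share of other genes correctly left unflagged
    return sensitivity, specificity


# Hypothetical scores and labels for illustration only.
toy_scores = {"G1": 0.6, "G2": 0.5, "G3": 0.1, "G4": 0.4, "G5": 0.05, "G6": 0.2}
toy_known = {"G1", "G2", "G3"}

sens, spec = sensitivity_specificity(toy_scores, toy_known, threshold=0.3)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the threshold generally trades sensitivity for specificity, so a reported pair of values corresponds to one particular cutoff.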
In addition to a webpage of fitSNPs, Butte and Chen created a tool called GeneChaser, which they report on in BMC Bioinformatics. With it, researchers who are not steeped in bioinformatics can browse through gene expression data from public repositories and see which microarray experiments found a particular gene to be differentially expressed. "We see a lot of magic in putting a lot of microarray data hits together," Butte says.