CHICAGO (GenomeWeb) – A new computational method for predicting phenotypes from remotely located single-nucleotide polymorphisms has been shown to be more accurate than prediction based on connected SNPs on SNP-SNP networks, according to research presented here last week.
In an experiment with Arabidopsis thaliana samples, SPADIS, which stands for Selecting Predictive and Diverse SNPs in genome-wide association studies, outperformed a previous method of prediction in 15 of 17 phenotypes examined. It also ran faster and identified more candidate genes than the earlier algorithm, called Selecting CONnected Explanatory SNPs, or SConES.
Serhan Yilmaz, a PhD student at Case Western Reserve University, developed SPADIS while studying for a master's degree in computer engineering at Bilkent University in Ankara, Turkey. He presented his findings at the 2018 Intelligent Systems for Molecular Biology (ISMB) conference of the International Society for Computational Biology a week ago.
Yilmaz also shared his work in a preprint article on bioRxiv and in a poster at ISMB. The code for SPADIS is available on GitHub.
"SConES is not generally optimized for phenotype predictions, so we wanted to make a method that is optimized for phenotype predictions while using an approach such as this," namely GWAS, Yilmaz told GenomeWeb. "SPADIS improves on phenotype prediction over SConES," which was described in a 2013 paper in Bioinformatics and looks for connected genetic loci that can be associated with phenotypes. Yilmaz called this method "state of the art," but noted that it has shortcomings.
"We argue that enforcing the selected features to be in close proximity encourages the algorithm to pick features that are in linkage disequilibrium or that have similar functional consequences," Yilmaz, his former academic advisor at Bilkent University, and a collaborator at Sabanci University in Istanbul wrote in the poster and the preprint article.
"One extreme choice of this approach would be to choose all SNPs that fall into the same gene if they are individually found to be significantly associated with the phenotype. When there is an upper limit on the number of SNPs to be selected, this leads to selecting functionally redundant SNPs and misses variants that cover different processes," they continued.
In contrast, SPADIS looks for phenotype-associated SNPs that are located far apart from one another on SNP-SNP networks.
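To make the contrast concrete, the idea of choosing SNPs that are both predictive and spread out on a network can be sketched in a few lines of Python. This is only an illustrative greedy heuristic under stated assumptions, not the SPADIS objective or the authors' code: the function names, the hop-distance cutoff `max_hops`, the `penalty` weight, and the toy scores are all hypothetical.

```python
from collections import deque


def hop_distances(graph, source, max_hops):
    """Breadth-first search: hop distance from source to nodes within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the cutoff
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def select_diverse_snps(graph, scores, k, max_hops=2, penalty=1.0):
    """Greedily pick up to k SNPs, discounting SNPs close to earlier picks.

    Assumption: `scores` holds a per-SNP association score and `graph` is an
    undirected adjacency mapping. Both are hypothetical inputs.
    """
    selected = []
    crowding = {snp: 0.0 for snp in scores}  # discount accumulated from nearby picks
    while len(selected) < min(k, len(scores)):
        best = max(
            (snp for snp in scores if snp not in selected),
            key=lambda snp: scores[snp] - crowding[snp],
        )
        selected.append(best)
        # SNPs within max_hops of the new pick are discounted, closer ones more.
        for snp, d in hop_distances(graph, best, max_hops).items():
            if snp in crowding:
                crowding[snp] += penalty * (1 - d / (max_hops + 1))
    return selected


# Toy chain network a-b-c-d-e-f with made-up association scores.
graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}
scores = {"a": 1.0, "b": 0.9, "c": 0.8, "d": 0.1, "e": 0.1, "f": 0.7}

diverse = select_diverse_snps(graph, scores, k=2)
top_scoring = select_diverse_snps(graph, scores, k=2, penalty=0.0)
```

With the discount switched off, the two picks are the adjacent SNPs `a` and `b`, which is the kind of redundancy the authors warn about when the number of selected SNPs is capped. With the discount on, the second pick moves to the distant SNP `f`.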
For the Arabidopsis thaliana trial, Yilmaz compared the performance of SPADIS and SConES on gene sequence, gene membership, and gene interaction networks. SPADIS outperformed SConES more often than the reverse.
"We empirically show that SPADIS can recover SNPs known to be associated with the phenotype and the optimization is efficient," according to the presentation.
The Turkish group also went beyond the German team that developed SConES by looking at whether Hi-C data could be useful in selecting SNP sets. They built a SNP-SNP network based on 3D genomic contacts captured by the Hi-C method.
"Our results show that Hi-C data consistently provides slight improvements in regression performance. We think it is a promising source of information for SNP association," they wrote.
While the initial experiment involved only one plant species, Yilmaz believes that the SPADIS method holds promise for discovering phenotypic associations in complex human diseases, including autism spectrum disorder.
The preprint article said that the shortcomings in SConES "would be even more problematic since multiple functionalities (thus gene modules in the network) are required to be disrupted for an ASD diagnosis, whereas damage in only one leads to a more restricted phenotype."
Yilmaz confirmed that future research will indeed involve genomes of humans on the autism spectrum.