A team of investigators has demonstrated that combining several graphical techniques to represent genomic data can help biologists and clinicians understand biomarkers better than current methods such as principal components analysis or k-means clustering can.
According to the team, the study, which is described in an article in press with the Journal of the American Medical Informatics Association, is a proof of concept for the "application of multiple visual analytic representations to comprehend the relationship between subjects and SNPs."
The method uses three different bipartite visual representations — bipartite network, heat map with dendrograms, and Circos ideogram — to provide three different but complementary views of SNP data.
The approach overcomes limitations associated with current analysis methods like PCA or clustering, which reduce the dimensionality of the data to generate "unipartite" representations — single data views of either individuals or SNPs — that may conceal complex but important information such as "how subject and SNP clusters relate to each other, and the genotypes that determine their cluster memberships," the paper states.
The bipartite approach, the authors explain, makes it possible to present two different classes of data at the same time. They determined, however, that using only bipartite networks would not be "adequate" for their analysis, and therefore added two other bipartite visualization methods, heat maps and Circos plots, which are "well known in the bioinformatics community, but not often used in combination."
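As a rough illustration of the distinction between bipartite and unipartite views (a minimal sketch, not the authors' code), the Python below builds subject-to-SNP edges from a made-up genotype table and then collapses them into a subject-only projection. The table, the 0/1/2 minor-allele encoding, and the function names are assumptions introduced for this example.

```python
from collections import defaultdict

# Toy genotype table (made-up values, not HapMap data).
# Assumed encoding: number of minor-allele copies (0, 1, or 2) per subject and SNP.
genotypes = {
    "subject_A": {"snp_1": 2, "snp_2": 1, "snp_3": 0},
    "subject_B": {"snp_1": 1, "snp_2": 2, "snp_3": 0},
    "subject_C": {"snp_1": 0, "snp_2": 0, "snp_3": 2},
}


def bipartite_edges(table):
    """Return one (subject, snp, weight) edge for every nonzero genotype."""
    return [
        (subject, snp, count)
        for subject, row in table.items()
        for snp, count in row.items()
        if count > 0
    ]


def unipartite_projection(edges):
    """Collapse the bipartite graph to subject-subject links.

    Each link is weighted by the number of SNPs the two subjects share;
    the identity of those SNPs is lost in the process.
    """
    subjects_by_snp = defaultdict(list)
    for subject, snp, _ in edges:
        subjects_by_snp[snp].append(subject)

    shared = defaultdict(int)
    for subjects in subjects_by_snp.values():
        for i, first in enumerate(subjects):
            for second in subjects[i + 1:]:
                shared[tuple(sorted((first, second)))] += 1
    return dict(shared)


edges = bipartite_edges(genotypes)
projection = unipartite_projection(edges)
```

In this toy case the projection records only that subject_A and subject_B share two SNPs, while the bipartite edge list also keeps which SNPs those are and with what genotype.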
The combined approach provides researchers with tools for "simultaneous visualization and analysis of subjects, SNPs, and subject attributes" as well as the ability to see "the type and frequency of genotype associations between subjects and SNPs," the authors wrote.
Using three different bipartite graphs to represent the same information ensures that researchers get a complete picture of the relationships within the data, Suresh Bhavnani, an associate professor of biomedical informatics at the University of Texas Medical Branch (UTMB) and lead author on the paper, told BioInform.
"You not only see the SNPs, but you also see the individuals, and ... the big picture of how all of them come together," he said. "In addition to that ... when you build these visual models of your data, you [can] interact with them and conduct quantitative analysis, which often helps to reveal hidden patterns in the data."
Bhavnani, who heads UTMB's Discovery and Innovation through Visual Analytics, or DIVA, laboratory, added that the paper is one of the first applications of bipartite representations to SNP data.
The methods described in the paper are publicly available; however, "we have shown why and how to use them in combination to analyze the relationship between SNPs and individuals," he said.
The paper won a distinguished paper award when it was presented at the Summit on Translational Bioinformatics held in San Francisco in March.
In the JAMIA study, the researchers discuss how the technique was used to analyze 78 SNPs from 120 individuals — 60 from Nigeria and 60 from Utah — who participated in the HapMap project.
"We selected SNPs that we already knew differentiated between the two groups, and then showed that our method can reveal more about the data than traditional methods," Bhavnani explained in a statement.
Using the data, Bhavnani et al. created a bipartite network visualization that showed the individuals in the study and their genetic profiles simultaneously.
This particular visualization showed distinct clusters that correspond to the Utah and Nigerian subjects and SNPs.
This allowed the team to "look at the individuals and know immediately which SNPs make them different from others," Bhavnani said.
The graph also reveals SNPs that show up in both populations, in admixed individuals, and lets researchers observe how these variations are "co-occurring, and with which individuals they are co-occurring," he said.
The SNPs identified in these admixed individuals could help in the design of case-control studies where the selection of homogeneous sets of individuals from different ancestral origins is needed, the researchers explained.
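The co-occurrence reading Bhavnani describes can be approximated in a few lines. The sketch below is illustrative only: the carrier sets are invented, the `co_occurrences` helper is a hypothetical name, and nothing here is taken from the paper's implementation. It lists, for each pair of SNPs, the subjects who carry both.

```python
from itertools import combinations

# Hypothetical mapping from each SNP to the subjects carrying the variant.
carriers = {
    "snp_1": {"subject_A", "subject_B"},
    "snp_2": {"subject_B", "subject_C"},
    "snp_3": {"subject_C"},
}


def co_occurrences(carriers_by_snp):
    """Map each SNP pair to the sorted list of subjects carrying both."""
    result = {}
    for first, second in combinations(sorted(carriers_by_snp), 2):
        shared = carriers_by_snp[first] & carriers_by_snp[second]
        if shared:  # keep only pairs that co-occur in at least one subject
            result[(first, second)] = sorted(shared)
    return result


pairs = co_occurrences(carriers)
```

A network layout shows the same information visually: two SNPs co-occur wherever they are both linked to the same subject node.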
"The network representation is very powerful because it gives you the overall structure of the data," Bhavnani said. "But to really understand the complex relationships, you need these additional bipartite representations."
In order to get these more detailed views of the data, the researchers applied two other visualization techniques: the bipartite heat map and the bipartite Circos ideogram.
The heat map helps define the boundaries of the clusters generated in the first step. According to the paper, while heat maps enable the "inspection of subjects and their relationship to each SNP," they cannot "simultaneously represent attributes of the [individuals, for example] sex." Nor do they allow "interactive exploration of the relationship between subsets of the data," such as in the case of admixed populations.
That is the role of the Circos ideogram, which allows a user to examine admixed individuals more closely, for instance, and also to include attributes like sex and other demographics.
According to Bhavnani, this three-pronged approach could be used in disease research efforts or in studies that explore ancestral origins, as the JAMIA study does.
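To make the heat map step more concrete, the sketch below shows how hierarchical clustering can reorder the rows (subjects) and columns (SNPs) of a genotype matrix so that cluster boundaries become visible as blocks. It is a minimal sketch under stated assumptions: the matrix is invented, and single linkage with Manhattan distance is chosen here for brevity, since the article does not say which linkage or distance the authors used. Only the leaf ordering is computed; drawing the dendrograms themselves is left to a plotting library.

```python
def manhattan(u, v):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


def leaf_order(vectors):
    """Single-linkage agglomerative clustering; returns the final leaf order."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Single linkage: distance between the closest members of two clusters.
        return min(manhattan(vectors[i], vectors[j]) for i in a for j in b)

    while len(clusters) > 1:
        candidates = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(candidates, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Made-up genotype matrix: rows are subjects, columns are SNPs.
matrix = [
    [2, 0, 2, 0],
    [0, 2, 0, 2],
    [2, 0, 1, 0],
    [0, 1, 0, 2],
]

row_order = leaf_order(matrix)
columns = [list(col) for col in zip(*matrix)]
col_order = leaf_order(columns)

# Rearranged matrix, as it would be shaded in the heat map.
reordered = [[matrix[r][c] for c in col_order] for r in row_order]
```

After reordering, the nonzero genotypes in this toy matrix fall into two diagonal blocks, which is the kind of boundary a heat map with dendrograms makes easy to see.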
The authors have already begun to use the technique to analyze SNPs associated with Alzheimer's disease, he told BioInform.
The team is also analyzing datasets generated by other research groups. Specifically, they are looking for SNPs that might be associated with pediatric otitis — ear infections — and preterm deliveries, he said.