A team of scientists from the Genome Institute of Singapore and the University of Maryland has devised a computational method to pinpoint genetic shifts in the composition of the influenza virus that could potentially give rise to more deadly strains of the bug.
The method, published in the December issue of Nucleic Acids Research, is the result of a three-year collaboration between Niranjan Nagarajan, a research scientist in the computational and mathematical biology arm of GIS, and Carl Kingsford, an assistant professor of computer science at UMD.
Graph-incompatibility-based Reassortment Finder, or GiRaF, identifies reassortments, which occur when genetic segments from two influenza strains merge to form a hybrid.
In the paper, the researchers wrote that these events "can quickly create a strain to which there is little or no immunity in the human population." In fact, they note that these hybrids have been linked to at least three major flu pandemics, including the 2009 H1N1 outbreak.
GiRaF "compares distributions of trees by constructing an 'incompatibility graph' and mining it for phylogenetic discordances using a search algorithm," Nagarajan and Kingsford wrote. The method also "employs a phylogenetic distance test to improve on the false positive rate and combines answers from all segments of the genome to produce a comprehensive catalog of reassortments."
They note that current approaches for identifying reassortments involve reconstructing phylogenetic trees for the eight segments that make up the influenza genome, and then comparing them manually, a time-consuming process.
"The basic idea is that if you have one strain that’s reassorted and [includes] two segments A and B that come from [strains of] different lineages ... you [can] construct phylogenetic trees [of the segments] and check the evolutionary histories," Nagarajan explained to BioInform. Differences in the location of the viral sequence in each segment's tree indicate that it is likely a hybrid of the two strains, he said.
To further complicate matters, viral sequences have "high mutation rates and tangled evolutionary histories, making the task of phylogenetic reconstruction particularly hard," the researchers wrote. As a result, hybrids are usually identified by focusing on "high confidence branches," Nagarajan said, thus losing large quantities of genetic information.
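As a toy illustration of the tree comparison Nagarajan describes, the Python sketch below checks where one strain sits in the trees built for two different segments. It is not GiRaF's code: the nested-tuple tree encoding, the strain labels, and the helper names `leaves` and `sister_group` are all hypothetical, and it compares two single rooted trees rather than the distributions of trees the method works with.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree (hypothetical encoding)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def sister_group(tree, taxon):
    """Return the leaves of the subtree(s) sharing a parent with `taxon`, or None if absent."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == taxon:
            siblings = [c for j, c in enumerate(tree) if j != i]
            return frozenset().union(*(leaves(s) for s in siblings))
    for child in tree:
        found = sister_group(child, taxon)
        if found is not None:
            return found
    return None


# Made-up trees for two genome segments; "x" is the strain under suspicion.
segment_a_tree = ((("x", "s1"), "s2"), ("s3", "s4"))
segment_b_tree = (("s1", "s2"), (("x", "s3"), "s4"))

sister_a = sister_group(segment_a_tree, "x")
sister_b = sister_group(segment_b_tree, "x")

# Different closest relatives in the two segment trees hint at a possible hybrid.
is_candidate = sister_a != sister_b
print(sorted(sister_a), sorted(sister_b), is_candidate)
```

In this made-up case the strain groups with s1 in one segment tree and with s3 in the other, which is the kind of discordance that would flag it for closer inspection. A single pair of trees says nothing about confidence, which is why the real analysis has to contend with noisy reconstructions.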
While other recombination-detection methods, such as Recombination Detection Program, or RDP, and SplitsTree4, have attempted to address these challenges, the authors note in the paper that these methods are plagued by high false-positive rates.
GiRaF solves these problems by scouring the constructed phylogenetic trees for "groups of incompatible splits" using a biclique enumeration algorithm and statistical tests to "identify sets of taxa with differential phylogenetic placement," the paper states.
"There are no approximations or heuristics involved here," Nagarajan said. "[The] algorithm [is] guaranteed to find all high-confidence disagreements between two distributions of trees."
To prove their point, the researchers tested the model on several sets of human and avian influenza genomic sequences gathered from the National Center for Biotechnology Information's Influenza Virus Sequence database and synthetic sequences generated using a program called Seq-Gen.
After generating a set of consensus trees from the sequence data using MrBayes, the team constructed an incompatibility graph, which contained nodes representing the splits observed in any sampled tree and edges linking the splits contained in separate trees.
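To make the notion of an incompatibility graph more concrete, here is a minimal Python sketch under stated assumptions. It uses the standard textbook definition of split compatibility (two splits are compatible when at least one of the four pairwise intersections of their sides is empty) and assumes that an edge joins a split from one segment's tree to an incompatible split from the other's. The taxa, the splits, and the function names are hypothetical, and the confidence values attached to sampled splits are left out.

```python
from itertools import product

# Hypothetical taxa; each split is stored as one of its two sides.
taxa = frozenset({"x", "s1", "s2", "s3", "s4"})
splits_a = [frozenset({"x", "s1"}), frozenset({"s3", "s4"})]  # from a segment A tree
splits_b = [frozenset({"s1", "s2"}), frozenset({"x", "s3"})]  # from a segment B tree


def compatible(split_1, split_2, all_taxa):
    """Two splits can share a tree if some pair of their sides does not overlap."""
    sides_1 = (split_1, all_taxa - split_1)
    sides_2 = (split_2, all_taxa - split_2)
    return any(not (p & q) for p, q in product(sides_1, sides_2))


def incompatibility_edges(left_splits, right_splits, all_taxa):
    """List every (left, right) pair of splits that cannot coexist in one tree."""
    return [
        (a, b)
        for a, b in product(left_splits, right_splits)
        if not compatible(a, b, all_taxa)
    ]


edges = incompatibility_edges(splits_a, splits_b, taxa)
for a, b in edges:
    print(sorted(a), "conflicts with", sorted(b))
```

Here only the pair {s3, s4} and {s1, s2} is compatible, so the toy graph has three edges.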
Next, GiRaF identified subsets of the nodes, or "high-confidence bicliques," that were likely reassortment candidates and then used a phylogenetic distance test to further narrow the list by eliminating false positives before the combination step.
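For readers unfamiliar with bicliques, the sketch below enumerates the maximal bicliques of a tiny bipartite graph by brute force. This is only an illustration of the concept, not the enumeration algorithm from the paper: it is exponential in the number of left-hand nodes, ignores confidence scores and the distance test, and uses hypothetical node labels and a hypothetical `maximal_bicliques` helper.

```python
from itertools import combinations


def maximal_bicliques(edge_list):
    """Brute-force all maximal bicliques (left set, right set) of a small bipartite graph.

    `edge_list` holds (left, right) pairs; left labels must be sortable.
    """
    left = sorted({l for l, _ in edge_list})
    nbrs = {l: {r for l2, r in edge_list if l2 == l} for l in left}
    found = set()
    for size in range(1, len(left) + 1):
        for subset in combinations(left, size):
            # Right-hand nodes adjacent to every left node in the subset.
            common = set.intersection(*(nbrs[l] for l in subset))
            if not common:
                continue
            # Add every left node that is also adjacent to all of `common`.
            closed = frozenset(l for l in left if common <= nbrs[l])
            found.add((closed, frozenset(common)))
    return found


# Hypothetical graph: a1 and a2 are splits from one segment, b1 and b2 from another.
toy_edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b2")]
bicliques = maximal_bicliques(toy_edges)
for left_side, right_side in sorted(bicliques, key=lambda b: sorted(b[0])):
    print(sorted(left_side), sorted(right_side))
```

Each biclique groups splits on one side that all conflict with every split on the other, which is the sort of structure that would then be screened as a reassortment candidate.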
In one test involving the human flu virus, GiRaF predicted four reassortments with high confidence among 181 viral genomes. Using the synthetic datasets, Nagarajan and Kingsford report that GiRaF identified eight out of 10 reassortments.
The researchers reported that GiRaF was able to process viral datasets in a matter of minutes on a single processor, although they note that the tree-construction process using MrBayes required several hours.
Although the tool can't predict what happens when two viruses form a hybrid strain, GiRaF could be a useful surveillance tool that can identify reassortments in new viral sequences, Nagarajan said.
He added that the team plans to use the software to examine whether there are any biases involved in the reassortment process or if it is simply random, as well as for exploring horizontal transfer events in bacteria.