To advance the design of vaccines and molecular detection systems, researchers need techniques for determining whether a given HIV-1 sequence stems from an unknown HIV-1 subtype. Surprisingly, only one algorithm has been developed for this purpose: the Branching Index (BI). Although the BI algorithm can tell researchers how closely a given query sequence clusters with a subtype, it provides no information about where unknown fragments begin and end. To address this limitation, a group of German and French researchers developed a new algorithm called the Unknown Subtype Finder, or USF, that automatically determines which segments of an input sequence originate from an unknown subtype.
According to Ingo Bulla, a researcher at the Institute of Microbiology and Genetics at the University of Göttingen in Germany, the most challenging part of developing USF was the design of its underlying probabilistic model. "For USF, we developed a probabilistic model of the position-wise differences between subtypes and deduced an algorithm for the detection of fragments of unknown subtypes from this model," Bulla says. "In contrast, the BI is a heuristic ad hoc approach, probably inspired by bootscanning."
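The article does not describe USF's model in detail, so the following is only a toy illustration of the general idea, not USF itself: it shows how a probabilistic model of position-wise differences can yield fragment boundaries. It assumes the query has already been aligned to its closest known-subtype reference and reduced to a list of 0/1 flags (1 where the two differ). A two-state hidden Markov model ("known" vs. "unknown" subtype) is then decoded with the Viterbi algorithm. The function name `viterbi_unknown_segments`, the binary encoding, and the mismatch and switching probabilities are all hypothetical assumptions chosen for illustration.

```python
import math


def viterbi_unknown_segments(diffs, p_known=0.05, p_unknown=0.25, p_switch=1e-3):
    """Toy two-state HMM (hypothetical, not USF's actual model).

    diffs: list of 0/1 flags, 1 where the query differs from the closest
    known-subtype reference at that alignment column.
    Returns half-open (start, end) intervals decoded as 'unknown'.
    """
    if not diffs:
        return []
    states = ("K", "U")  # K = known subtype, U = unknown subtype
    mismatch_prob = {"K": p_known, "U": p_unknown}

    def log_emit(state, d):
        p = mismatch_prob[state]
        return math.log(p if d else 1.0 - p)

    stay, switch = math.log(1.0 - p_switch), math.log(p_switch)

    def trans(prev, cur):
        return stay if prev == cur else switch

    # Equal prior probability of starting in either state.
    score = {s: math.log(0.5) + log_emit(s, diffs[0]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for d in diffs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + trans(r, s))
            new_score[s] = score[prev] + trans(prev, s) + log_emit(s, d)
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back the most probable state path.
    state = max(states, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()

    # Collapse runs of 'U' into (start, end) intervals.
    segments, start = [], None
    for i, s in enumerate(path):
        if s == "U" and start is None:
            start = i
        elif s != "U" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments


# Synthetic example: a divergent stretch at columns 100-199.
diffs = [0] * 300
for i in range(100, 200, 3):
    diffs[i] = 1
print(viterbi_unknown_segments(diffs))
```

The small switching probability penalizes frequent state changes, so the decoder reports contiguous fragments with explicit boundaries rather than isolated divergent columns. Boundary information of this kind is exactly what the article says the BI does not provide.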
In a paper published in BMC Bioinformatics this April, the researchers describe how they applied USF to SIV and HIV-1 sequences that had previously been classified as deriving from an unknown subtype. They also evaluated USF's performance on artificial HIV-1 recombinants and on non-recombinant HIV-1 sequences. The results showed that USF is effective at identifying segments of HIV-1 sequences that stem from as-yet-unknown subtypes, and that it performs as well as or better than the BI algorithm.
Bulla and his collaborators plan to develop USF further so that a single tool can both assign known HIV-1 subtypes (or subfamilies of other viruses or species) to a query sequence and detect genome segments stemming from an unknown subtype. "The source code of USF is available to the public and we plan to incorporate USF into the jumping profile Hidden Markov Model [jpHMM], a tool for subtype classification and breakpoint detection in HIV-1 and HCV [the hepatitis C virus]," Bulla says. "Since jpHMM is already an established tool, we would make USF available to the public in a form usable for virologists as an extension of jpHMM through the current Web interface of jpHMM."