The US Army's Biotechnology High Performance Computing Software Applications Institute has released a free computational pipeline for designing PCR-based pathogen-detection assays that it claims offers a number of advantages over currently available tools.
The Tool for PCR Signature Identification, or TOPSI, uses pairwise alignment to identify sequences that are common to multiple pathogenic target genomes. It then compares them to a database of non-target genomes in order to identify unique segments that can serve as PCR signatures.
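The two-stage logic described above (find what the targets share, then discard anything also seen in non-targets) can be illustrated with a minimal sketch. This is not TOPSI's implementation: TOPSI uses pairwise alignment, whereas the sketch below substitutes exact k-mer matching as a simpler stand-in, and the function names, toy sequences, and the value of `k` are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_signatures(targets, non_targets, k):
    """Exact-match stand-in for the common-then-unique filtering idea."""
    if not targets:
        return []
    # Stage 1: keep only k-mers present in every target genome.
    common = set.intersection(*(kmers(t, k) for t in targets))
    # Stage 2: drop k-mers that also occur in any non-target genome.
    background = set().union(*(kmers(n, k) for n in non_targets))
    return sorted(common - background)


# Toy sequences, invented for illustration only.
targets = ["ACGTACGGA", "TTACGGACC"]
non_targets = ["GGACGGTT"]
print(candidate_signatures(targets, non_targets, 4))  # ['CGGA', 'TACG']
```

Here `ACGG` is shared by both toy targets but is dropped because it also appears in the toy non-target sequence. A real pipeline would additionally have to handle reverse complements, near matches, and primer/probe design constraints, none of which this sketch attempts.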
According to a paper describing the method that was published last week in BMC Bioinformatics, TOPSI is the only available software system for pathogen signature design that is "freely available, high-throughput, and fully integrated."
Jaques Reifman, director of the BHSAI and a co-author on the paper, told PCR Insider that there are currently two main software packages available for PCR signature design: KPATH, developed by researchers at Lawrence Livermore National Laboratory, and Insignia, developed by the Center for Bioinformatics and Computational Biology at the University of Maryland.
However, both of these have drawbacks. Reifman noted that KPATH is not available to researchers outside LLNL, though the lab will run the software on sequences provided by external groups.
In addition, KPATH uses whole-genome multiple alignment to find regions common to the target sequences, an approach that is extremely computationally intensive, especially for more than 20 or so sequences.
Insignia, meanwhile, takes the same sort of pairwise alignment approach as TOPSI to identify unique genomic regions that could serve as signatures, but doesn't perform downstream steps such as probe/primer design or specificity analysis of the signatures. As a result, there is "a lot of manual manipulation" required to design signatures with the Insignia server, Reifman said.
In addition, Insignia relies on its own database of precomputed matches between sequences, which enables the software to run very quickly, but limits its flexibility because users can't design signatures for pathogens that are not in the database.
In response, Reifman and colleagues developed the TOPSI pipeline with the goal of enabling a "push-button" tool for PCR signature design. "The user provides the sequence or sequences that he or she wants to find signatures for and the output is the fingerprints," he said.
In the BMC Bioinformatics paper, Reifman and co-authors evaluated TOPSI against signatures obtained from KPATH and Insignia, and also against experimentally validated signatures.
For the comparison against KPATH, the LLNL group provided 1,236 signatures for 18 Staphylococcus aureus genomes, while TOPSI generated 2,430 signatures, which "indicates that the number of signatures reported by TOPSI is comparable to that obtained by KPATH."
While only 830 TOPSI signatures overlapped with KPATH signatures, the authors note that direct mapping "is not possible because very few signatures will be exactly the same in the two software systems" due to differences in primer/probe design criteria and in input processing.
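Because exact matches between tools are rare, one plausible way to compare two signature sets is by genomic position rather than by sequence. The sketch below counts signatures from one tool whose coordinates overlap any signature from the other. The overlap criterion, the half-open coordinate convention, and the toy coordinates are assumptions made for illustration; the article does not say how the paper defined overlap.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(ours, theirs):
    """Count intervals in `ours` that overlap at least one interval in `theirs`."""
    return sum(any(overlaps(a, b) for b in theirs) for a in ours)


# Hypothetical signature coordinates on a shared reference genome.
tool_a = [(100, 250), (400, 520), (900, 1000)]
tool_b = [(240, 300), (600, 700), (950, 1100)]
print(count_overlapping(tool_a, tool_b))  # 2
```

The nested loop is quadratic in the number of signatures, which is acceptable for a few thousand intervals; sorting both lists and sweeping through them would scale better for larger sets.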
In a comparison of TOPSI and KPATH signatures across the S. aureus Mu50 genome, "in the regions where TOPSI does not report any signature, KPATH also does not report any signature" — evidence that these regions are not suitable for designing unique PCR signatures, according to the paper.
Insignia, meanwhile, generated nearly 70,000 candidate regions for S. aureus signatures, which the BHSAI team estimated would take more than 10 days to analyze for specificity, based on a test run of 400 candidate regions. By comparison, the specificity analysis step with TOPSI took approximately 12 hours on a 98-core Linux cluster.
"These results suggest that although Insignia might be extremely useful and convenient for designing a few signatures from selected regions of the target genome, unlike TOPSI, it is not ideal for high-throughput, whole-genome signature design on bacterial genomes that might result in thousands of signature candidates," the authors wrote.
In a comparison against five experimentally verified signatures for Burkholderia mallei, three of four TOPSI signatures were within 300 base pairs of one of the experimentally verified signatures, which demonstrates that TOPSI was "successful in identifying the experimentally verified unique regions of B. mallei," according to the paper.
Reifman noted that it's important to get tools like TOPSI into the hands of researchers due to the problem of "signature erosion" — a result of the ever-increasing size of the non-target databases used to determine the specificity of a given signature.
This challenge is only expected to grow with the advent of next-generation sequencing technologies. As more and more genomes are sequenced and deposited in databases, the likelihood increases that a PCR signature that appeared unique to a given pathogen at one point will no longer be unique when compared to new genomes.
"Let's say for Burkholderia mallei, which is pathogenic, if the fingerprint that you have is also the fingerprint for a non-pathogenic E. coli that you have in your system, it will lead to false positives, because someone could have just sequenced this non-pathogenic E. coli six months ago and put it in the database," Reifman said.
The only way to guard against signature erosion "is to continually run your fingerprints for a specificity check against the database to determine if your fingerprint is still a fingerprint," he said.
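The recurring specificity check Reifman describes can be sketched as a function that is rerun whenever the non-target database grows. The version below is a deliberately simplified assumption: it flags a signature only on an exact substring match in the forward strand, whereas a production check would use an alignment tool and tolerate mismatches. All names and sequences are hypothetical.

```python
def eroded_signatures(signatures, non_target_db):
    """Map each signature name to the non-target genomes that now contain it.

    Signatures with no hits are omitted, so an empty dict means every
    signature is still unique under this exact-match criterion.
    """
    eroded = {}
    for name, sig in signatures.items():
        hits = [genome for genome, seq in non_target_db.items() if sig in seq]
        if hits:
            eroded[name] = hits
    return eroded


# Toy data, invented for illustration only.
signatures = {"sig1": "ACGGATC", "sig2": "TTGACCA"}
db_old = {"genomeA": "GGGTTTAAACCC"}
db_new = dict(db_old, genomeB="CCTTGACCAGG")  # a newly deposited genome

print(eroded_signatures(signatures, db_old))  # {}
print(eroded_signatures(signatures, db_new))  # {'sig2': ['genomeB']}
```

In this toy run, `sig2` passes against the older database but is flagged once `genomeB` is added, which is the erosion scenario described above.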
One drawback of TOPSI is that it is not well suited to viral pathogens, Reifman said. Because viruses have small, highly variable genomes, it is difficult to identify conserved segments that would serve as useful signatures. In tests, TOPSI was able to design PCR signatures for Variola major genomes, but not for human adenovirus genomes, an indicator that the software might work on large DNA viruses, but not short RNA viruses.