NEW YORK (GenomeWeb) – A team led by researchers at the University of Glasgow has developed a machine-learning method for predicting natural host reservoirs or insect vectors for RNA viruses based on viral genome sequences.
"Being able to use those genomes to predict the natural ecology of viruses means we can rapidly narrow the search for their animal reservoirs and vectors, which ultimately means earlier interventions that might prevent viruses from emerging altogether or stop their early spread," said senior author Daniel Streicker, a virus researcher at the MRC-University of Glasgow, in a statement.
As they reported online today in Science, Streicker and his colleagues started with genome sequences and features for hundreds of RNA viruses falling into well-established animal reservoir or arthropod vector groups. In that set, their phylogenetic neighborhood-based approach successfully predicted animal reservoirs for the viruses roughly 58 percent of the time and could classify around 95 percent of the viruses into groups with or without insect vectors.
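To make the neighborhood idea concrete, here is a minimal sketch of how a virus might inherit a reservoir label from its closest labeled relatives. It is an illustration under stated assumptions, not the team's method: shared k-mer content (Jaccard similarity) stands in for phylogenetic relatedness, and the function names, the `n_neighbors` and `k` parameters, and the similarity-weighted vote are all hypothetical choices made for this example.

```python
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a nucleotide sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_reservoir(query, references, n_neighbors=3, k=3):
    """Predict a reservoir label from the most similar labeled sequences.

    Assumption: k-mer Jaccard similarity is only a crude stand-in for
    phylogenetic relatedness. `references` is a list of (sequence, label)
    pairs with known reservoirs.
    """
    if not references:
        raise ValueError("at least one labeled reference is required")
    query_kmers = kmers(query, k)
    # Score every reference, then keep the n_neighbors most similar ones.
    scored = sorted(
        ((jaccard(query_kmers, kmers(seq, k)), label) for seq, label in references),
        reverse=True,
    )
    # Each neighbor votes for its label, weighted by its similarity.
    votes = defaultdict(float)
    for similarity, label in scored[:n_neighbors]:
        votes[label] += similarity
    return max(votes, key=votes.get)
```

Called with a handful of labeled toy sequences, `predict_reservoir` returns the label carrying the most similarity-weighted support among the nearest neighbors. A real analysis would use full genomes and a proper measure of relatedness.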
From there, the team incorporated information on more than 4,200 additional traits, including potentially host-related viral codon pair and dinucleotide biases, that they gleaned from 536 viral genomes. Feeding these traits into an integrated supervised machine-learning model based on gradient-boosting machines brought animal reservoir prediction accuracy up to nearly 84 percent. Still other models and combined models brought in features that marked viruses with arthropod-based transmission.
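As an illustration of the kind of genomic trait mentioned above, the sketch below computes dinucleotide bias in one common way: the observed frequency of each dinucleotide divided by the frequency expected from its two bases occurring independently. This is an assumed, generic formulation rather than the exact definition used in the study, and the function name `dinucleotide_bias` is hypothetical. Values like these could serve as input features to a gradient-boosting classifier.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_bias(seq):
    """Return observed/expected ratios for all 16 dinucleotides.

    Assumption: bias(XY) = f(XY) / (f(X) * f(Y)), a common generic
    definition; the study may have used a different formulation.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA input as its DNA alphabet
    mono = Counter(b for b in seq if b in BASES)
    # Overlapping pairs, skipping any pair that contains an ambiguous base.
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    di = Counter(p for p in pairs if p[0] in BASES and p[1] in BASES)
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("sequence needs at least two adjacent unambiguous bases")
    bias = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono)
        observed = di[x + y] / n_di
        # Undefined when either base is absent from the sequence.
        bias[x + y] = observed / expected if expected > 0 else float("nan")
    return bias
```

A ratio below 1 means the dinucleotide is under-represented relative to its base composition, and a ratio above 1 means it is over-represented.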
After a series of training steps, the researchers went on to predict potential ecological niches for a collection of 69 so-called "orphan" viruses with murky or unexplored animal reservoirs or insect vectors, ranging from Zika virus and MERS coronavirus to Crimean-Congo hemorrhagic fever virus.
The algorithm implicated hoofed animals in a form of human enteric coronavirus that appears capable of passing from cows to humans, for example, and pointed to a primate reservoir for an O'nyong-nyong virus carried by humans. The researchers also identified two ebolaviruses with predicted primate reservoirs, along with ebolaviruses that had features associated with the better-known bat reservoirs for the virus.
"As viral genomes are now produced within hours of detection, algorithms that rapidly generate field-testable hypotheses from sequence data narrow the gap between virus discovery and actionable understanding of virus ecology," the authors wrote.
These and other results suggest that such machine-learning methods may ultimately complement more traditional strategies for homing in on the natural sources of viruses with significance to human health or economics.
"Current practice requires combining evidence from field surveillance, phylogenetics, laboratory experiments, and real-world interventions, but is time consuming and often inconclusive," the authors explained. "This creates prolonged periods of uncertainty that may amplify economic and health losses."