This article has been updated from a version posted Oct 22 to reflect an amendment to the title.
A UK-based non-profit has launched an experimental modeling system that accurately predicts how patients with HIV and AIDS will respond to antiretroviral drug combinations 80 percent of the time — a 20 percent improvement over the rules-based interpretation system commonly used by physicians, according to its developers.
The HIV Resistance Response Database Initiative, or RDI, launched the freely available HIV Treatment Response Prediction System earlier this month. HIV-TRePS, which is intended for research use only, bases its predictions on a range of more than 80 different variables including mutations in the viral genotype, drugs used to treat the patient in the past, CD4 cell counts, and viral load. The software uses four random forest models to predict how well patients will respond to various combinations of around 25 HIV drugs.
After inputting data into the system, a physician receives a report within 30 seconds that lists the drug combinations that the models predict are most likely to work based on the patient's medical history and the viral genotype.
In a statement, Brendan Larder, scientific chair of RDI, described the launch as "a milestone for us, our research partners around the world, and also for the use of bioinformatics in medicine."
The release of HIV-TRePS is the culmination of a project that began in 2002 when RDI was first formed. RDI, which is staffed by four full-time employees and an advisory group of 10 HIV clinicians, aims to be an “international repository” for HIV data with the goal of using the data to train predictive computer models, Andy Revell, executive director of RDI, told BioInform.
He explained that RDI developers needed an “enormous amount” of “heterogeneous” patient data to train their models and, to that end, have so far collected data for about 70,000 patients from clinics in multiple countries representing different treatment histories and different treatment protocols.
“The main challenge is collecting enough data on the newest drugs to keep the models up to date,” he said, adding that RDI continually collects new data as new treatments for HIV are developed.
For example, HIV-TRePS currently can't predict the efficacy of drugs like Pfizer's Maraviroc, which is still a relatively new antiretroviral, or Boehringer-Ingelheim's Tipranavir, which Revell said hasn’t been prescribed widely enough in clinical practice to yield useful data.
The Software Model that Could
RDI selected the random forest classifier for its models because it “seems to be giving us the best accuracy,” Revell said.
Previously, RDI researchers trained several types of models to determine the best approach, including artificial neural networks, support vector machines, and logistic regression models.
However, they ran into some issues with these approaches. For example, artificial neural network models had problems with overfitting, “which means that the model that you have trained is very well fitted for the training data … but it’s not as generalizable to new data,” he said.
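The overfitting problem Revell describes can be shown with a deliberately extreme toy case. The sketch below is not RDI's code: the `MemorizingModel` class, the single numeric feature, and the data are all invented for illustration. The model simply memorizes its training examples, so it scores perfectly on them and poorly on examples it has not seen.

```python
from collections import Counter


class MemorizingModel:
    """Hypothetical caricature of an overfitted model: it stores every
    training example verbatim and guesses the majority label otherwise."""

    def fit(self, rows, labels):
        self.table = dict(zip(rows, labels))
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, row):
        return self.table.get(row, self.default)


def accuracy(model, rows, labels):
    """Fraction of examples the model labels correctly."""
    hits = sum(model.predict(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


# Invented data following a simple rule: label is 1 when the value exceeds 5.
train_rows, train_labels = [(1,), (2,), (3,), (7,), (8,)], [0, 0, 0, 1, 1]
new_rows, new_labels = [(4,), (6,), (9,), (10,)], [0, 1, 1, 1]

model = MemorizingModel().fit(train_rows, train_labels)
train_accuracy = accuracy(model, train_rows, train_labels)
new_accuracy = accuracy(model, new_rows, new_labels)
print(train_accuracy, new_accuracy)
```

The gap between the two printed accuracies is the symptom to look for: a large drop on new data suggests the model has learned the training set rather than the underlying pattern.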
When the developers compared the performance of random forest-based models with neural network models, they found that the former were “slightly more accurate,” could accommodate new data, and were “quicker to train” than the neural network-based models.
Revell did note, however, that although HIV-TRePS uses RF models, the group continues to work with other models as well.
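For readers unfamiliar with the technique, the general idea behind a random forest can be sketched in a few lines. The example below is a simplified stand-in and not RDI's implementation: it uses one-split trees ("stumps") instead of full decision trees, and the function names, the two numeric features, and the data are all invented. Each tree is fitted to a random resample of the data using a randomly chosen feature, and the fraction of trees voting for a response is read as a probability.

```python
import random


def fit_stump(rows, labels, feature):
    """Fit a one-split tree: pick the threshold and direction with fewest errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for low_side_label in (0, 1):
            predictions = [
                low_side_label if row[feature] <= threshold else 1 - low_side_label
                for row in rows
            ]
            correct = sum(p == y for p, y in zip(predictions, labels))
            if best is None or correct > best[0]:
                best = (correct, threshold, low_side_label)
    _, threshold, low_side_label = best
    return lambda row: low_side_label if row[feature] <= threshold else 1 - low_side_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap resample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # draw with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(
            fit_stump([rows[i] for i in sample], [labels[i] for i in sample], feature)
        )
    return forest


def predict_probability(forest, row):
    """Fraction of trees that vote 1 (here meaning 'responded')."""
    votes = [tree(row) for tree in forest]
    return sum(votes) / len(votes)


# Invented toy data: two numeric features, label 1 means "responded".
rows = [(1, 10), (2, 20), (3, 30), (8, 80), (9, 90), (10, 100)]
labels = [1, 1, 1, 0, 0, 0]
forest = fit_forest(rows, labels)
print(predict_probability(forest, (1, 10)))
print(predict_probability(forest, (10, 100)))
```

Because every tree sees a slightly different sample, individual trees can be wrong without the ensemble being wrong, which is one reason the approach tends to resist overfitting.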
Training the models is "enormously computer intensive," Revell said, and typically requires two months for both training and testing on a compute farm. The number of computers needed varies depending on the type of model being trained and the amount of data, he said.
"Sometimes we train a single RF model, sometimes 10. Sometimes we have 5,000 treatment change episodes, sometimes 20,000 or more, to train the models," he said. "Also the number of variables can vary from 40 up to 90. All these have a bearing on the number of computers and the duration of training."
For the training process, the developers divided patient data into “treatment change episodes,” which Revell said contained details about viral loads, CD4 count, and virus genotype before the patients' antiretroviral treatment changed and then another set of similar data once a new treatment regimen began.
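One way to picture a treatment change episode is as a single record holding the "before" and "after" measurements. The sketch below is an assumption about how such a record could be laid out, not RDI's actual schema: the class name, the field names, the units, and the placeholder mutation and drug names are all hypothetical. The default response threshold of 50 copies is the cutoff the system reports against.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class TreatmentChangeEpisode:
    """Hypothetical record for one change of antiretroviral regimen."""

    baseline_viral_load: float      # before the switch (assumed copies/mL)
    baseline_cd4: int               # before the switch (assumed cells/mm^3)
    mutations: FrozenSet[str]       # mutations found in the viral genotype
    new_regimen: Tuple[str, ...]    # drugs started at the switch
    followup_viral_load: float      # measured after the new regimen began
    followup_cd4: Optional[int] = None

    def responded(self, threshold: float = 50.0) -> bool:
        """True when the follow-up viral load fell below the threshold."""
        return self.followup_viral_load < threshold


# Placeholder values only; none of these come from real patients.
episode = TreatmentChangeEpisode(
    baseline_viral_load=120_000.0,
    baseline_cd4=210,
    mutations=frozenset({"MUT_A", "MUT_B"}),
    new_regimen=("drug_x", "drug_y", "drug_z"),
    followup_viral_load=40.0,
)
print(episode.responded())
```

The `responded` label is what a model would be trained to predict from the baseline fields and the new regimen.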
The team also set aside a portion of its initial training data to cross-validate the models during the training process.
“We feed thousands of treatment change episodes into the modeling software and the models basically learn … the subtle relationships between all those variables and the virological response of the patient to the new drugs, from exposure to a variety of treatment change episodes,” he explained.
As a final step in the training process, Revell's team tested the final selections with independent datasets so that the models were passed through “two measures of accuracy.” They then selected the most accurate models for the HIV-TRePS system.
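The two-stage check described here, cross-validation during training followed by a test on independent data, can be sketched as follows. This is a generic illustration under stated assumptions, not RDI's procedure: the function names, the 20 percent holdout, and the five folds are invented, and the sketch carves the independent set out of one pool, whereas the team's independent datasets may have come from separate sources.

```python
import random


def split_episodes(episodes, holdout_fraction=0.2, n_folds=5, seed=0):
    """Shuffle, reserve an independent holdout set, and cut the rest into folds."""
    shuffled = list(episodes)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    holdout, development = shuffled[:n_holdout], shuffled[n_holdout:]
    folds = [development[i::n_folds] for i in range(n_folds)]
    return folds, holdout


def cross_validation_rounds(folds):
    """Yield (training, validation) pairs, using each fold once for validation."""
    for i, validation in enumerate(folds):
        training = [ep for j, fold in enumerate(folds) if j != i for ep in fold]
        yield training, validation


# Integers stand in for treatment change episodes.
folds, holdout = split_episodes(range(100))
for training, validation in cross_validation_rounds(folds):
    print(len(training), len(validation))
print(len(holdout))
```

The holdout set is never touched while models are being tuned, so accuracy measured on it is a fairer estimate of performance on new patients.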
To use the free system, physicians register for an account online and enter their patient's HIV genotype, treatment history, viral load, and CD4 count data. Once the data is entered and the parameters are set, the system crunches the data and spits out the results, which include predictions of the probability of specific drug regimens reducing the viral load to below 50 copies per milliliter.
Users can opt to receive predictions for up to three different drug regimens or ask the system to model alternatives using the most commonly used regimens stored in the RDI database.
A third option would be to get predictions for both types of scenarios and then compare the results. The system also lets users rule out drugs based on a patient’s tolerance and access to antiretroviral drugs as well as rank results in the order of the “probability of the regimen working” to reduce the viral load.
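The exclusion and ranking steps amount to a filter followed by a sort. The following sketch assumes the model output is available as a mapping from regimen to predicted probability; the function name, the placeholder drug names, and the probabilities are invented for illustration and do not reflect the system's real interface or output.

```python
def rank_regimens(predictions, excluded_drugs=()):
    """Drop regimens containing an excluded drug, then sort by probability."""
    excluded = set(excluded_drugs)
    allowed = {
        regimen: probability
        for regimen, probability in predictions.items()
        if not excluded.intersection(regimen)
    }
    return sorted(allowed.items(), key=lambda item: item[1], reverse=True)


# Invented probabilities of reaching an undetectable viral load.
predictions = {
    ("drug_a", "drug_b", "drug_c"): 0.62,
    ("drug_a", "drug_d", "drug_e"): 0.81,
    ("drug_b", "drug_d", "drug_f"): 0.47,
}

# Suppose the patient cannot tolerate drug_e, or it is unavailable locally.
for regimen, probability in rank_regimens(predictions, excluded_drugs=["drug_e"]):
    print(regimen, probability)
```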
Genotypes and Clinical Trials
The team compared its prediction method to rules-based genotype interpretation, which relies on a set of rules based on "in vitro and in vivo research that relates individual mutations to sensitivity or resistance to individual drugs," Revell said.
Rules-based genotype interpretation software indicates whether a patient’s virus is likely to be sensitive, intermediate, or resistant to each individual drug, based on the mutations detected in the viral genome, but it wasn’t designed to take in additional information such as treatment history, CD4 counts, or viral load.
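In its simplest form, a rules-based interpretation is a lookup from mutations to a per-drug call. The sketch below is a toy version under loud assumptions: the rule table, the mutation names, and the drug names are placeholders with no clinical meaning, and real interpretation systems use far more elaborate rules than the "worst mutation wins" logic shown here.

```python
LEVELS = ("sensitive", "intermediate", "resistant")

# Hypothetical rule table: drug -> {mutation: severity index into LEVELS}.
RULES = {
    "drug_x": {"MUT_A": 2, "MUT_B": 1},
    "drug_y": {"MUT_C": 1},
}


def interpret_genotype(mutations, rules=RULES):
    """Return one call per drug, taking the most severe matching rule."""
    calls = {}
    for drug, drug_rules in rules.items():
        severity = max((drug_rules.get(m, 0) for m in mutations), default=0)
        calls[drug] = LEVELS[severity]
    return calls


print(interpret_genotype({"MUT_B", "MUT_C"}))
print(interpret_genotype({"MUT_A"}))
```

Note that the function takes only the genotype as input and returns a separate call for each drug, which is the limitation the comparison in this section turns on.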
"We found that the genotype with rules based interpretation is about 50 to 60 percent accurate in predicting whether the patient will respond or not," Revell said. "When you use our system, it's generally about 80 percent accurate," representing a 15 to 20 percent improvement.
Revell conceded that the two systems incorporate different information, which may account for the differences in accuracy. He noted, however, that "the beauty of our system is it takes all that information and makes a quantitative prediction of the probability of the whole combination of drugs taking the viral load to [an] undetectable [level], [while] the genotype rules-based interpretation gives you a prediction which is black or white or perhaps grey to individual drugs."
Revell’s group is now developing models within the system that can be used to make predictions without using genotype data, in order to meet the needs of countries where genotyping is expensive.
“We were concerned that the genotype is a critical piece of information in predicting whether the virus is going to be resistant or will respond to a regimen,” Revell said. “But as long as we have patient history … that seems to go a long way in helping the models make accurate predictions.”
He also said that the initial models his team has trained without the genotype have only been about “4 to 5 percent less accurate” than the models with the genotype.
RDI plans to collect additional data, for example from patients in sub-Saharan Africa, to develop population-specific models that will be more accurate for patients in clinics in these areas. The researchers also plan to incorporate additional drugs into the system.
RDI has no plans to market the system but hopes to secure funding for a controlled clinical trial. The trial would compare patients whose drug regimens are chosen using current clinical methods, such as rules-based genotype interpretation, with a second group whose treatments are selected using HIV-TRePS.
Revell noted that until the tool has been clinically validated, it is still an experimental system. "We make it very clear that you shouldn’t use it to select your drugs. It's down to the doctor using all the information available and this is just an experimental system you can look at."
Since the system was launched earlier this month, Revell said users in more than 30 countries have activated accounts, but he could not provide further details.