Kaggle's Winning HIV Progression Prediction Model Outperforms Established Scientific Methods


Kaggle — an online platform for hosting bioinformatics competitions — has named the winner for its first contest, an effort to develop a tool that would identify markers in the sequence of the human immunodeficiency virus genome that could predict a change in the severity of HIV infection.

Chris Raimondi, a search engine optimization specialist, won the Predict HIV Progression Competition. He used the R packages caret and randomForest to predict changes in viral load with more than 77 percent accuracy, compared with 70 percent for the best methods in the scientific literature.

To develop their prediction models, the 109 teams that participated in the competition downloaded data for each patient: the nucleotide sequences of the reverse transcriptase and protease, plus the viral load and CD4 count at the beginning of therapy. Each team was required to submit predictions for 692 patients.
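For readers curious how such data could be fed to a classifier, the following is a minimal sketch in Python, not the winner's actual method (which used R). It assumes each sequence is one-hot encoded position by position and combined with the two clinical measurements into a single feature vector. The function names, the fixed sequence lengths, and the encoding itself are illustrative assumptions, not details from the competition.

```python
from typing import List, Sequence

NUCLEOTIDES = "ACGT"


def one_hot_sequence(seq: str, length: int) -> List[int]:
    """Encode a nucleotide sequence as a flat one-hot vector of size length * 4.

    Positions beyond the end of the sequence, and any base other than
    A, C, G, or T, are encoded as four zeros.
    """
    features: List[int] = []
    for i in range(length):
        base = seq[i].upper() if i < len(seq) else None
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def patient_features(rt_seq: str, pr_seq: str, viral_load: float,
                     cd4: float, rt_len: int, pr_len: int) -> List[float]:
    """Build one feature vector per patient (hypothetical layout):
    reverse transcriptase bases, then protease bases, then the two
    baseline clinical values."""
    return (one_hot_sequence(rt_seq, rt_len)
            + one_hot_sequence(pr_seq, pr_len)
            + [viral_load, cd4])


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of patients whose outcome was predicted correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Vectors built this way could be passed to any standard classifier, such as a random forest, and the `accuracy` helper corresponds to the kind of percentage figure quoted above.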

According to the organizers, the competition was set up to find the markers in HIV sequences that predict changes in the viral load, indicating the severity of the disease. They expect that the models will provide a better understanding of the “genetic blueprint” of HIV that can be used to help develop more effective therapies for the infection.

“This result neatly illustrates the strength of data modeling competitions for scientific research. Whereas the scientific literature tends to evolve slowly…a competition inspires rapid innovation by introducing the problem to a wide audience,” Kaggle CEO Anthony Goldbloom said in an e-mail to BioInform.

As the winner, Raimondi received $500 and will have an opportunity to co-author a paper with the host of the competition.

Kaggle's organizers aim to provide an opportunity for bioinformaticians to develop new data-analysis tools and techniques, and for researchers and organizations to expose their data to a wide range of analytical techniques.

A detailed description of the winning entry is available here.
