
Kaggle's Winning HIV Progression Prediction Model Outperforms Established Scientific Methods


Kaggle, an online platform for hosting data-modeling competitions, has named the winner of its first contest. The contest sought a tool that would identify markers in the human immunodeficiency virus genome sequence that could predict a change in the severity of HIV infection.

Chris Raimondi, a search engine optimization specialist, won the Predict HIV Progression Competition. He used the R packages caret and randomForest to predict changes in viral load with more than 77 percent accuracy, compared with 70 percent for the best methods in the scientific literature.

To build their prediction models, the 109 participating teams downloaded each patient's baseline data: the nucleotide sequences of the viral reverse transcriptase and protease regions, plus the viral load and CD4 count at the start of therapy. Each team was required to submit predictions for 692 patients.
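The article does not say how the winning entry turned sequences into model inputs. The following is a minimal, hypothetical sketch (in Python, not the R code the winner used) of one common way to prepare such data for a tree-based model like a random forest: one-hot encode each sequence position and append the two baseline clinical values. The names `one_hot` and `patient_features`, the fixed padding lengths, and the choice to encode ambiguous bases as all zeros are illustrative assumptions.

```python
from typing import List

# Assumed alphabet: the four nucleotides. Any other symbol (for example an
# IUPAC ambiguity code such as "R" or "Y", or a gap) encodes as all zeros.
NUCLEOTIDES = "ACGT"


def one_hot(seq: str, length: int) -> List[int]:
    """One-hot encode a nucleotide sequence, padded or truncated to `length` positions."""
    features: List[int] = []
    seq = seq.upper()
    for i in range(length):
        # Positions past the end of the sequence act as padding (all zeros).
        base = seq[i] if i < len(seq) else ""
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def patient_features(rt_seq: str, pr_seq: str, viral_load: float, cd4: float,
                     rt_len: int, pr_len: int) -> List[float]:
    """Build one hypothetical feature row: encoded RT and protease sequences plus baseline values."""
    row: List[float] = []
    row.extend(one_hot(rt_seq, rt_len))   # reverse transcriptase region
    row.extend(one_hot(pr_seq, pr_len))   # protease region
    row.append(viral_load)                # baseline viral load
    row.append(cd4)                       # baseline CD4 count
    return row


if __name__ == "__main__":
    row = patient_features("ACGT", "AC", 4.5, 300.0, rt_len=4, pr_len=3)
    print(len(row))
```

Fixed-length padding keeps every patient's row the same width, which most random forest implementations require. In practice the sequences would first need to be aligned so that each column corresponds to the same genomic position.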

According to the organizers, the competition was set up to find markers in HIV sequences that predict changes in viral load, an indicator of disease severity. They expect the models to provide a better understanding of the "genetic blueprint" of HIV that can help in developing more effective therapies for the infection.

“This result neatly illustrates the strength of data modeling competitions for scientific research. Whereas the scientific literature tends to evolve slowly…a competition inspires rapid innovation by introducing the problem to a wide audience,” Kaggle CEO Anthony Goldbloom said in an e-mail to BioInform.

As the winner, Raimondi received $500 and will have an opportunity to co-author a paper with the host of the competition.

Kaggle's organizers aim to provide an opportunity for bioinformaticians to develop new data-analysis tools and techniques, and for researchers and organizations to expose their data to a wide range of analytical techniques.

A detailed description of the winning entry is available online.
