Kaggle, an online platform for hosting bioinformatics competitions, has named the winner of its first contest, an effort to develop a tool that identifies markers in the human immunodeficiency virus genome sequence that could predict a change in the severity of HIV infection.
Chris Raimondi, a search engine optimization specialist who won the Predict HIV Progression Competition, used the R-based caret and randomForest software packages to predict changes in viral load with more than 77 percent accuracy, compared with 70 percent for the best methods in the scientific literature.
To develop their prediction models, the 109 teams that participated in the competition downloaded data comprising the nucleotide sequences of patients' reverse transcriptase and protease, along with their viral load and CD4 count at the beginning of therapy. Each team was required to submit predictions for 692 patients.
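As a rough illustration of how data of this kind might be prepared for a model, the Python sketch below turns two nucleotide sequences plus the two baseline measurements into one numeric feature vector using simple k-mer counts. This is not Raimondi's method or the competition's file format: the function names, the choice of k-mer counting, and the sample values are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_counts(sequence, k=2):
    """Count overlapping k-mers made of A/C/G/T, in a fixed order.

    k-mers containing any other symbol (for example N) are ignored.
    """
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [counts.get(kmer, 0) for kmer in kmers]


def encode_patient(rt_seq, pr_seq, viral_load, cd4_count, k=2):
    """Build one flat feature vector for a hypothetical patient record."""
    return (
        kmer_counts(rt_seq.upper(), k)      # reverse transcriptase sequence
        + kmer_counts(pr_seq.upper(), k)    # protease sequence
        + [float(viral_load), float(cd4_count)]  # baseline measurements
    )


# Made-up toy values, far shorter than real sequences.
features = encode_patient("ACGTAC", "GGCA", viral_load=4.5, cd4_count=250)
```

Vectors built this way could then be passed to a classifier such as a random forest, the family of model the winner reportedly used.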
According to the organizers, the competition was set up to find the markers in HIV sequences that predict changes in the viral load, indicating the severity of the disease. They expect that the models will provide a better understanding of the “genetic blueprint” of HIV that can be used to help develop more effective therapies for the infection.
“This result neatly illustrates the strength of data modeling competitions for scientific research. Whereas the scientific literature tends to evolve slowly…a competition inspires rapid innovation by introducing the problem to a wide audience,” Kaggle CEO Anthony Goldbloom said in an e-mail to BioInform.
As the winner, Raimondi received $500 and will have an opportunity to co-author a paper with the host of the competition.
Kaggle's organizers aim to provide an opportunity for bioinformaticians to develop new data-analysis tools and techniques, and for researchers and organizations to expose their data to a wide range of analytical techniques.
A detailed description of the winning entry is available here.