NEW YORK (GenomeWeb News) – A group of bioinformatics and technology-focused firms and institutes that challenges informatics researchers to solve specific problems has issued a new challenge to spur investigators to develop better methods for identifying cancer-associated mutations.
The groups that launched the new challenge include Sage Bionetworks; the Ontario Institute for Cancer Research; the University of California, Santa Cruz; Annai Systems; IBM's DREAM (Dialogue for Reverse Engineering Assessments and Methods) project; and others, the partners said on Thursday.
Specifically, the goal of the new ICGC-TCGA-DREAM Somatic Mutation Calling (SMC) Challenge is to generate accurate methods for identifying cancer-associated mutations in whole-genome sequencing data.
The SMC partners plan to make available to challenge participants 9 terabytes of raw human sequence data derived from normal and tumor tissues. These raw sequencing data will be derived from 10 pairs of matched tumor and normal tissue samples from five pancreatic cancer and five prostate cancer patients.
The SMC challenge includes two sub-challenges. Participants will be asked to build a model that accurately predicts cancer mutations that alter a single nucleotide in the genome, and a model that accurately predicts cancer mutations that alter the order of a large stretch of the genome, known as structural variation.
Better models of these types are needed because algorithms that can identify single-nucleotide or structural variants can be very useful in guiding personalized cancer treatments, the partners said.
The contestants will have six months to develop and optimize their predictive models. In July 2014 the organizers will use an independent sequencing platform to validate at least 5,000 candidate mutations generated by the challengers. These predictive models will be ranked based on sensitivity, specificity, and balanced accuracy, among other metrics.
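The three named ranking metrics have standard definitions, which can be sketched as follows. This is only an illustration and not the organizers' scoring code: the function name `score_calls`, the use of plain sets of mutation identifiers, and the toy inputs are all assumptions made for the example.

```python
def score_calls(predicted, validated_true, validated_false):
    """Score one team's mutation calls against validated candidates.

    Hypothetical helper, not the challenge's actual scorer.
    predicted:       mutations the model called
    validated_true:  candidates that validation confirmed as real
    validated_false: candidates that validation rejected
    """
    predicted = set(predicted)
    tp = len(predicted & validated_true)    # real mutations that were called
    fn = len(validated_true - predicted)    # real mutations that were missed
    fp = len(predicted & validated_false)   # rejected candidates that were called
    tn = len(validated_false - predicted)   # rejected candidates correctly left out

    # Guard against empty validation sets to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    balanced_accuracy = (sensitivity + specificity) / 2

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


# Toy example: integers stand in for mutation identifiers.
scores = score_calls(
    predicted={1, 2, 3, 5},
    validated_true={1, 2, 3, 4},
    validated_false={5, 6, 7, 8},
)
print(scores)
```

Balanced accuracy averages sensitivity and specificity, so a model cannot score well simply by calling everything or nothing a mutation.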
The participants will use Sage Bionetworks' Synapse infrastructure to collaborate on an open platform, record processing and analyses, submit predictive models to a real-time leaderboard, and share ideas and modeling information.
Annai Systems is providing its Annai-GNOS data management platform to facilitate data uploading, hosting, and access. Google is making its Google Cloud Platform available to OICR-approved participants, which will support teams that lack access to large computer clusters at their institutions. Hitachi also has provided free storage to host the data on a 1 petabyte disk.
The Nature Publishing Group will consider publishing the work that performs the best in this challenge.