NEW YORK (GenomeWeb) – Four Canadian government research funding agencies will spend around C$7.3 million (US$6.7 million) to create a cloud computing facility and data mining tools that will enable researchers to access and use data from the International Cancer Genome Consortium.
Canada's government said today that the investment will create the Cancer Genome Collaboratory, a resource that will process ICGC's genomic profile data from 25,000 patients around the world. Cancer researchers will use the resource to conduct complex data mining and analyses on 10 to 15 petabytes of cancer genome sequence data paired with associated clinical information.
The Canadian funding for the initiative will include C$3.1 million from the Natural Sciences and Engineering Research Council, C$2 million from Genome Canada, C$1.3 million from the Canada Foundation for Innovation, and C$900,000 from the Canadian Institutes of Health Research. The University of Chicago will provide C$500,000 in in-kind funding, including computing resources, and will donate a large amount of data from the ICGC.
The Toronto-based ICGC is the largest worldwide effort to produce a catalog of the genetic structure of cancers, and it aims to characterize tumors in 500 patients for each of the major types of cancer.
The Collaboratory will enable scientists to use metadata tagging, provenance tracking, and workflow management software to "execute complex analytic pipelines, create reproducible traces of each computational step, and share methods and results" with other investigators. They will be able to develop questions about cancer risk, tumor growth, and drug treatments, and then extract relevant data and analyses from the cloud resource.
The Canadian government said in a statement that the project proposes "a fundamental reversal in the current practice of genome analysis." Instead of spending weeks downloading hundreds of terabytes from a central repository before they can begin their studies, researchers will be able to upload their analytical software into the Collaboratory, run it, and then securely download their results.
"Canada and many other nations around the world have already invested tremendous resources in the sequencing of thousands of cancer genomes, but until now there has been no viable long-term plan for storing the raw sequencing data in a form that can be easily accessed by the research community," Lincoln Stein, director of the Informatics and Bio-computing Program at the Ontario Institute for Cancer Research, said in a statement.
"The Cancer Genome Collaboratory will open this incredibly important data set to researchers from laboratories large and small, enabling them to achieve new insights into the causes of cancer and to develop innovative new ways to diagnose and manage the disease," Stein said.
Because the Collaboratory will contain donors' personal genetic data, the project will emphasize privacy. A group of computer scientists will work on new ways to protect individual privacy, such as methods for anonymizing genetic profiles while retaining important details, and techniques for structuring research queries so that they can be processed through secure data storage sites.
The initiative also will support research that will benefit computer science and other fields beyond cancer genetics, including the development of application programming interfaces for accessing large data sets, and research into indexing, searching, compression, and cryptography.
The Canadian government said the new data mining tools that spring from this initiative should be available in 2015 for beta testing by select researchers, and the Collaboratory facility will open to the research community in 2016.
The ICGC has already collected, analyzed, and released data on more than 8,500 donors, and when it is completed in 2018, it will hold data from more than 50,000 individual genomes.