An international research consortium plans to catalog genetic alterations in 50 different types of cancer in what amounts to the largest human genome resequencing project launched to date.
The International Cancer Genome Consortium, which launched this week, was created to serve as an umbrella for existing and future cancer genome projects worldwide.
Over the next decade or so, the group will use new sequencing technologies and other molecular tools to comprehensively analyze approximately 25,000 cancer genome samples for a total estimated cost of $1 billion, and to make the data publicly available.
“It’s very much the next-generation sequencing technologies which are driving this consortium …, which we could not even imagine three years ago” because of the prohibitive cost, said Tom Hudson, president and scientific director of the Ontario Institute for Cancer Research in Toronto, which serves as the ICGC’s headquarters.
Current members hail from North America, China, Singapore, Japan, the European Union, Australia, and India. Members fund the projects they contribute to the consortium.
“It’s a very ambitious project,” ICGC member Mike Stratton, deputy director of the Wellcome Trust Sanger Institute and head of the institute’s Cancer Genome Project, told In Sequence this week. “There will never have been a sample collection in cancer as big as this one will be in its aggregate.”
Projects like the Sanger’s Cancer Genome Project or the US National Institutes of Health’s Cancer Genome Atlas will continue their work but will contribute their data to the ICGC and adhere to its mandates, according to the consortium.
“TCGA will continue to operate independently but participate through their data and through their participation in the executive committee of ICGC,” Brad Ozenberger, a program director at the National Human Genome Research Institute who is involved in the ICGC, told In Sequence.
The central aim of the consortium is to catalog genomic abnormalities, or somatic mutations, including SNPs, insertions, deletions, copy number changes, translocations, and other chromosomal rearrangements, in 50 cancer types or subtypes.
Its researchers will also generate gene-expression and DNA-methylation data for these samples, and will have the option to analyze other genomic components, such as the proteome or the metabolome.
To distinguish somatic from inherited sequence variations, the scientists will analyze tumors and matched non-cancerous tissue from the same patient.
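The tumor-versus-normal comparison can be illustrated with a minimal sketch. It assumes each sample's variant calls have already been reduced to (chromosome, position, reference, alternate) tuples. The variant values and the function name are hypothetical and are not taken from any ICGC pipeline, and real somatic callers also model sequencing error and tumor purity, which this sketch ignores.

```python
# Illustrative only: variants are hypothetical (chrom, pos, ref, alt) tuples.
def somatic_candidates(tumor_calls, normal_calls):
    """Return variants seen in the tumor but absent from the matched normal."""
    return set(tumor_calls) - set(normal_calls)


tumor = {
    ("chr7", 1000, "T", "A"),
    ("chr17", 2000, "G", "A"),
    ("chr1", 3000, "C", "T"),
}
# The chr1 variant appears in both tissues, so it is treated as inherited.
normal = {("chr1", 3000, "C", "T")}

candidates = somatic_candidates(tumor, normal)
print(sorted(candidates))
```

Variants shared by both tissues drop out, leaving only the tumor-specific calls as somatic candidates.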
The resulting data will need to meet quality standards set by the ICGC, and mutations should be determined at single-base resolution.
In order to discover genes that are mutated in 3 percent of cancers, researchers have to analyze approximately 500 specimens from every cancer type, at an estimated cost of about $20 million, or $40,000 per sample.
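The figures quoted here can be checked against the consortium-wide totals with a few lines of arithmetic. The sketch below uses only numbers stated in this article, and the variable names are illustrative. The expected count of mutated samples is simple multiplication, not a statistical power calculation from the consortium.

```python
# Figures quoted in the article; variable names are illustrative.
samples_per_cancer_type = 500
cost_per_sample_usd = 40_000
cancer_types = 50
mutation_frequency_percent = 3  # genes mutated in 3 percent of cancers

cost_per_cancer_type = samples_per_cancer_type * cost_per_sample_usd
total_samples = samples_per_cancer_type * cancer_types
total_cost_usd = cost_per_cancer_type * cancer_types

# On average, how many of the 500 specimens would carry a mutation
# in a gene mutated at the 3 percent frequency.
expected_mutated_samples = samples_per_cancer_type * mutation_frequency_percent / 100

print(cost_per_cancer_type)       # 20000000
print(total_samples)              # 25000
print(total_cost_usd)             # 1000000000
print(expected_mutated_samples)   # 15.0
```

The totals agree with the roughly 25,000 samples and $1 billion overall cost cited for the project.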
Though the ultimate goal of the ICGC is to sequence complete cancer genomes and their normal controls with high coverage, current high-throughput sequencing technologies do not yet allow researchers to do this at this price.
“We will be doing whole-genome sequencing at some point, when it becomes affordable,” Hudson said.
In the meantime, the ICGC suggests that participants sequence subsets of the genomes, such as all exons and splice sites, microRNAs, regulatory sequences, and conserved non-coding sequences.
“There are several technologies now available to achieve this goal, including enrichment by array pull down or PCR followed by sequencing on one of the new technology platforms,” according to the consortium.
Among these capture technologies are microarrays by Roche NimbleGen and oligonucleotide libraries from Agilent Technologies (see In Sequence 4/8/2008).
The consortium also encourages researchers to analyze genetic rearrangements with paired-end reads at low coverage, noting that “paired-end designs will be available for most new sequencing technologies.”
Stratton and his colleagues published the first example of such a study earlier this week in Nature Genetics, sequencing two lung cancer cell lines using paired-end reads from Illumina’s Genome Analyzer (see Short Reads in this issue).
Last year, he and his team published another proof-of-principle study in Nature, in which they sequenced the coding regions of 518 protein kinase genes in 210 human cancer samples that included breast, lung, colorectal, gastric, testicular, ovarian, and renal cancer, as well as samples of melanoma, glioma, and acute lymphoblastic leukemia (see In Sequence 3/13/2007). That project used traditional Sanger sequencing.
The ICGC also recommends that participants use high-density genotyping arrays that provide copy number, loss of heterozygosity, and breakpoint information.
However, the consortium makes no specific recommendations for the type of technology researchers should use for analyzing RNA expression or DNA methylation, which could be analyzed by either arrays or new sequencing technologies.
Participating centers will also have to analyze a small number of standardized tumor and control samples to show that they can produce data of sufficient quality, detecting at least 80 percent of somatic alterations with 95-percent accuracy.
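One way such a benchmark could be scored is sketched below. This is an assumption, not the consortium's published method: the article does not define "accuracy," so the sketch reads the 80 percent figure as the share of known alterations that a center detects and the 95 percent figure as the share of a center's calls that are correct. The function name and the counts are hypothetical.

```python
def passes_benchmark(true_positives, false_positives, false_negatives,
                     min_detection=0.80, min_accuracy=0.95):
    """Check a center's calls on a reference sample against both thresholds.

    Assumed reading: detection = TP / (TP + FN), accuracy = TP / (TP + FP).
    """
    known = true_positives + false_negatives
    called = true_positives + false_positives
    if known == 0 or called == 0:
        return False
    detection_rate = true_positives / known
    accuracy = true_positives / called
    return detection_rate >= min_detection and accuracy >= min_accuracy


# Hypothetical reference sample with 100 known somatic alterations.
print(passes_benchmark(85, 3, 15))   # detection 0.85, accuracy about 0.966
print(passes_benchmark(85, 10, 15))  # detection 0.85, accuracy about 0.895
```

Under this reading, the first center passes and the second fails because too many of its calls are false positives.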
“We want to make sure that the quality from each participating country is going to be up to the highest standards,” said Ozenberger.
The project is expected to last 10 years, though its duration will depend on the speed at which technologies develop and the time it will take to acquire enough high-quality samples for certain tumor types.
Acquiring samples might well be the greatest challenge of the project. “Many of the tumor types are rare, or it’s hard to get pure cell types,” Hudson explained. “The genome technologies keep evolving, but having access to high-quality samples will be hard.”
Stratton added that the consortium must “make collections of the cancer, very high-quality specimens with good information associated with them, specimens that are worthwhile spending a large amount of investment on to get them characterized in their whole genome.”
Consortium Conception
Plans for the ICGC emerged last October when cancer researchers and funding agencies from 22 countries met in Toronto to discuss strategies for how to analyze many cancer genomes comprehensively and in high throughput.
Following the meeting, they formed an executive committee representing funding agencies from 33 countries, including the 27 EU member states; a scientific planning committee; and seven working groups that focus on clinical pathology issues; sample-quality standards; genome analyses; informed consent and privacy protections; sample size and study design; data management and database coordination; and data release, data tiers, intellectual property, and publications.
A White Paper outlining the ICGC’s structure, goals, and policies is available here.
“This document is really an invitation” for additional prospective members, according to Hudson. “A consortium of this size is going to need many countries and many scientists, pathologists, and surgeons to work together,” he said. “We wanted to give them enough information of what the project is going to look like so they can make decisions.”
The hope is that additional groups will join the ICGC over the next few months. “This is to give them an opportunity over the summer to see if they have sufficient funds, and for other groups to maybe develop collaborations to come in for a kick-off at the end of the summer to really get things underway,” said Ozenberger. For a list of current members, see sidebar.
Projects within the consortium are organized by cancer type, and each consortium member is responsible for either funding or characterizing at least one cancer type, or 500 samples.
ICGC members select the specific cancer type they want to analyze based on “public health impact … and unmet clinical need,” according to the document, but the consortium has stated that it aims to study cancers of all major organ systems, including the central nervous system; hemopoietic and lymphoid tissue; head, neck, and nasopharynx; skin; lung; breast; esophagus; stomach; colon; rectum; kidney; bladder; urinary tract; soft tissues; bone; pancreas; gall bladder; liver; biliary system; ovary; uterus; cervix; testis; endocrine tissues; and prostate.
Projects will also include both adult and pediatric cancers, such as neuroblastoma, pilocytic astrocytoma, medulloblastoma, osteosarcoma, and pediatric leukemias.
Tumor samples should come from untreated cancers, contain 80 percent tumor cells, and comprise either 200 milligrams of solid tissue or 10 million flow-sorted cells.
Hudson said he hopes that 10 projects will be started this year, “determined, initially, by interest and funds and access to samples.” His institute, OICR, will target pancreatic cancer and has committed $20 million to the project, he said. Last year, it already spent approximately $10 million on equipment. The center will use its five Illumina Genome Analyzers and five ABI SOLiD sequencers to analyze these samples (see In Sequence 3/18/2008).
The NIH’s Cancer Genome Atlas pilot project, which is funded with a three-year, $100 million grant from the National Human Genome Research Institute and the National Cancer Institute, will contribute its analyses of lung, brain, and ovarian cancer samples, which have not yet been published. The pilot project launched at the end of 2005.
But more US groups might join the consortium later. “We anticipate the US will have other groups come in to do different tumors” for the ICGC, said Ozenberger.
Game Plan
In general, consortium members — which have to agree to the ICGC’s policies and guidelines — can be either funding members, which provide financial support; or research members, which need to have existing or committed financial resources from a funding member.
A data-coordination center, to be headed by Lincoln Stein, who will join the OICR later this year from Cold Spring Harbor Laboratory, is going to manage the flow of data to a central ICGC database and public repositories, such as sequence trace repositories, microarray repositories, and genome browsers. The center will also perform quality-control checks of the data, facilitate data integration with other public resources, and manage the ICGC data portal.
The consortium may also establish quality-assessment centers to assure the quality of samples used in the project, though it has not yet decided how these centers will be organized.
Further, the ICGC has established a number of principles, or policies, by which its members must abide, and guidelines that constitute “best practices” at any given time. While “policies” are expected to be long-term, guidelines are likely to evolve over time as technologies advance and knowledge evolves.
For example, members will seek informed consent according to certain standards, and will make datasets available either through open access or controlled access. Specifically, normalized gene-expression data, DNA-methylation data, and genotype frequencies will be publicly available, whereas genome sequence files, raw genotype calls, and probe-level gene-expression data, which may identify the donors, will only be available to qualifying researchers.
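The two access tiers can be pictured as a simple lookup table. The sketch below is illustrative only. The data types come from the description above, but the identifiers, the `Access` enum, and the `can_download` helper are hypothetical and do not describe the actual ICGC data portal.

```python
from enum import Enum


class Access(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


# Data types as described in the article; the key names are made up here.
ACCESS_TIER = {
    "normalized_gene_expression": Access.OPEN,
    "dna_methylation": Access.OPEN,
    "genotype_frequencies": Access.OPEN,
    "genome_sequence_files": Access.CONTROLLED,
    "raw_genotype_calls": Access.CONTROLLED,
    "probe_level_gene_expression": Access.CONTROLLED,
}


def can_download(data_type, is_qualified_researcher):
    """Open data is available to anyone; controlled data needs qualification."""
    tier = ACCESS_TIER[data_type]
    return tier is Access.OPEN or is_qualified_researcher


print(can_download("genotype_frequencies", False))   # True
print(can_download("genome_sequence_files", False))  # False
print(can_download("genome_sequence_files", True))   # True
```

Keeping potentially identifying data behind the controlled tier is what lets the remaining datasets be released without restriction.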
Consortium members must also agree to rapidly release their data to the scientific community prior to publication, similar to the Human Genome Project, and to publish initial analyses “in a timely manner,” according to the White Paper.
They also must promise not to file for intellectual property based on primary data, such as somatic mutations, although they are allowed to make IP claims to “downstream discoveries.”
Hudson, who participated in other international projects like the Human Genome Project and the HapMap project, believes that this consortium is “the best planned in terms of really trying to address most of the issues upfront.”
However, that does not mean everything about the ICGC is set in stone. “We know what the issues are, but we still expect that we will need to evolve as a consortium frequently because of evolving technologies, database structures, and so on.”