By Julia Karow
This article was originally published Nov. 12.
As part of the recent expansion of the National Institutes of Health's Cancer Genome Atlas, the National Cancer Institute awarded $16.6 million to six Genome Characterization Centers and $7 million to six new Genome Data Analysis Centers in fiscal year 2009.
Total funding for the five-year awards has not yet been finalized.
NIH said at the end of September that following TCGA's pilot project, which served to develop the infrastructure for characterizing the genomes of hundreds of tumors, the project will be scaled up with $275 million in new funding to include more than 20 tumor types. Over the next five years, thousands of samples will be sequenced by three genome centers at the Broad Institute, Baylor College of Medicine, and Washington University School of Medicine (see In Sequence 10/6/2009).
The six GCCs, several of which already participated in the pilot project (see GenomeWeb Daily News 10/16/2006), will characterize genomic changes in tumor samples using multiple platforms, including second-generation sequencing and array technologies.
According to the NCI's website, each center has a different focus area and will provide data on alterations in miRNA and gene expression, SNPs, epigenetic changes, and copy number alterations. One center will also validate specific regions of interest through targeted sequencing.
Two of the Genome Characterization Centers will focus on characterizing copy number alterations and SNPs: one led by Raju Kucherlapati at Brigham and Women's Hospital and colleagues at Harvard Medical School, which received $650,000 in funding in fiscal year 2009, the other headed by Matthew Meyerson and colleagues at the Broad Institute, which received $610,000 in FY 2009.
The GCC at Brigham and Women's Hospital plans to identify regions in cancer genomes that are amplified or show deletions or loss of heterozygosity by analyzing up to 2,500 tumor samples a year, and to identify a list of promising genes for resequencing, according to the grant abstract.
Initially, the researchers plan to use high-density Agilent oligonucleotide arrays for array comparative genomic hybridization (aCGH) experiments. Over the course of the five-year award, they plan to switch gradually to sequencing with a sequence tag-counting approach, which they want to use exclusively during the last two years of the project.
After analyzing the aCGH and sequence data with informatics tools they have developed, they want to extract a list of interesting genes for resequencing by other TCGA members.
The Broad Institute's center aims to characterize the genomes and transcriptomes of 10,000 cancers over the course of the five-year award, and also plans to transition from microarray technologies to next-generation sequencing. In particular, the researchers plan to characterize DNA and RNA from 2,000 cancer samples and controls using microarrays during the first year. At the same time, they want to compare and validate three unnamed sequencing platforms by characterizing DNA from 300 cancer and normal pairs and RNA from 100 cancer samples, according to the grant abstract.
Based on the results, they plan to select "the most cost-effective sequencing platform" at the end of the first year, in conjunction with NCI staff members, and implement that platform to characterize DNA and RNA from 2,000 cancer samples and controls in years two to five.
A center led by Peter Laird at the University of Southern California, in collaboration with Johns Hopkins University, obtained $2 million in fiscal '09 and will focus on epigenomics. It, too, wants to transition from array-based to sequencing-based methods during the lifetime of the award.
One of the USC-JHU Cancer Epigenome Characterization Center's goals is to characterize DNA methylation in approximately 28,000 CpG dinucleotides in at least 10,000 cancer samples and 1,000 controls using the Illumina Infinium array platform. Another goal is to transition epigenomic data production to whole-genome shotgun bisulfite sequencing in order to obtain single-base resolution, and a third is to implement quality control and assurance measures.
Gene expression patterns will be analyzed by a center led by Chuck Perou at the University of North Carolina, Chapel Hill, which was awarded $3.7 million in fiscal '09.
According to the grant abstract, this center plans to perform quantitative gene expression profiling of protein-coding genes, non-protein coding mRNAs, and microRNAs on 2,000 tumors per year, using an unspecified platform.
In addition, it will use a method called formaldehyde-assisted isolation of regulatory elements, or FAIRE, coupled to next-generation sequencing, to profile regions of "open" chromatin domains in cancers, and to integrate those data with the gene-expression data.
The analysis of miRNA will be the focus of a center led by Marco Marra at the British Columbia Cancer Agency, which received $2 million in fiscal '09.
This center will specialize in preparing and analyzing sequencing libraries for mRNA and microRNAs from cancer cells and tissues and plans to sequence 4,400 such transcriptome libraries in the first year. According to the grant abstract, the center, which uses the Illumina Genome Analyzer technology, has already established a library construction core, which it plans to scale up, as well as a next-generation technology development core.
Targeted sequencing will be emphasized at a center led by David Wheeler and colleagues at Baylor College of Medicine, which received $2 million in fiscal '09.
The BCM Tumor Genome Characterization Center plans to analyze sets of tumors and, when appropriate, matched normal tissues in 500 patients for each of up to 25 tumor types over the course of the five-year award, using solely the Applied Biosystems SOLiD sequencing platform. The analysis will include genome-wide expression levels of mRNA to find aberrant splicing and gene fusions, and eventually bi-allelic expression levels for heterozygous loci. In addition, it will include copy number variation analyses in genomic DNA with a resolution down to 10 kilobases. In the third year, the researchers plan to include diTag libraries to be able to detect breakpoints precisely.
Using a sequencing-based approach from the outset, the abstract states, "will avoid a complex transition of the program from a chip platform to a sequencing platform in the early years of the program." Also, the sequencing data "will be immediately comparable and complementary with whole-genome sequencing approaches" conducted in parallel by the three NHGRI-funded genome centers for TCGA.
Due to the "need to integrate different data types and the immense quantity of data" generated by the project, the TCGA Research Network recently added six Genome Data Analysis Centers, which will work with the GCCs "to develop state-of-the-art tools that assist researchers with processing and integrating data analyses across the entire genome," according to the NCI's website.
Two centers — one at the Broad Institute, which received $2.7 million in fiscal year 2009, and the other at Lawrence Berkeley National Laboratory, which was awarded $650,000 in fiscal '09 — are designated "type A" and will be responsible for implementing "a bioinformatics approach to the high-throughput processing and analysis of genome-wide data coming in from each of the GCCs," according to the website.
The other four centers are at the Institute for Systems Biology (in collaboration with the University of Texas/M.D. Anderson Cancer Center), which received $1.1 million in fiscal '09; Memorial Sloan-Kettering Cancer Center, which was awarded $140,000 in fiscal '09; the University of California at Santa Cruz, which obtained $1 million in fiscal '09; and the University of Texas/M.D. Anderson Cancer Center, which received $1.5 million in fiscal '09. These centers, called "type B," will be responsible for "developing innovative bioinformatics and computational tools that can draw biologic and clinical correlations from the genomic datasets delivered by the TCGA Research Network," the website states.
NCI plans to provide more information soon about how the GDACs will integrate with other components of the TCGA Research Network, and about the tools they will be developing.