Members of The Cancer Genome Atlas Pan-Cancer Initiative are gearing up for future phases of the effort, including a round of the project focused on whole genomes, even as findings from the first phase continue to be published.
Leading the charge last week were two independent Pan-Cancer Initiative teams reporting in Nature Genetics.
A Memorial Sloan-Kettering Cancer Center group described dozens of oncogenic signatures detected using exome sequence data, array-based copy number profiles, RNA sequence data, and more for almost 3,300 tumor samples from 12 cancer types. Meanwhile, researchers from Dana-Farber Cancer Institute, the Broad Institute, and elsewhere analyzed somatic copy number alterations identified from array and exome sequence information on more than 4,900 tumors.
In addition to those studies and accompanying commentary articles published in Nature Genetics, another 14 articles are slated for publication by mid-October as part of the first wave of papers from TCGA's pan-cancer arm — an effort that's expected to yield an estimated 80 papers in the relatively near future.
"[T]he full potential of the enterprise will be realized only after time and with broader efforts," corresponding author Joshua Stuart, a University of California at Santa Cruz researcher, and his colleagues wrote in one of the commentaries. "Still, the collection of TCGA Pan-Cancer publications represents a significant contribution to a new period of discovery in cancer research."
Bolstered by information gleaned by studying patterns across cancers rather than relying exclusively on tissue-specific comparisons, Pan-Cancer members are now planning the next stages of the study. Those are expected to include validation studies stemming from results so far, along with studies on progressively larger tumor sets and additional cancer types.
Around 25 cancer types are currently targeted for genomic profiling through the broader TCGA effort. For the next round of Pan-Cancer Initiative studies, investigators are getting set to do cross-cancer studies that incorporate whole-genome sequences generated for a subset of those tumors.
Roughly 1,000 tumor-normal pairs have been subjected to whole-genome sequencing for TCGA. The number of tumor-normal pairs available for future pan-cancer analyses may double if TCGA's Pan-Cancer team collaborates with members of the International Cancer Genome Consortium, or ICGC, as some anticipate.
Members of TCGA and ICGC have started discussions to explore the possibility of bringing together whole-genome sequences generated by both of the cancer consortiums, according to UCSC's Stuart, a TCGA Pan-Cancer project co-organizer who met with ICGC members during an ICGC meeting in Toronto this week.
Stuart noted that some ICGC researchers participated in the first round of TCGA pan-cancer studies, including Gary Bader from the University of Toronto and Nuria Lopez-Bigas with Pompeu Fabra University and the Catalan Institution for Research and Advanced Studies in Barcelona.
Still, talks regarding such a combined effort are at an early stage, Stuart said, and there are hurdles to overcome, including issues related to storing, protecting, and accessing the massive amounts of sequence data and patient information associated with that many genomes.
"What I'd like to see is somebody from industry step up … to get a private cloud going that will guarantee protection of the sensitive patient data," he noted.
The current round of TCGA Pan-Cancer Initiative studies has grown over the past year or so, with more and more teams scrutinizing large TCGA tumor datasets to try to find new patterns in tumor biology and the processes involved in cancer development.
Those involved are also optimistic about the prospect of finding shared treatment targets that span multiple cancer types — a possibility that appears realistic based on findings from the first studies stemming from the effort.
In their oncogenic signature study, for instance, Memorial Sloan-Kettering Cancer Center researchers brought together copy number, exome sequence, methylation, and other genomic information to define a set of 479 "selected functional events," or SFEs, that they subsequently scrutinized in 3,299 TCGA tumors.
These SFEs, determined with the help of statistical approaches and information learned about cancer genomes over the past several years, were designed to simplify the data by focusing on alterations with anticipated functional effects. They also helped in classifying tumors in a genomics-based fashion.
Generally speaking, that team found that tumors rife with copy number changes tended to contain relatively few somatic mutations and vice versa, though some samples had intermediate levels of each. Dozens more tumor sub-types could be defined genomically within those two main groups.
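The broad split described above can be illustrated with a toy example. The sketch below is not the MSKCC team's method; it simply labels a tumor by whichever kind of functional event dominates. The function name, the event counts, and the two-fold ratio threshold are all assumptions made for illustration.

```python
def classify_tumor(n_mutation_events, n_copy_number_events, ratio=2.0):
    """Label a tumor by which kind of functional event dominates.

    The two-fold ratio is an illustrative threshold, not a value from the study.
    """
    if n_mutation_events > 0 and n_mutation_events >= ratio * n_copy_number_events:
        return "mutation-dominated"
    if n_copy_number_events > 0 and n_copy_number_events >= ratio * n_mutation_events:
        return "copy-number-dominated"
    return "intermediate"


# Made-up (mutation events, copy number events) counts for three hypothetical tumors.
tumors = {"tumor_1": (12, 1), "tumor_2": (0, 9), "tumor_3": (4, 5)}
labels = {name: classify_tumor(*counts) for name, counts in tumors.items()}
```

A real analysis would work from the full event-by-tumor matrix rather than two summary counts, which is how finer sub-types within each main group could emerge.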
In addition, authors of that study took their analysis a step further, mapping shared pathway changes and potentially actionable alterations across the complete tumor set.
"We connected these [SFEs] to both pathways and drugs," MSKCC's Chris Sander, a Pan-Cancer co-organizer and co-senior author on the oncogenic signature study, told CSN.
"That led to the hypothesis that there are sub-groups of tumors with particular oncogenic signatures, such that a certain set of patients with those alterations might be a good group to nominate for clinical trials using particular combinations of drugs."
There is enthusiasm about doing genomics-based trials to test drugs or drug combinations on patients whose tumors share oncogenic signatures and actionable alterations, he added, though many such trials are currently at early stages.
The nature of the SFEs considered in future studies of oncogenic signatures may shift somewhat, Sander noted, as researchers incorporate additional information on structural rearrangements gleaned from whole-genome sequence data as well as various genomic profiles representing tumors that have metastasized from their site of origin in the body.
The notion of bringing together information from tumors spanning many different classically defined tumor types isn't new, though the scope of TCGA's Pan-Cancer Initiative is, explained Dana-Farber Cancer Institute researcher Rameen Beroukhim.
Together with Matthew Meyerson, also from Dana-Farber, and the Broad Institute's Gad Getz, Beroukhim led a Pan-Cancer group focused on characterizing somatic copy number alterations, or SCNAs, across 4,934 TCGA tumors from the same cancer types considered by Sander and company.
The samples included in the copy number analysis had all been subjected to array-based copy number profiling. For some stages of the study, the researchers also turned to whole-exome sequence data, which was available for around 3,000 of the tumors.
The reliance on array-based approaches is changing, Beroukhim noted, as investigators become more adept at dealing with whole-genome sequence data, which can theoretically resolve structural changes in tumor genomes more finely.
"The ability to generate copy number profiles from sequencing data is improving," he said, "so I'm hopeful that clinical tests that are sequencing-based will be able to determine copy number profiles from cancers."
For their current study, he and his colleagues homed in on 140 SCNAs that were recurrent across all of the tumor types tested, including 70 recurrently amplified regions and 70 regions showing recurrent deletions.
A fraction of those contained known oncogenes or tumor suppressor genes, though the majority did not, suggesting that further scrutiny of these loci could reveal new cancer contributors. That possibility needs to be explored through future studies in the lab and perhaps in the clinic as well.
"Our hope is that the 140 regions we determined include many of the important copy number changes in cancer — and that sequencing-based tests assessing the genetics of any particular tumor would want to pay special attention to those 140 regions," Beroukhim said.
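As a rough illustration of what "paying special attention" to a fixed list of regions might look like in software, the sketch below checks a tumor's copy number segments against a list of recurrent regions. It is a minimal example under stated assumptions, not the team's pipeline: the `Region` and `Segment` types, the log2 ratio threshold of 0.3, and all coordinates are hypothetical placeholders.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    kind: str    # "amplification" or "deletion"


class Segment(NamedTuple):
    chrom: str
    start: int
    end: int
    log2_ratio: float  # tumor versus normal copy ratio


def segment_kind(seg, threshold=0.3):
    """Call a segment gained, lost, or neutral (threshold is illustrative)."""
    if seg.log2_ratio >= threshold:
        return "amplification"
    if seg.log2_ratio <= -threshold:
        return "deletion"
    return None


def flag_recurrent(segments, regions):
    """Return (segment, region) pairs that overlap and agree in direction."""
    hits = []
    for seg in segments:
        kind = segment_kind(seg)
        if kind is None:
            continue
        for reg in regions:
            overlap = (seg.chrom == reg.chrom
                       and seg.start < reg.end
                       and reg.start < seg.end)
            if overlap and kind == reg.kind:
                hits.append((seg, reg))
    return hits


# Placeholder coordinates, not real loci from the study.
regions = [
    Region("chrA", 1_000, 5_000, "amplification"),
    Region("chrB", 20_000, 30_000, "deletion"),
]
segments = [
    Segment("chrA", 4_000, 9_000, 1.2),     # gain overlapping the first region
    Segment("chrB", 25_000, 26_000, 0.05),  # neutral, so ignored
    Segment("chrB", 10_000, 21_000, -0.9),  # loss overlapping the second region
    Segment("chrA", 6_000, 7_000, 1.5),     # gain outside any listed region
]
hits = flag_recurrent(segments, regions)
```

The nested loop is fine for a list of 140 regions; a much larger region set would call for an interval index instead.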
In addition to the recurrent SCNAs, the researchers were able to use information on the size of copy number changes to gain clues about the events that produced them. They were also able to use copy number data to look at the rates of whole-genome duplication events across cancer types.
For instance, glioblastoma multiforme had low rates of whole-genome doubling, despite showing high rates of amplification and deletion driver events. On the other hand, breast cancer had high rates of whole-genome duplication, despite being classified as a mutation-heavy tumor type.
Along with future studies to validate and delve into the functional roles of loci affected by those recurrent changes, the team plans to continue performing its SCNA analysis and profiling using data from larger sets of TCGA tumors as they become available.
The copy number study, like others performed by teams working under the Pan-Cancer umbrella, was based on an agreed-upon set of TCGA tumors. Going into the December 2012 data freeze used for the current round of Pan-Cancer Initiative papers, members of the team defined a set of first-line analyses for the project, which Stuart called "a set of things that everybody kind of needed, or a lot of people needed."
These included accurate mutation calls on the whole-exome sequence data, for instance, which turned out to be somewhat complicated to get.
Indeed, detecting the range of genetic differences between tumor and normal samples from each individual "is one of the hardest things in cancer genomics right now," according to Stuart, who said he did not originally appreciate how tricky that task can be.
In the hopes of simplifying that aspect of future analyses, Stuart and his colleagues plan to launch a DREAM challenge project next month that is focused on bioinformatics solutions for mutation calling. Challengers will be tasked with finding ways to reliably call mutations in five cell lines with specific alterations that have been introduced into their genomes, as well as in 10 real tumor samples.
"We will follow up in the lab and validate tens of thousands of calls from algorithms," Stuart said. "Hopefully by this time next year we'll know the best algorithms: they'll be the ones at the top of the leader board."
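The article does not say how the challenge will score submissions. One common way to rank mutation callers against a validated truth set is by precision, recall, and their harmonic mean (F1), sketched below. The caller names, the variant tuples, and the choice of F1 as the ranking metric are all assumptions for illustration.

```python
def score_calls(predicted, truth):
    """Return (precision, recall, f1) for a set of calls against validated variants."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def leaderboard(submissions, truth):
    """Rank submissions by F1, best first."""
    rows = [(name, *score_calls(calls, truth)) for name, calls in submissions.items()]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical variants written as (chromosome, position, reference, alternate).
truth = {("chrA", 100, "A", "T"), ("chrA", 250, "G", "C"),
         ("chrB", 40, "C", "T"), ("chrB", 900, "T", "G")}
submissions = {
    # Conservative caller: misses one real variant, reports no false positives.
    "caller_x": {("chrA", 100, "A", "T"), ("chrA", 250, "G", "C"),
                 ("chrB", 40, "C", "T")},
    # Permissive caller: finds every real variant plus four false positives.
    "caller_y": truth | {("chrA", 5, "A", "G"), ("chrA", 6, "A", "G"),
                         ("chrB", 7, "C", "A"), ("chrB", 8, "C", "A")},
}
ranking = leaderboard(submissions, truth)
```

In this made-up case the conservative caller ranks first, because F1 penalizes the permissive caller's false positives more than the conservative caller's single miss.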
Stuart noted that there are computational challenges to deal with when considering future iterations of pan-cancer studies that deal with whole-genome sequence data, particularly with respect to storage, privacy, and researcher access to the data.
For the Pan-Cancer studies performed so far, Seattle-based Sage Bionetworks designed software called Synapse for sharing data between the study's collaborators and incorporating results from their analyses, including an agreed-upon mutation call set. A beta release of the tool has been used for the pan-cancer efforts so far and is described in a Nature Genetics commentary article.
"Our work represents a pilot project designed to demonstrate the ability to facilitate a large-scale, distributed collaboration, including design choices intended to minimize barriers to adoption and achieve engagement from all collaborators," authors of that paper noted.
"As we improve and expand the capabilities of Synapse in the context of future phases of the Pan-Cancer project and related collaborative projects," they continued, "it will be interesting to explore the tradeoffs between enforcing constraints on how users may perform analyses and represent results versus providing a flexible set of tools and allowing standards and protocols to emerge organically from users."