Studying cancer is an enormous undertaking. Many researchers spend their lives trying to understand the genetic origins of the disease in its many forms, and how to stop it in its tracks. While most concentrate their efforts on a specific type of cancer, there are others who have taken on an even more difficult task: studying all cancers at the same time to try to determine which mutations in the cancer genome actually affect human health, and which ones don't. It's a task too large for any one lab, says the Dana-Farber Cancer Institute's Rameen Beroukhim. His solution, then, was to establish collaborations with the Broad Institute and several international labs to study thousands of cancer samples and try to determine whether they have any mutations in common.
Beroukhim says that the idea of doing copy-number analyses of different cancer genomes came to him in 2004 when he was a postdoc in Bill Sellers' lab at Dana-Farber. When Beroukhim joined Matthew Meyerson's lab shortly afterward, the project gained momentum as additional people came on board. Beroukhim and Meyerson, who are also both affiliated with the Broad, began looking for collaborators there as well. "Broad is a special kind of place which encourages collaborations and large-scale projects where a lot of people's expertise can be brought to bear on a single subject," Beroukhim says. "The result is that you can produce data sets and do analyses that you can't do in a lot of other places."
Craig Mermel, a graduate research assistant at the Broad, agrees. He says the research team was able to leverage the institute's technological resources to do very high-throughput genomic characterizations. But when the data started piling up, he adds, the project's affiliation with Dana-Farber became important. "One of the problems with doing genomic analyses is that, by definition, you're going to discover something that you may have no expertise in … and Dana-Farber made it very easy to find the world experts who could help us think about the perfect questions to ask," he says. Indeed, many of the Dana-Farber researchers contributed years of experience and expertise to the group.
It was a natural collaborative effort between the labs, Mermel and Beroukhim say, because there was simply too much data to be analyzed by only a few people. In the end, the group's first study, published in Nature in February, had about 70 authors, Beroukhim adds.
Shedding light on oncogenes
The cancer genome is, almost by definition, distorted. Some additions and deletions drive cancer growth and proliferation, while other mutations are simply random and have no real effect. Beroukhim's interest lies mostly in characterizing cancer genomes, he says, and determining what the characterization says about cancer classification, how the disease behaves, and to what kinds of treatments it is likely to respond. Mutations occur throughout the body, Beroukhim adds, but tumors are enriched for those mutations, and though copy-number variations are found throughout the human genome, they're more concentrated in locations where they could lead to cancer.
The team began with 3,131 cancer specimens representing 26 cancer types, drawn from various labs in the US, including the Broad and Dana-Farber, as well as from Japan, Spain, Canada, and Norway. Using Affymetrix SNP arrays to interrogate nearly 240,000 sites across the genome per sample, the researchers spent three to four years analyzing the data to find common mutations in each cancer type, Beroukhim says.
Much of the work done in the past several years has focused on one kind of cancer at a time, Mermel says. But by putting many cancer types together in one study, the researchers found that amplifications of oncogenes such as MCL1 and BCL2L1, which allow cancer to grow by preventing apoptosis, occur across cancer types, though they tend to affect different cancers at different rates. "The findings suggest that amplification [of these oncogenes] might be a biomarker" for cancer treatment, Mermel says. There's at least one drug on the market, and several in development, targeting BCL2, he adds.
Beyond treating cancer, Beroukhim says, it's easy to imagine finding these mutations and targeting them in people who don't have cancer, in order to remove a risk factor for the disease and stop it from forming in the first place.
Beroukhim also hopes this project will change the way cancer is perceived. "I like the approach of thinking across cancer types," he says. "It's worthwhile to start thinking of cancer along different lines, where you start thinking about the mutation as primary and the tissue type as a secondary thing. It turns the whole classification of cancer on its side." Eventually, it may be possible to start thinking about treating cancer based on a specific mutation instead of the tissue in which the cancer began to grow.
Overcoming a challenge
These findings didn't come easily to the researchers. A project of this scale required many minds to do the work, analyze the data, and formulate theories and conclusions. Such a big group generating so much data comes with its own set of challenges. "Once people find out you have this data set, they come knocking on your door to see if they can help," Mermel says. "The challenge is in managing the quantity of collaborations that can arise and making sure you're disseminating information in a responsible way."
Collaborating as efficiently as possible was the key to making the project work, he adds, as was continually checking to make sure the researchers were working systematically and were careful not to overextend themselves. "Finding the right balance there is very difficult," he says.
Beroukhim says that sorting through the data was also challenging. "We keep finding that some events we thought were important actually might not be, and that other events we didn't recognize to be important actually are," he says. With each new recognition came the work to identify which event amplified which gene, and how it could affect the formation and growth of cancer.
The same new technology that makes researchers' lives easier, next-generation sequencers for example, also makes organizing the data harder as more and more information is generated. "We need computational methods to deal with all the high-res info," Beroukhim says. "It has recently become possible to do comprehensive characterizations of the cancer genome in a single assay down to single-base resolution, and the challenge will be to organize the data, not get overwhelmed, and try to make sense of it all."
The work isn't even close to being done. Though the team had more than 3,000 cancer samples to test, it was a skewed data set in which certain types of cancers were overrepresented and other types not represented at all, Mermel says. The researchers are continually trying to expand their collection of cancers, both through their own efforts and through resources such as The Cancer Genome Atlas. As the technology continues to improve, researchers can now visualize events at much finer levels of resolution, he adds, "so we'll be able to see events that we've missed because we weren't able to see them with the older technology." The researchers were also unable to look for biomarkers related to age, sex, or ethnicity because of the way the sample set was skewed, Mermel says, so with a larger number of more diverse samples, future studies could attempt such classification.
"This is an ongoing project," Beroukhim says. "We are interested in continuing to look at large numbers of cancer samples across cancer types." With more samples will come fresh data, necessitating even more research to get it organized and analyzed, to the point where it can become a launching pad for new cancer treatments.