Genomic Data Collaborations Advancing in Hopes of Improving Personalized Cancer Research, Care

NEW YORK (GenomeWeb) – In recent months, several new personalized medicine-focused data sharing projects have popped up in oncology, suggesting that stakeholders in the life sciences are more willing to work collaboratively to advance research and patient care.

In genomics, there is no shortage of collaborative research, but there are competitive, legal, and technological bottlenecks when it comes to sharing genomic data linked to patient outcomes. At the same time, experts agree that these data must be broadly shared in order to make more discoveries and move precision medicine to the frontlines of clinical care.

During a call earlier this month on the oncology data landscape, hosted by the Harvard Business School Kraft Precision Medicine Accelerator, Sudheer Doss, a director of PricewaterhouseCoopers' health industries advisory services, said he had tracked the number of initiatives launched between June and December last year. He noted that there had been "quite a bit of activity" during that time and a "trend toward more data sharing."

Most of the precision medicine data initiatives are aimed at informing oncology drug development and spurring research into disease etiology and drug target discovery. But more recently, a number of projects have launched that are collecting genomic information to optimize care, which Doss said is indicative of a "strong push" to drive personalized medicine into the clinic. He added, though, that few of these data sets are currently shared in the public domain.

These efforts are currently funded largely by government initiatives, such as the Million Veteran Program or the Database of Genotypes and Phenotypes, or by drug firms and the life sciences industry, as with Foundation Medicine's FoundationCore or Ambry Genetics' AmbryShare. However, Doss noted that philanthropic donations are also a growing source of support.

One of the major government-backed data sharing efforts launched last year was the National Cancer Institute's Genomic Data Commons, a centralized resource for standardized genomic and clinical datasets from large-scale projects, such as The Cancer Genome Atlas (TCGA), as well as from genomic testing firms and cancer researchers.

Currently, the GDC has 4 petabytes of data, according to Lou Staudt, director of the Center for Genomics at the NCI. Last year, Foundation Medicine donated data from 18,000 cases stored in its FoundationCore database to the GDC, though the data are not yet available from the GDC.

"It's a very interesting learning curve that we're on," Staudt said. "Every different data type that comes in needs a lot of special handling" to get it into a harmonized format.

Although the GDC's developers are working on standard data submission forms and a uniform data vocabulary, Staudt acknowledged that there will be aspects of the GDC that will never be fully automated. "People will have generated their own data systems, in the logical way they saw fit," he said. "That will be somewhat different from the logic of the GDC. So, there will be this necessary time to bring them into harmony."

By year end, the GDC is hoping to collect data from 50,000 cases, but the goal is to house more than 100,000 cases. "The payors are going to dictate the pace," Staudt reflected. "There's only so much genomics that can be afforded at this time."

But if reimbursement improves for genomic tests, so will the pace of the GDC's work. "Big changes will happen if reimbursement starts happening for these genomic tests," Staudt said. "If it does, then the flood gates will open and every cancer patient will have a genome, and we'll take it."

The Multiple Myeloma Research Foundation is another group that last year said it would submit data from its CoMMpass study to the GDC. The trial, involving approximately 1,150 patients, has collected clinical outcomes and genomic profile information.

"Annotations of MMRF samples are very extensive," according to Staudt. "They have aspects of timelines for patients — what happened to the tumor and when did it progress." The GDC and MMRF are in the process of bringing this clinical data into the database.

Right now, GDC users can search the resource across various fields. The aim for this year and next year, however, is to create visualization tools, more intuitive analytics, and point-and-click functionality that make it easier for non-computer scientists to navigate the data commons.

"You'll be able to do things like create synthetic cohorts of patients," Staudt said. He explained that users can define a cohort of patients by certain clinical or genomic parameters (non-small cell lung cancer patients with KRAS mutations, for example), choose a second group with different characteristics, and compare the two in terms of survival, treatment response, or other outcomes.

"You will be able to almost write your paper [based] on the Genomic Data Commons," he said. "You will download the figure in PDF format and it will probably be publication ready."
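As a rough illustration of the kind of cohort comparison Staudt describes, the Python sketch below selects two groups of cases by diagnosis and mutation status and compares a single outcome. It does not use the GDC's actual interface: the Case record, its field names, the helper functions, and every patient value are invented for illustration. A plain median also stands in for a proper survival analysis, which would have to account for patients who are still alive at last follow-up.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Case:
    """Hypothetical, simplified case record (not the GDC data model)."""
    case_id: str
    diagnosis: str
    mutated_genes: frozenset
    survival_months: float


def select_cohort(cases, diagnosis, gene, mutated):
    """Return cases with the given diagnosis whose mutation status
    for `gene` matches `mutated` (True = mutated, False = not)."""
    return [
        c for c in cases
        if c.diagnosis == diagnosis and (gene in c.mutated_genes) == mutated
    ]


def median_survival(cohort):
    """Median survival in months, or None for an empty cohort.
    Ignores censoring, so this is only a toy summary."""
    if not cohort:
        return None
    return median(c.survival_months for c in cohort)


# Invented toy records, for illustration only.
cases = [
    Case("A1", "NSCLC", frozenset({"KRAS"}), 10),
    Case("A2", "NSCLC", frozenset({"KRAS", "TP53"}), 14),
    Case("A3", "NSCLC", frozenset({"EGFR"}), 22),
    Case("A4", "NSCLC", frozenset(), 18),
    Case("A5", "breast", frozenset({"KRAS"}), 30),
]

kras_mutant = select_cohort(cases, "NSCLC", "KRAS", mutated=True)
kras_wild_type = select_cohort(cases, "NSCLC", "KRAS", mutated=False)

print("KRAS-mutant NSCLC:", len(kras_mutant), "cases, median",
      median_survival(kras_mutant), "months")
print("KRAS wild-type NSCLC:", len(kras_wild_type), "cases, median",
      median_survival(kras_wild_type), "months")
```

The same two-step pattern, filter and then summarize, would apply to treatment response or any other outcome field.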

Meanwhile, the American Association for Cancer Research's Project GENIE, short for Genomics Evidence Neoplasia Information Exchange, is also expanding. Eight cancer centers are currently pooling genomic and clinical data in a single registry to advance research and improve patient care. The registry contains limited clinical and next-generation sequencing data from approximately 19,000 samples, which became publicly available in January.

This first data release highlighted the different ways in which the participating centers were conducting genomic profiling, with some centers doing tumor-only sequencing while others also sequenced matched normal tissue. These institutions are using panels with between 48 and 429 genes. The registry now has data from a variety of tumor types, including more than 3,000 NSCLC, 2,000 breast cancer, and 2,000 colorectal cancer samples.

The MMRF also has attempted to encourage data collaborations by hosting a crowdsourcing competition in which participants tried to develop a genomic algorithm that can identify which multiple myeloma patients are at high risk of progression. MMRF President Paul Giusti noted that over the course of the CoMMpass study, approximately 200 patients have passed away because they had a particularly aggressive form of the disease. "If we could identify those patients, we would treat them differently, we might be more aggressive with their therapy and be able to prolong lives," he said.

The contest, which recently closed, received nearly 700 algorithm submissions from 49 individuals in 24 countries. MMRF, Topcoder, and Harvard University's Crowd Innovation Lab are still reviewing the submissions, but Giusti noted that they've already identified five algorithms that can separate aggressive from non-aggressive multiple myeloma better than the current standard methods.

The MMRF is now looking to validate these algorithms in other datasets. The organization is also planning to host another contest to try to combine the five algorithms to get a "better net predictive result." 

One of the main lessons learned through the crowdsourcing contest, Giusti said, is that "not only do you need great data but you need great access to the data … The access is critical."
