CHICAGO (GenomeWeb) – Bioinformaticians at Georgetown University Medical Center are expecting to see an uptick in genome-based research into brain cancers — even with data that is more than a decade old — thanks to the public release of an enhanced version of a dataset formerly hosted by the National Cancer Institute.
This month, the Innovation Center for Biomedical Informatics (ICBI) at the Georgetown Lombardi Comprehensive Cancer Center formally released its version of REMBRANDT (REpository for Molecular BRAin Neoplasia DaTa), which is second only to The Cancer Genome Atlas (TCGA) in size and scope among databases of biomedical information on brain cancer. The ICBI and other REMBRANDT collaborators detailed their work in an article published in the journal Scientific Data.
Among the listed authors is TCGA Director Jean-Claude Zenklusen, one of the original creators of REMBRANDT.
This clinically annotated REMBRANDT dataset covers 874 glioma specimens from 671 patients collected at 14 institutions between 2004 and 2006. This collection includes data from 566 gene expression arrays and 834 copy number arrays, as well as 13,472 phenotypic data points from clinical records.
The release of REMBRANDT caught the attention of former US Vice President Joe Biden, who tweeted about it. "We need efforts like this to advance progress against cancer," said Biden, whose son, Beau, died of glioblastoma. Biden now champions the Biden Cancer Initiative with his wife, Jill.
Subha Madhavan, director of ICBI and chief data scientist at Georgetown University Medical Center, participated in brainstorming sessions for the Obama administration's Cancer Moonshot, which the former vice president headed up.
"They took seriously the issue of data in biomedical research and cancer research, and they also paid a lot of attention to data sharing," Gusev said of the Cancer Moonshot. "The idea is that if we share more in the research community, then we perhaps will accelerate discovery."
That is what REMBRANDT and its current host, the open-access Georgetown Database of Cancer (G-DOC), have been designed to facilitate.
REMBRANDT grew out of the National Cancer Institute and the National Institute of Neurological Disorders and Stroke, but in 2015, the NCI decided that it no longer wanted to host the platform. The institute asked Georgetown to take it over and host REMBRANDT on G-DOC.
The G-DOC repository has been around since 2009. "The idea behind this platform was for the Lombardi Cancer Center to develop a way for people to collaborate, to share their data, and be able to invite collaborators from anywhere to this web platform where they can look at molecular data in conjunction with clinical outcomes," said co-lead author Yuriy Gusev, a bioinformatician at ICBI.
G-DOC combines molecular data and clinical information from numerous cancer studies in a cloud environment, facilitating collaboration around the world. It also hosts a series of applications and tools, following an increasingly common strategy in bioinformatics of bringing analytics and computational power to the data rather than moving genomic datasets over the internet.
"The idea of G-DOC is to have a cloud platform with a massive amount of molecular information and tools in the same place so you don't have to download the data anymore. The tools are actually coming to the data instead of data coming to the tools," Gusev said.
"NCI took notice and decided that it's an appropriate platform for REMBRANDT," he added.
NCI still hosts the imaging component of REMBRANDT at The Cancer Imaging Archive. This component contains presurgical MRI images from 130 patients in the database and is linked directly to the main REMBRANDT database on G-DOC.
During the transition period, Gusev learned how popular REMBRANDT was. "A lot of people started to contact us in anticipation that we would be the new host for this data collection," he recalled. "Since then, many hundreds of people have accessed the G-DOC version of REMBRANDT," Gusev said. He expects the number to increase with the publication of the Scientific Data paper.
ICBI did have to make some modifications to the original database. "We reanalyzed DNA copy numbers because we have built-in tools," said Gusev, who led the team at Georgetown that migrated the data.
One such tool is the Chromosome Instability Index, or CINdex, which summarizes changes to copy numbers in tumors at the cytoband and whole-chromosome level. "It allows more of this kind of integration to look for correlations between DNA copy number changes at various levels and the clinical outcome," Gusev explained.
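The article does not describe how CINdex computes its scores. As a rough illustration of what "summarizing copy number changes at the cytoband and whole-chromosome level" can mean, here is a minimal, hypothetical sketch. It is not CINdex's algorithm. The segment and band record layouts, the 0.2 absolute log2-ratio cutoff for calling a segment aberrant, and the length-weighted scoring are all assumptions chosen for clarity.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """A copy number segment (half-open coordinates); hypothetical layout."""
    chrom: str
    start: int
    end: int
    log2_ratio: float


class Band(NamedTuple):
    """A cytoband (half-open coordinates); hypothetical layout."""
    chrom: str
    name: str
    start: int
    end: int


def band_instability(segments, bands, threshold=0.2):
    """Score each band as the overlap-weighted |log2 ratio| of aberrant
    segments, divided by band length. Illustrative assumption, not CINdex."""
    scores = {}
    for band in bands:
        burden = 0.0
        for seg in segments:
            # Skip segments on other chromosomes or near copy-neutral.
            if seg.chrom != band.chrom or abs(seg.log2_ratio) < threshold:
                continue
            overlap = min(seg.end, band.end) - max(seg.start, band.start)
            if overlap > 0:
                burden += overlap * abs(seg.log2_ratio)
        scores[(band.chrom, band.name)] = burden / (band.end - band.start)
    return scores


def chromosome_instability(band_scores, bands):
    """Roll band scores up to one band-length-weighted score per chromosome."""
    totals = defaultdict(float)
    lengths = defaultdict(int)
    for band in bands:
        length = band.end - band.start
        totals[band.chrom] += band_scores[(band.chrom, band.name)] * length
        lengths[band.chrom] += length
    return {chrom: totals[chrom] / lengths[chrom] for chrom in totals}


if __name__ == "__main__":
    # Toy data: one gained segment spanning p11 and half of q11,
    # plus a near-neutral segment that falls below the cutoff.
    bands = [Band("7", "p11", 0, 100), Band("7", "q11", 100, 200)]
    segments = [Segment("7", 0, 150, 1.0), Segment("7", 150, 200, 0.1)]
    per_band = band_instability(segments, bands)
    print(per_band)
    print(chromosome_instability(per_band, bands))
```

Computing band-level scores first and then aggregating them keeps both levels of resolution available, which is the kind of multi-level summary that could then be correlated with clinical outcome, as Gusev describes.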
"Now we are sharing this newly processed data [together] with the old data. With this publication, we have released our own processed DNA copy number data, in addition to the raw data," Gusev said. This, he believes, has breathed new life into records that date to the middle of the last decade.
"It seems that this so-called old dataset is alive and well, and there are a lot of people using it right now for their research," Gusev said. "It helps to compare the old research to this published data. Some people do their own data mining in search of new biomarkers."
Gusev acknowledged that TCGA has become the gold standard for molecular research into various cancers, but he said REMBRANDT serves an important new purpose for brain cancer investigators.
"A lot of people routinely include [TCGA] for comparison in their datasets, but REMBRANDT now provides an independent validation set. Whatever you find with TCGA, you can validate by looking at the REMBRANDT study," Gusev said.
"I think that you're going to see a lot of new research coming out," Gusev predicted.
He noted that cancer researchers are looking for ways to connect molecular data to clinical outcomes.
"To do that properly, we need to have access to both clinical information and molecular profiling from these patients. By releasing both data types, clinical and molecular, to the public, we provide an opportunity for anyone now to do their own research," Gusev said.
"While the data might look like an old story, in reality, every year new methods are developed, new ways to look at the data, based on new biological discoveries," Gusev said.
For example, he noted that REMBRANDT contains a substantial set of gene expression data.
"People analyze it many different ways, but, recently, newer oncology applications [have been] developed, and suddenly it turned out that we could take the same gene expression data and look at them in completely different ways to analyze them with new tools," he said. This could help researchers estimate, for example, the extent of immune cell infiltration in a particular tumor.
Georgetown's ICBI is doing its own research with this data, including looking at how REMBRANDT can help cancer researchers understand immuno-oncology microenvironments around brain tumors. Gusev said that researchers at the Washington institution are intrigued by the imaging component.
"There are interesting possibilities to find connections between MRI images of these tumors and molecular profiles," Gusev said. ICBI informaticians have had several meetings with brain tumor researchers at Georgetown to discuss possible joint grant applications, he said.