CHICAGO (GenomeWeb) – For a National Cancer Institute-funded project to standardize cancer genomics terminology, the George Washington University School of Medicine and Health Sciences is turning to an unexpected source: the NASA Jet Propulsion Laboratory.
JPL, funded by the National Aeronautics and Space Administration and managed by the California Institute of Technology in Pasadena, California, actually has provided informatics technologies and services for NCI's Early Detection Research Network for more than a decade and a half. And the chief of informatics for EDRN is Daniel Crichton, who manages the JPL Center for Data Science and Technology.
"We get involved and help them manage some of their big data integration issues," Crichton said.
Now, Crichton is serving as co-principal investigator on an oncology ontology standardization effort at GWU, funded by a three-year, $1.2 million grant that NCI awarded a month ago.
NASA has developed numerous sensors and measuring tools for planetary sciences that can apply to medicine. The space agency's Planetary Data System has amassed 1.5 petabytes of data from planetary observations, according to Crichton. "There's a lot of consistencies in our approaches," Crichton said.
"This is an excellent way to demonstrate that [standard ontologies] increase the knowledge system around cancer biomarkers," Crichton said.
Raja Mazumder, associate professor of biochemistry and molecular medicine at the GWU School of Medicine and Health Sciences, is leading the development of two databases to standardize terminology and normalize research data on cancer-related genes. One database is called BioMuta, for gene mutations, and the other BioXpress, for gene expression.
"For researchers within that EDRN network, they're looking for biomarkers for early detection of cancer. Within that network, what we have been doing is trying to integrate publicly available gene mutation and gene expression data," Mazumder explained.
Between now and April 2020, the team will build a portal tentatively called Cancer GEM — for gene expression and mutation — though Mazumder said he is considering a name change to OncoMX, with M standing for mutation and X for expression. They also will compile a list of use cases for standardizing terminologies.
The BioMuta and BioXpress databases and framework already exist, thanks to earlier pilot funding from EDRN and collaboration with the network. "We use technology that [Crichton] has developed," Mazumder said. By the end of August, the plan is to have new versions available with more data and documentation, he added, and there should be an initial release of Cancer GEM or OncoMX in 2018.
"We wanted to take it to the next level and make it available to the wider community," Mazumder said of the new grant.
He noted that NIH-funded studies on genomics often just result in publication of academic papers. "But there are some major projects where all the data is collected, and it's made available through project-specific web portals," he said. In the US, that usually means The Cancer Genome Atlas (TCGA).
"When TCGA provides this data, they have their own cancer terms that they came up with," Mazumder said. But the International Cancer Genome Consortium has its own ontology, as do other compendia that researchers rely on.
"I realized that the different cancer terms might be annotated differently in different resources," Mazumder said. "That's a bottleneck and it requires manual intervention," which is not only time-consuming, the end result varies from institution to institution and even from project to project.
"Because the cancer terms are different, somebody like me who wants to collect all of this information and make it available, I have to go and manually map the different cancer terms," Mazumder explained. "Why can't we harmonize these terms?"
Several years ago, Mazumder brought together a diverse group of people from hospitals, from NCI, and from the cancer genomics community, to discuss the harmonization of cancer genomics terminologies. They decided to build on the work of researchers from Northwestern University and the University of Maryland, Baltimore, who landed a $1 million NIH grant in 2009 to develop a computable disease ontology.
One of the co-PIs of that earlier project was Warren Kibbe, who now is director of the Center for Biomedical Informatics and Information Technology and deputy director of the NCI.
In 2015, Mazumder teamed with Crichton and others from GWU, NCI, and JPL to publish a paper in Database: The Journal of Biological Databases and Curation. That work is serving as part of the framework for BioMuta and BioXpress.
The researchers are applying various data integration technologies. "We need ontologies and standardization — and then there's the literature mining part of the story, which is completely different technology," Mazumder said.
The latter element scans medical literature to find mutation and expression data from published reports. "We get data from genomic projects and we also get data from publications, and we overlay those together," Mazumder said. This lets people see, for example, whether a specific gene is highly expressed in breast cancer and whether the literature reports similar findings elsewhere.
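As a rough illustration of that overlay, the Python sketch below groups two evidence sources under a shared gene-and-cancer key. It is a sketch under stated assumptions, not the team's implementation: the function name `overlay`, the field names, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def overlay(genomic_rows, literature_rows):
    """Group both evidence sources under a shared (gene, cancer) key."""
    merged = defaultdict(lambda: {"genomic": [], "literature": []})
    for row in genomic_rows:
        merged[(row["gene"], row["cancer"])]["genomic"].append(row["expression"])
    for row in literature_rows:
        merged[(row["gene"], row["cancer"])]["literature"].append(row["source"])
    return dict(merged)


# Illustrative rows only; gene names and sources are placeholders.
genomic_rows = [
    {"gene": "GENE1", "cancer": "breast cancer", "expression": "high"},
    {"gene": "GENE2", "cancer": "lung cancer", "expression": "low"},
]
literature_rows = [
    {"gene": "GENE1", "cancer": "breast cancer", "source": "example-report-1"},
]

combined = overlay(genomic_rows, literature_rows)

# Keep only gene/cancer pairs backed by both genomic data and a published report.
supported = [key for key, ev in combined.items() if ev["genomic"] and ev["literature"]]
```

The join only works if both sources use the same cancer term for the same disease, which is why the overlay depends on the terminology harmonization described earlier.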
"We want to have that knowledge captured from genomics work and captured from literature and presented to the user so that they can look at the information," Mazumder said.
"It can help in clinical decision support," he said. "That's where we would like this to go in the future. But for the next three years, we are just making sure that we have the infrastructure to get to that level."
Separately, under a US Food and Drug Administration contract that falls outside the scope of the NCI grant, Mazumder wants to know whether regulatory scientists will use his cancer genomics resources. He is working with the FDA on defining bioinformatics computations for the purpose of submitting data for regulatory review. "Right now, it's a gray area," Mazumder said.
He said that defining those computations would help researchers and FDA reviewers alike know how a mutation was detected and allow them to "show their work," like a student taking an exam.
"Currently, in these large genomics projects, that type of detailed information does not exist," Mazumder said. "We hope that will change in the future because that is a critical piece of information. You always want to say how you got the results when you are making decisions."
He said that the NCI-funded work "will also follow some of the lessons we have learned from the bio-compute project."