New Database Merges –Omics, Clinical Data to Make Cancer Treatment Personal


By Uduak Grace Thomas

This week, Georgetown University's Lombardi Comprehensive Cancer Center launched the Georgetown Database of Cancer, or G-DOC, a repository of cancer information and tools.

"G-DOC is a 'one-stop shop' designed to make the vision of personalized medicine a reality," Louis Weiner, director of the Lombardi center, said in a statement. He added that with the information and tools in the database, researchers "can develop a much more complete picture of what causes individual cancers to develop and to grow, and what new agents are needed to treat them."

Georgetown University provided $10 million in seed funding for G-DOC through its special projects initiative. Since development began two years ago, the project has also received funding through the National Cancer Institute's In Silico Research Centers of Excellence initiative at Georgetown and from the NCI's Cancer Center for Systems Biology.

Currently G-DOC contains genomics, proteomics, metabolomics, methylomics, and transcriptomics data from tumors, as well as clinical treatment and outcome information, for about 2,953 breast cancer patients.

G-DOC is part of "a systems medicine vision" that began at Georgetown three years ago, Subha Madhavan, director of clinical research informatics at Lombardi and one of the developers of G-DOC, told BioInform. The vision, she said, involved asking, "How can we treat an individual by looking at all the data that is available about that particular patient and incorporating that into the whole model of evidence-based medicine?"

She noted that while several organizations, institutions, and government agencies are putting together integrative data portals, the "unique aspect" of G-DOC is that its "primary goal … is not just taking different types of -omics … data and putting it together in a database but also connecting them to rich clinical outcome data and follow-up information."

To further clarify the distinction, Madhavan said that "while the Cancer Genome Atlas project is generating large amounts of data for several cancers and putting it out there for researchers," G-DOC "goes a lot deeper into analyzing one or two types of cancers."

“The goal is to put the information at the fingertips of decision-makers, clinicians, and clinician researchers so they don’t have to depend on too many intermediaries to help them interpret their data,” she noted.

The web-based platform contains a genome viewer that lets users visualize multiple data types, including gene expression, copy number variation, and clinical outcome data. The viewer also supports flexible clinical criteria browsing to enable specific cohort selection and generate detailed reports. Users can also browse drugs of interest using chemical structure and molecular property search functions, and study the molecular interactions of cancer drugs in a three-dimensional viewer.
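To make the idea of clinical-criteria browsing concrete, the sketch below filters a handful of made-up patient records into a cohort. It is only an illustration and not G-DOC's code: the `Patient` fields, the `select_cohort` helper, and every value shown are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Patient:
    # Hypothetical de-identified record; the field names are assumptions.
    patient_id: str
    er_positive: bool
    relapsed: bool
    followup_months: int


def select_cohort(patients: Iterable[Patient],
                  *criteria: Callable[[Patient], bool]) -> List[Patient]:
    """Return the patients that satisfy every supplied criterion."""
    return [p for p in patients if all(c(p) for c in criteria)]


# Made-up records, for illustration only.
PATIENTS = [
    Patient("P001", er_positive=True, relapsed=True, followup_months=60),
    Patient("P002", er_positive=True, relapsed=False, followup_months=72),
    Patient("P003", er_positive=False, relapsed=True, followup_months=24),
]

# Example cohort: ER-positive patients whose cancer relapsed.
cohort = select_cohort(
    PATIENTS,
    lambda p: p.er_positive,
    lambda p: p.relapsed,
)
cohort_ids = [p.patient_id for p in cohort]
```

Expressing each criterion as a small predicate keeps the selection composable: a researcher could add a follow-up threshold or a treatment filter without changing the helper.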

Physician researchers, who are the target users for this incarnation of G-DOC, can sign up for password-protected accounts to view the datasets in the system as well as to watch tutorials and training videos that acquaint new users with the platform. Researchers can use the system to perform enrichment and combinatorial analysis to identify gene signatures or they can hunt for particular markers on a gene-by-gene basis.

The team also took "enormous measures" to ensure that the de-identified data isn’t compromised and to encourage individual researchers and groups to enter their data into the system before their research results are published.

The system is hosted at Georgetown's Data Center in Laurel, Md., and has both backup and backup recovery systems. The data is secured using NCI's common security module, which includes an authentication and authorization module to verify users' identities and to authorize users to be part of different groups and view data from different studies.
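The group-based access described here can be pictured with a short sketch. It does not use or reproduce the API of NCI's common security module. The user names, group names, study names, and the `can_view` function are all hypothetical, and authentication is assumed to have already happened.

```python
from typing import Dict, Set

# Hypothetical mapping of authenticated users to the groups they belong to.
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"breast_relapse_team"},
    "bob": {"gi_team"},
}

# Hypothetical mapping of studies to the groups allowed to view them.
STUDY_GROUPS: Dict[str, Set[str]] = {
    "breast_relapse_study": {"breast_relapse_team"},
    "colon_study": {"gi_team"},
}


def can_view(user: str, study: str) -> bool:
    """Allow access only if the user shares a group with the study."""
    user_groups = USER_GROUPS.get(user, set())
    allowed_groups = STUDY_GROUPS.get(study, set())
    return bool(user_groups & allowed_groups)
```

In this sketch, unknown users and unknown studies fall through to an empty set, so access is denied by default.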

"We have four tiers of information … the development and QA servers don't have real data," Madhavan said. "The staging and production servers have real data but they have been completely scrubbed to make sure there is no identifying information there."

For this initial release, the system's tools and datasets focus primarily on analysis of cancer types that are studied at Georgetown. This includes research into breast cancer relapse, estrogen-receptor-positive breast cancers, and gastrointestinal cancers including colon and pancreatic cancers.

G-DOC doesn’t "reinvent the wheel" by creating new tools from scratch, Madhavan said. Rather, it integrates well-known commercial tools such as Ariadne Genomics' Pathway Studio, as well as open-source tools such as NCI's caTissue and the Institute for Systems Biology's Cytoscape. The developers do plan to develop some tools where there is a dearth of available options, such as metabolomics data analysis.

She pointed out that metabolomics is an emerging area that has generated a "lot of interest" in the cancer research community because of its non-invasive nature. "You could actually come up with [disease] classifiers from urine from patients," she said. "Before, you needed biopsies."

The team is also considering incorporating Vanderbilt University's Research Electronic Data Capture, or REDCap, to handle clinical trial data collection for G-DOC. For now, Madhavan encourages users to choose REDCap to capture their clinical trial data, although there isn’t an automatic link between REDCap and G-DOC, and data has to be exported from one and imported into the other.

There are some new tools included in G-DOC, Madhavan said, including two molecular prediction algorithms that her team developed in concert with colleagues at Virginia Tech and an algorithm that's used to analyze tumor copy number variation information.

Madhavan's team plans to incorporate additional tools into G-DOC as early as next year that will make it more widely applicable. They also plan to include datasets from other types of cancer.

"That's the beauty of this platform," she said. "Now that we have the baseline architecture set up, it's quite easy for us to keep adding new methods for analyzing data and keep adding more samples on a rotational basis."

As part of these next steps, the team plans to build interfaces from G-DOC into two electronic health records systems used at Georgetown – ARIA and GE Centricity, both of which contain cancer information.

She added that the team is also working on metabolomics data analysis tools that it plans to incorporate in future incarnations of the database.
