New NHGRI Resource Aims to Support More Efficient Use of Genomic Data in Clinical Settings


Researchers from the National Human Genome Research Institute are seeking feedback on a newly developed resource called the Clinical Genomic Database, or CGD, that they claim will help researchers and clinicians use whole genome and whole exome sequencing data more efficiently in clinical contexts.

CGD was created, its developers explain, to provide a more direct route to information about clinically relevant genetic variants and, where available, associated treatments and therapies.

In a paper published last month in the Proceedings of the National Academy of Sciences, they argue that while "a number of freely or commercially available tools allow curation of individual genomes, including analysis of variant type, predicted pathogenicity of a particular variant, and associations of the gene or specific variant with known health conditions," it's still a challenge "to determine which genetic variants may warrant further follow-up … or would otherwise alter patient care."

For example, the researchers wrote, resources such as the Online Mendelian Inheritance in Man provide "vast repositories of rich clinical and genetic knowledge" but they "may be harder to query for efficient clinically oriented analysis." Others, like the Human Gene Mutation Database, are only "valuable when considering the potential pathogenicity of detected genetic variants but … not [the] clinical implication in a particular healthcare situation."

Another problem is that relevant data is often spread out over multiple resources and in some cases is incomplete. Benjamin Solomon, a staff clinician in NHGRI's medical genetics arm and one of the authors of the PNAS paper, told BioInform that his group encountered these problems when they tried to identify the genetic causes of some congenital disorders they were studying.

"It became increasingly clear to me that trying to piece together what was medically important or not from a genomic dataset was very challenging and involved going to multiple different databases and … going through a lot of primary literature [where] some of the information was incomplete or conflicting," he said.

These difficulties led Solomon and his colleagues to develop the CGD as a means of simplifying the search for clinically relevant variants. Building the resource, he said, required manually reviewing information about genetic diseases and associated variants from resources like OMIM and HGMD, as well as relevant primary literature. Solomon also contacted genetics experts to ask whether they had unpublished information about the genes.

Currently, CGD contains over 2,600 genes with known disease-causing mutations or "clinically significant pharmacogenomic implications." Of the total, about 1,300 genes have known medical interventions, while the balance, about 1,200 genes, carry potentially clinically relevant mutations that have no known medical interventions.

For each entry, the database includes "the gene symbol, conditions, allelic conditions, clinical categorization, mode of inheritance, age category in which interventions are indicated based on descriptions in the medical literature, general descriptions of the interventions/rationale, and individually linked references," the PNAS paper states.

Users can search for information by gene or condition or they can browse for results by two clinical categories — disease manifestation and intervention. Users can download all or portions of CGD's content in a single file.
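To make the entry layout and the two access paths more concrete, the following is a minimal sketch of how a downloaded copy of such a database might be modeled and queried locally. It is not CGD's actual schema or interface: the class name `CGDEntry`, every field name, the helper functions, and the placeholder records are assumptions made for illustration, loosely following the fields listed in the PNAS paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class CGDEntry:
    """One gene-level record. Field names are illustrative, not CGD's own."""
    gene_symbol: str
    conditions: Tuple[str, ...]
    allelic_conditions: Tuple[str, ...]
    manifestation_categories: Tuple[str, ...]
    intervention_categories: Tuple[str, ...]  # empty when no intervention is known
    inheritance: str
    age_category: str
    intervention_rationale: str
    references: Tuple[str, ...]


def search_by_gene(entries: Iterable[CGDEntry], symbol: str) -> List[CGDEntry]:
    """Exact, case-insensitive match on the gene symbol."""
    wanted = symbol.strip().upper()
    return [e for e in entries if e.gene_symbol.upper() == wanted]


def search_by_condition(entries: Iterable[CGDEntry], term: str) -> List[CGDEntry]:
    """Case-insensitive substring match against each entry's conditions."""
    needle = term.strip().lower()
    return [e for e in entries if any(needle in c.lower() for c in e.conditions)]


def browse(
    entries: Iterable[CGDEntry],
    manifestation: Optional[str] = None,
    intervention: Optional[str] = None,
) -> List[CGDEntry]:
    """Filter by either clinical category; a category left as None is ignored."""
    hits = []
    for e in entries:
        if manifestation is not None and manifestation not in e.manifestation_categories:
            continue
        if intervention is not None and intervention not in e.intervention_categories:
            continue
        hits.append(e)
    return hits


# Placeholder records only; these are NOT real CGD content.
EXAMPLE_ENTRIES = [
    CGDEntry("GENE_A", ("Example condition one",), (), ("Cardiovascular",),
             ("Cardiovascular",), "AD", "Pediatric", "Placeholder rationale", ("ref-1",)),
    CGDEntry("GENE_B", ("Example condition two",), (), ("Neurologic",),
             (), "AR", "N/A", "", ("ref-2",)),
]
```

With these placeholder records, `search_by_gene(EXAMPLE_ENTRIES, "gene_a")` returns the first entry, and `browse(EXAMPLE_ENTRIES, intervention="Cardiovascular")` returns only entries that list an intervention in that category.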

Solomon and his colleagues are currently seeking feedback from the research community on the CGD's content, which they will use to "continually revise and improve the resource."

Long term, they hope to establish CGD as "a user-friendly resource relevant to a wide group of clinicians that can be used as a reference resource in a variety of situations," the paper states.

The developers also believe the CGD could eventually be used "as a filter superimposed on automated binning algorithms" — tools used to classify variants as pathogenic or otherwise — "to help allow efficient, clinically relevant annotation of human genomes," the paper states.
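One way to picture a filter "superimposed" on a binning step is sketched below. This is an assumption about how such a filter could work, not the developers' method: the function name `apply_cgd_filter`, the variant dictionaries with a `gene` key, and the two gene sets are all hypothetical, and the binning algorithm itself is taken as already run.

```python
from typing import Dict, Iterable, List, Set


def apply_cgd_filter(
    binned_variants: Iterable[Dict[str, str]],
    cgd_genes: Set[str],
    genes_with_intervention: Set[str],
) -> List[Dict[str, object]]:
    """Keep variants whose gene is in the curated list and flag actionability.

    Each input variant is a dict with at least a "gene" key; any other keys
    (for example a bin label from an upstream tool) are carried through as-is.
    """
    annotated = []
    for variant in binned_variants:
        gene = variant["gene"]
        if gene not in cgd_genes:
            continue  # not in the curated gene list, so drop it
        row: Dict[str, object] = dict(variant)
        row["has_known_intervention"] = gene in genes_with_intervention
        annotated.append(row)
    # Stable sort: variants in genes with a known intervention come first.
    annotated.sort(key=lambda r: not r["has_known_intervention"])
    return annotated


# Placeholder inputs only; gene names and bin labels are made up.
variants = [
    {"gene": "GENE_B", "bin": "likely pathogenic"},
    {"gene": "GENE_X", "bin": "pathogenic"},
    {"gene": "GENE_A", "bin": "pathogenic"},
]
filtered = apply_cgd_filter(variants, {"GENE_A", "GENE_B"}, {"GENE_A"})
```

Here `GENE_X` is dropped because it is absent from the curated set, and the `GENE_A` variant is listed ahead of `GENE_B` because only `GENE_A` is marked as having a known intervention.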