One of the biggest challenges in bringing sequencing technology into medical care is the lack of a single, clinically validated database of DNA variants implicated in human disease, but a group of researchers is looking to resolve this issue by integrating two nascent efforts to develop such a resource.
Heidi Rehm, director of the Laboratory for Molecular Medicine at the Partners Healthcare Center for Personalized Genetic Medicine, is one of several researchers working to harmonize two clinical-grade variant databases that are still under development: MutaDataBase, an effort led by a non-profit foundation of the same name based in Belgium; and ClinVar, a new resource under development at the National Center for Biotechnology Information.
Both projects set out to tackle a challenge facing labs that have adopted next-generation sequencing as part of patient care: Once a genome is sequenced and the variants are identified, bioinformaticians must scan through dozens of databases — including OMIM, HGMD, dbSNP, DGV, and locus-specific databases — in order to functionally interpret the variant calls. Not only is this extremely labor-intensive for whole-genome analysis, but much of the information is out of date or incorrect because it is based on first references in the literature and has not been updated as new findings have been published.
Rehm, a principal investigator for MutaDataBase, became aware of ClinVar when NCBI contacted her and several other clinical lab managers to gather feedback on the new resource. As the two efforts evolved, "it became clear that part of our problem is that there are too many databases out there, and we really want just one," Rehm told BioInform. "We realized it would be silly to have MutaDataBase and ClinVar be separate projects."
As a result, Rehm and several other MutaDataBase participants are writing a grant, scheduled for submission in September, that will support the development of an integrated resource that merges aspects of both projects.
The plan, she said, "is that we use MutaDataBase and [its] MutaReporter curation environment to help curate variants, but that the ultimate repository for those variants is ClinVar, which is maintained by NCBI."
A primary goal for the grant, she said, "is to bring the community together to develop standards" for classifying variants as either benign or pathogenic, as well as guidelines for how to update the database when those classifications change based on new information.
"We've been doing that to some extent with the MutaDataBase project," Rehm said. "A core set of us have been discussing these issues and coming up with different standards, but there's still more work to be done, more work to get people into the process so that we all feel some ownership of the decisions and then we'll all implement it in our respective labs."
Another aim is to convince clinical labs to deposit their data into the repositories.
"Labs like my own and many others out there have been collecting variant data and curating it in real time as we've written reports on patients for many, many years, but none of that data — or very little of that data — gets into the public domain," Rehm said.
As a result, "a lot of the focus of the MutaDataBase project as well as ClinVar is to get the clinical labs to put their data in the public domain because that's where we see the bulk of the data coming from."
Even if the grant doesn't come through, Rehm said that participants in both efforts plan to ensure the resources work together. "It will go faster if we get this grant and have the resources to do it. It will go slower if we're just gradually trying to work with NCBI," she said.
Surprisingly, many clinical labs have so far been willing to submit their data. Rehm said that even a number of commercial labs, including the Laboratory Corporation of America, Quest Diagnostics, Arup Labs, Athena Diagnostics, Correlegen, GeneDx, and Genzyme, have committed to depositing their data in ClinVar, as have academic labs at Emory University, Baylor College of Medicine, the Mayo Clinic, and the University of Chicago.
The only holdouts so far, Rehm said, have been Myriad Genetics and Prevention Genetics.
"A year and a half ago … I said these labs are not going to put this data in the public domain. They see it as their knowledge and that's how they compete in the market," she recalled. Instead, she's found that "the vast majority" of labs have either already agreed to submit their data or are considering it.
"I think the largest motivator at this point is the notion that we're all headed toward whole-genome sequencing and there's no one out there that houses the data and expertise for every gene in the genome," she said. "The only way any one of us are going to be able to interpret genomes is if we have a universal mutation database that houses all the data so we can make sense of the mutations we find in any patient despite the fact that we don’t have longstanding expertise on every gene."
MutaDataBase got its start several years ago when Patrick Willems, founder and director of the Genetic Diagnostic Network, or GENDIA, an international network of more than 100 diagnostic labs, grew frustrated with the lack of standards that labs could use to classify variants.
Willems recruited Rehm and other clinical lab directors to create MutaDataBase, which comprises a centralized repository of DNA variant data as well as a set of curation software tools, called MutaReporter, that help users share information on particular disease genes.
As described in a paper on the effort published in Nature Biotechnology earlier this year, each gene in the database is curated by experts in the field, called MutaCurators, who review all information from the literature, other gene databases, and information entered by labs and clinicians. There are also community groups, called MutaCircles, centered on specific disease genes, to help ensure that information is complete and up to date.
"The whole system functions as a closed-loop, fully automated information system whereby molecular and clinical info can be both extracted and submitted, with gene-specific curators as gatekeepers reviewing all information," the paper states.
To date, MutaDataBase contains information on nearly 14,000 variants in 187 genes.
Rehm stressed that the project is still in its early stages, however. "We're still defining the rules for how to classify variants in the system and how the logistics of all this will work," she said, adding that the variants should not be considered "well curated" at this time.
As NCBI began developing its Genetic Testing Registry, or GTR, "there was this notion that they really need to have a variant database that these tests map to" in order to provide detailed information about the genes assayed, Rehm said.
However, NCBI "needed some assistance from the community in terms of how to create this database," she said. While NCBI "has been providing an amazing amount of genomic resources to the community for many, many years, they've never gotten involved in clinical curation, or clinically valid resources."
As a result, a number of representatives from US clinical laboratories "have now been meeting together [and] on conference calls with NCBI to help them think through ClinVar," she said.
NCBI plans to launch ClinVar later this year. According to information provided on its website, it will initially include variations from OMIM, GeneReviews, submissions to dbSNP/dbVar from PharmGKB, locus-specific databases, contributing testing laboratories, and UniProtKB/Swiss-Prot.
"Feedback from the community will be used to make improvements to the initial preview, including any adjustments needed to the viewer, filtering mechanisms, and user workflow integration," NCBI states on its website.
These improvements are expected to be available in early 2012, and NCBI said that it expects ClinVar will continue to evolve "as the clinical genetics community['s] needs change."