Part of the effort to develop the Clinical Genome Resource, or ClinGen — a National Institutes of Health-funded project to build a framework for evaluating and describing the roles that genomic variants play in disease development — involves setting standards for gathering and depositing data into public resources.
This task has been given to a team comprising researchers from Brigham and Women's Hospital, the Geisinger Health System, the University of Utah, and the University of California, San Francisco. Over the next three years, they'll use an $8.25 million grant from the NIH's National Human Genome Research Institute and the National Institute of Child Health and Human Development to develop standard formats for gathering and depositing genotype and phenotype data into the National Center for Biotechnology Information's ClinVar database. They'll work with clinical laboratories and specific gene databases to obtain genomic variant and disease association data and will also develop standards for determining the potential pathogenicity and medical utility of variants.
The grant is part of a larger $25 million award spread over three research groups to collect and share detailed data about genomic variants relevant to human disease and useful for clinical practice. The funds will cover the development of standards for categorizing and curating variants based on clinical relevance, standards for collecting and formatting data stored in public resources, and informatics applications for predicting disease risk and prioritizing variants for more in-depth studies.
Heidi Rehm, director of the Laboratory for Molecular Medicine at the Partners Healthcare Center for Personalized Genetic Medicine, told BioInform that her group's part in the ClinGen project involves figuring out what variant data ClinVar should hold and how laboratories should structure and submit that information. They're also working to establish appropriate terminologies for categorizing and classifying variants. For example, "do we call them pathogenic, do we say disease associated, do we say probably, possibly, likely pathogenic?" she said.
Also being developed are evidence-based methods of classifying the genes themselves with respect to disease. These criteria would make it possible to classify genes as haploinsufficient or triplosensitive with respect to copy number changes, for instance, she said.
These efforts overlap with those of a second group that is developing standards for ClinGen. That team, led by researchers at the University of North Carolina, Chapel Hill, was awarded $8.4 million over four years to standardize methods of defining the clinical validity and actionability of variants in diseases such as cancer, cardiovascular disease, and metabolic disorders. They're also exploring methods of making curated genetic information a part of clinical care by integrating resources like ClinVar with electronic health records so that physicians can access and use the data as needed (see related story this issue).
Rehm's team has formed a series of working groups to handle different aspects of the standards development process. These include a structural variants workgroup, a phenotyping workgroup, and one focused on IT standards. They'll use guidelines for interpreting sequence variants provided by the American College of Medical Genetics and Genomics as well as standards developed by the Health Level Seven organization. They're also working with developers from the Sequence Ontology and the Human Phenotype Ontology projects among other groups.
They'll also work on methods of structuring data so that it's both interoperable and shareable across labs and systems as well as on defining data submission formats. Efforts here will include defining the most useful data fields to include in ClinVar as well as determining standards for how data will be structured within those fields. For example, a data field for cDNA coordinates might use nomenclature developed by the Human Genome Variation Society to describe data within the field, she said. Also, a phenotype data field might require that submitted data be defined according to a particular ontology or a SNOMED code, for instance.
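As a rough illustration of what field-level standards like these could look like in practice, the Python sketch below checks a hypothetical submission record. The record layout, field names, and regular expressions are assumptions made for this example; they are not ClinVar's submission format, and the cDNA pattern covers only simple substitutions rather than the full HGVS nomenclature.

```python
import re
from dataclasses import dataclass

# Simplified pattern for an HGVS-style coding-DNA substitution, e.g.
# "NM_000257.3:c.1208G>A". Real HGVS nomenclature covers many more
# variant types (deletions, insertions, duplications, and so on).
HGVS_CDNA_SUBSTITUTION = re.compile(r"NM_\d+\.\d+:c\.\d+[ACGT]>[ACGT]")

# Human Phenotype Ontology term IDs are "HP:" followed by seven digits.
HPO_ID = re.compile(r"HP:\d{7}")

# SNOMED CT concept identifiers are numeric strings of 6 to 18 digits.
SNOMED_ID = re.compile(r"\d{6,18}")


@dataclass
class VariantSubmission:
    """Hypothetical record; the field names are illustrative only."""
    cdna: str       # variant described in HGVS-style cDNA notation
    phenotype: str  # HPO term ID or SNOMED CT concept ID


def validation_errors(record: VariantSubmission) -> list[str]:
    """Return a list of problems found in the record (empty if none)."""
    errors = []
    if not HGVS_CDNA_SUBSTITUTION.fullmatch(record.cdna):
        errors.append("cdna: not an HGVS-style substitution")
    if not (HPO_ID.fullmatch(record.phenotype)
            or SNOMED_ID.fullmatch(record.phenotype)):
        errors.append("phenotype: not an HPO or SNOMED CT identifier")
    return errors


if __name__ == "__main__":
    # The identifiers below are example strings chosen for illustration.
    good = VariantSubmission(cdna="NM_000257.3:c.1208G>A", phenotype="HP:0001639")
    bad = VariantSubmission(cdna="1208 G to A", phenotype="enlarged heart")
    print(validation_errors(good))
    print(validation_errors(bad))
```

Constraining each field to a shared notation in this way is what would let records from different laboratories be compared and merged without manual cleanup.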
Rehm's group has also formed the International Collaboration for Clinical Genomics to assume the responsibility of providing long-term support for clinical variant collection, curation, and sharing efforts. ICCG is a partnership between the International Standards for Cytogenomic Arrays, or ISCA, consortium and members of the sequencing community. ISCA was founded in 2007 and initially focused on standardizing and sharing structural variation data from chromosomal microarray testing. Recognizing that the sequencing community had to deal with similar variant data-sharing issues, the two groups decided to merge and expand ISCA's original focus to include sequence-level variation as well.
ICCG is intended to support data collection, curation, and sharing efforts long term, Rehm said. Its goals essentially cover the same territory as the NIH-funded efforts do. They include developing standard methods for submitting and sharing data including consistent descriptions, annotations, and clinical classifications of variants as well as facilitating the submission of genotype and phenotype information into ClinVar. It will also work on improved curation methods to ensure that variants are properly classified based on functional significance and their role in human health and disease.
Membership in the ICCG is open to both individuals and institutions. Current members include ARUP Laboratories, Alberta Children's Hospital, Beth Israel Deaconess Medical Center, Boston Children's Hospital, Duke University Health System, the Wellcome Trust Sanger Institute, and many more.