Researchers Work on Data Specification, Software to Enable Variant Use in Clinical Contexts


NEW YORK (GenomeWeb) — An effort involving researchers from the University of Utah, the US Centers for Disease Control and Prevention, and other organizations seeks to develop a specification and tools for transmitting information about sequence variants into clinical contexts that builds on the standard variant call format (VCF).

The developers are actively seeking the community's input on the so-called Clinical Variant Call Format (VCFclin), and have set up a website to collect this feedback to inform the development of what they describe as a "clinical-grade variant file specification" and software to implement the new format. Specifically, according to the VCFclin website, the researchers hope to develop a data standard that enables variants to be used in clinical decision support; and software to convert data from research-grade variant reports to the clinical format. VCFclin will also support existing standards such as those used by the Human Genome Variation Society and the Logical Observation Identifiers Names and Codes database, the developers wrote.

VCFclin is essentially an extension of VCF, the standard format for storing variants and associated annotations from sequence data, which was developed as part of the 1000 Genomes Project, according to Gabor Marth, a professor of human genetics at the University of Utah and one of the researchers involved in the VCFclin effort. He is co-leading the development effort with Karen Eilbeck, an associate professor of biomedical informatics and an adjunct assistant professor in the human genetics department at the University of Utah's School of Medicine; and Ira Lubin, a geneticist at the CDC and facilitator of its variant file workgroup, which was formed in 2012 to come up with requirements for a clinical-grade variant file format. The CDC's group also includes stakeholders from government institutions, academia, industry, and standards organizations.

Marth told GenomeWeb that VCFclin is intended to bridge an existing gap that currently limits the use of genomic data in healthcare applications. While the biomedical community has well-defined ways of calling variants in research contexts, communicating that information in an unambiguous fashion to clinicians is still a bottleneck.

One of the challenges lies in the ways that VCF lets users describe variants, Marth explained. Specifically, he and others developed it to support multiple ways of describing the same variant. Depending on the biological context, a single variant can be described as two consecutive substitutions, an insertion followed by a deletion, or even as two microsatellite extensions, he said. That's problematic for clinical diagnostics labs, which may have a hard time matching variants found in a patient's file to those stored in databases with differing descriptions of the same variant, he said.
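To illustrate the matching problem, here is a minimal Python sketch that checks whether differently written variant records yield the same sequence once applied to a reference. It is not the VCFclin team's graph-based software; it simply rebuilds each haplotype and compares the results. The function names, the seven-base reference, and the records are all hypothetical, chosen only for illustration.

```python
def apply_variants(reference, variants):
    """Apply VCF-style (pos, ref, alt) records to a reference string.

    Positions are 1-based, as in VCF. Records must not overlap.
    """
    pieces = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        pieces.append(reference[cursor:start])  # unchanged stretch
        pieces.append(alt)                      # alternate allele
        cursor = start + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


def same_haplotype(reference, calls_a, calls_b):
    """Return True if both call sets produce the same altered sequence."""
    return apply_variants(reference, calls_a) == apply_variants(reference, calls_b)


# Hypothetical toy reference and three descriptions of one change.
reference = "GATTACA"
two_substitutions = [(3, "T", "C"), (4, "T", "G")]  # two consecutive SNVs
one_block = [(3, "TT", "CG")]                       # one multi-base substitution
anchored_delins = [(2, "ATT", "ACG")]               # deletion-insertion with anchor base

print(same_haplotype(reference, two_substitutions, one_block))
print(same_haplotype(reference, two_substitutions, anchored_delins))
print(same_haplotype(reference, two_substitutions, [(3, "T", "C")]))
```

A record-by-record text comparison would treat the first three descriptions as different variants, whereas comparing the rebuilt sequences shows they are the same change. This brute-force approach only works for short, non-overlapping records on a single haplotype, which is one reason a more general method is needed in practice.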

Rather than adopt specific ways of reporting on variants that ignore some of the underlying biology, "we are writing graph-based software that is able to look at two variant calls … and answer questions about those variants whether they are the same or not or how they are related to each other," Marth said. Basically, the software works by creating graphs of data from VCF files, with alternate alleles represented as branches; mapping information from a second VCF file onto the same graph; and then looking for relationships between alleles.

The researchers are developing additional software that will make it possible to convert clinically relevant components of VCF files into standard medical formats such as HL7 as appropriate, Marth said. The researchers plan to pilot the format and software in projects at the University of Utah including the Utah Genome Project and the university's Personalized HealthCare Initiative before making it more broadly available.

The team is also collecting feedback from the community by posing a series of questions related to the use of variants in clinical contexts. The first two of these, which were posted this past summer, have to do with the development of a set of recommendations for using genomic coordinates derived from mapping data to the reference assembly. Specifically, the VCFclin team proposed two recommendations, and asked the community to weigh in on any compliance challenges that might crop up should these be adopted. They also asked questions related to the use of genomic coordinates to report variants. The diagnostics community often uses protein or transcript coordinates, which makes matching those variants to their locations in the genome more challenging.

The researchers are also coming up with use cases to evaluate and assess the benefits and limitations of VCFclin for applications such as exchanging and comparing variant calls among laboratories; outsourcing variant calls for downstream analysis and interpretation; and transferring data to relevant repositories. The use cases will also aid efforts to develop a common data structure for transferring genomic data to health informatics tools such as electronic medical records, according to the site.

The VCFclin development team has applied for funding from the National Institutes of Health. To date the project has been done on a largely volunteer basis with the help of some seed funding from the University of Utah. So far, the project has the buy-in of researchers from the Global Alliance for Genomics and Health as well as the College of American Pathologists, and others in the clinical diagnostics community. "It's not just us doing it," Marth said. "We are doing it in consultation and in agreement with the people who will eventually be using this system."

Furthermore, he stressed, the goal here isn't to develop a new VCF or even to further standardize the existing format. "The tools that we are writing will allow people use the existing VCF but get their needs catered to," he said.

In related efforts, last year the NIH's National Human Genome Research Institute and National Institute of Child Health and Human Development awarded three grants to multiple institutions to handle different aspects of the development of the Clinical Genome Resource, ClinGen, which is intended to provide a framework for evaluating the roles of genetic variants in diseases and exploring methods of making this information a part of clinical care.

Two of the grants were awarded as follows: one to a team from Baylor College of Medicine and Stanford University, and a second to a team from the University of North Carolina, Chapel Hill, Geisinger Health System, and the American College of Medical Genetics and Genomics. A third grant went to a team comprising researchers from Brigham and Women's Hospital, Geisinger, the University of California, San Francisco, and the University of Utah.