SAN FRANCISCO (GenomeWeb) – A group of stakeholders has created recommendations for what to include in clinical next-generation sequencing variant files, in an attempt to address the lack of standards around such data.
The workgroup, put together by the US Centers for Disease Control and Prevention and comprising informaticians, genomics researchers, and representatives from clinical laboratories and regulatory agencies, published its recommendations last week in the Journal of Molecular Diagnostics.
For years now, the CDC has been tackling the issue of standardization for clinical sequencing, and an initial workgroup in 2012 came up with general guidance for clinical labs developing NGS-based tests. In 2015, a CDC-led workgroup developed guidelines focused specifically on the informatics pipelines for clinical NGS tests.
The new guidelines are specific to the actual variant file. Ira Lubin, lead author of the study and acting branch chief of the CDC's Laboratory Research and Evaluation Branch, said in an interview that when the group was working on the recommendations for the informatics pipeline, it became clear that there was a separate need for guidance around the variant file itself.
The group "identified the problem of inconsistencies among variant files used in clinical labs," Lubin said. Because NGS variant files were initially developed for research purposes, they are flexible. "That lets researchers describe a variant in more than one way," which may be good for research, but is "bad clinically because results can't be compared among labs."
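As a rough illustration of how one variant can be written in more than one way, the Python sketch below trims bases shared by the reference and alternate alleles so that two differently padded records compare as equal. The function name, positions, and alleles are hypothetical and are not taken from the workgroup's paper. This is only the trimming step; full normalization would also need the reference sequence to left-align insertions and deletions.

```python
def trim_variant(pos, ref, alt):
    """Trim bases shared by ref and alt, keeping at least one base in each.

    Hypothetical helper for illustration; it does not left-align indels,
    which would require access to the reference sequence.
    """
    # Drop shared trailing bases first; the position is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases, moving the position forward each time.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Two labs reporting the same single-base change with different padding.
lab_a = trim_variant(100, "CAG", "CTG")
lab_b = trim_variant(101, "A", "T")
print(lab_a == lab_b)
```

Without a shared convention of this kind, the two records above would look like different variants to a simple text comparison.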
The goal of the recommendations is to enable clinical laboratories to move toward more unified practices, Lubin said. Although these recommendations are not a requirement, they could potentially be adopted by regulatory bodies. The US Food and Drug Administration has been interested in the issues of standardization and has held workshops and issued draft guidance on NGS tests.
The CDC workgroup recommendations do not address the sequencing platforms, focusing instead on the variant file itself.
In addition, the group recommends that any reference sequence that a lab plans to use should be available in a publicly accessible database to enable comparisons with the human genome reference assembly.
The group does not recommend a specific variant caller but recommends that the caller of choice be configured to include not just the identified variant, but also the reference allele, as well as no-calls. It should also include local phasing information.
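For files in VCF, these three pieces of information can all be read from the genotype (GT) value: allele index 0 refers to the reference allele, a period marks a no-call, and a vertical bar rather than a slash indicates phased alleles. The Python sketch below shows one way a lab might interpret that field. The function name and example values are hypothetical, and other formats encode the same information differently.

```python
import re


def describe_genotype(ref, alts, gt):
    """Interpret a VCF GT value such as '0|1', '0/0', or './.'.

    Hypothetical helper for illustration only.
    """
    alleles = [ref] + list(alts)      # index 0 is always the reference allele
    phased = "|" in gt                # '|' separates phased alleles, '/' unphased
    indices = re.split(r"[|/]", gt)
    # A '.' in place of an allele index is a no-call.
    called = [None if i == "." else alleles[int(i)] for i in indices]
    return {
        "alleles": called,
        "phased": phased,
        "no_call": any(a is None for a in called),
    }


print(describe_genotype("A", ["G"], "0|1"))   # phased heterozygous call
print(describe_genotype("A", ["G"], "0/0"))   # homozygous reference call
print(describe_genotype("A", ["G"], "./."))   # no-call
```

Keeping reference calls and no-calls distinct matters because a site with no variant reported could otherwise mean either "matches the reference" or "could not be assessed."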
The group also recommended that the human genome reference assembly be used as the standard for mapping variants.
Lubin elaborated that problems can arise when, for instance, one laboratory aligns to the human reference, while another test may analyze RNA variants and assign genomic positions based on RNA conventions. "Two laboratories may be looking at the same variant and may not know they are the same," he said.
Variants should be described according to Human Genome Variation Society (HGVS) nomenclature, and HUGO Gene Nomenclature Committee (HGNC) descriptions should be used when specifying genes. HGNC descriptions assign an identification number, symbol, and name to each gene. Although the identification numbers are often currently not included in clinical variant files, the workgroup recommended that they be included, since the numbers are unique and, unlike symbols and gene names, do not change over time.
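The value of a stable identifier can be sketched in a few lines of Python. The example below compares two gene annotations by HGNC ID rather than by name, using BRCA1 (HGNC:1100) with an earlier and a more recent gene name. The class, function, and pattern are hypothetical, and the example values are given for illustration and should be checked against the HGNC database before any real use.

```python
import re
from dataclasses import dataclass

# HGNC IDs are written as the prefix 'HGNC:' followed by digits.
HGNC_ID_PATTERN = re.compile(r"^HGNC:\d+$")


@dataclass(frozen=True)
class GeneAnnotation:
    """Hypothetical record holding the three HGNC descriptors for a gene."""
    hgnc_id: str
    symbol: str
    name: str

    def __post_init__(self):
        # Reject records whose identifier is missing or malformed.
        if not HGNC_ID_PATTERN.match(self.hgnc_id):
            raise ValueError(f"not an HGNC ID: {self.hgnc_id!r}")


def same_gene(a, b):
    """Compare on the stable identifier, not on symbol or name."""
    return a.hgnc_id == b.hgnc_id


older = GeneAnnotation("HGNC:1100", "BRCA1", "breast cancer 1, early onset")
newer = GeneAnnotation("HGNC:1100", "BRCA1", "BRCA1 DNA repair associated")
print(older.name == newer.name)   # the names differ
print(same_gene(older, newer))    # the identifier still matches
```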
Today, several different variant file formats are commonly used, including VCF, genomeVCF (gVCF), and Genome Variation Format (GVF). While the workgroup did not endorse a specific file format, it recommended that clinical variant files specify which format and version were used.
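In VCF, the format and version are declared in a `##fileformat=` line at the top of the file, for example `##fileformat=VCFv4.2`. The Python sketch below reads that declaration from the header; the function name is hypothetical, and other formats use their own header conventions that are not handled here.

```python
def read_declared_format(lines):
    """Return the value of the VCF '##fileformat=' header line, or None.

    Hypothetical helper for illustration; it covers VCF-style headers only.
    """
    for line in lines:
        # Meta-information lines start with '##'; stop once they end.
        if not line.startswith("##"):
            break
        if line.startswith("##fileformat="):
            return line.strip().split("=", 1)[1]
    return None


header = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
print(read_declared_format(header))
```

A receiving lab could run a check like this before parsing and refuse files that do not declare a format and version.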
One area the group did not provide recommendations on was assessing data quality, which can be difficult because there are no standards for grading quality that are applicable across all laboratory methods. As such, depending on the platforms and methods used, there may be variability in sequence calls among laboratories. The authors wrote that since there is a lot of development in this area, they decided not to offer specific recommendations.
Lubin added that one goal of developing these guidelines is to "promote better uniformity" across clinical NGS tests. Standardized clinical NGS variant files will also help enable data sharing and comparisons, although there is also a need for quality control materials, he said, adding that progress is being made in this area, as well. In 2015, the National Institute of Standards and Technology's Genome in a Bottle consortium released its first DNA reference materials that labs can use to gauge the performance of their NGS tests. And the consortium is continuing to develop reference material for structural variant calling and other applications.
Lubin said that he hopes clinical labs will consider implementing the CDC workgroup's recommendations. He did not think the recommendations would be too burdensome as the group included representatives from a broad swath of stakeholders, including clinical labs.
Somewhat surprisingly, he said, there were few areas of disagreement within the group, potentially due to the broad recognition in the field that standards are needed for clinical NGS.
He added that another goal of the recommendations was to keep them flexible enough that they could be readily updated. "The technology is rapidly evolving, so we need systems to rapidly keep up with what we might be doing tomorrow, even if we're not doing it today," he said.