NEW YORK (GenomeWeb) – Members of the benchmarking team of the Global Alliance for Genomics and Health have developed methods for producing standardized performance metrics for benchmarking small germline variant calls.
The GA4GH Benchmarking team brings together participants from research institutes, technology companies, government agencies, and clinical laboratories. It includes researchers from the National Institute of Standards and Technology (NIST), Illumina, the Ontario Institute for Cancer Research, DNAnexus, and others.
As explained in a Nature Biotechnology paper published yesterday, the methods they have developed address challenges associated with standardizing metrics such as recall and precision, comparing different representations of the same variant calls, and stratifying performance by variant type and genome context. The team has made the benchmarking code available on GitHub.
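The representation problem arises because the same variant, particularly an indel in a repetitive region, can be written at different positions and with different allele strings in a VCF, so a naive record-by-record comparison would miss genuine matches. The short Python sketch below is illustrative only, not the GA4GH tools' actual algorithm, and shows how trimming the bases shared by the reference and alternate alleles normalizes two such records to the same representation:

```python
# Illustrative sketch of variant normalization: two superficially
# different VCF records can describe the same 1-bp deletion.
# This is not the GA4GH tools' implementation.

def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT from the right, then the left."""
    # Trim a common suffix, keeping at least one base in each allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim a common prefix, advancing the position accordingly.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt

# Both records reduce to the same normalized deletion:
print(trim_alleles(100, "CAA", "CA"))    # -> (100, 'CA', 'C')
print(trim_alleles(100, "CAAT", "CAT"))  # -> (100, 'CA', 'C')
```

The comparison engines used by the team go further than this, matching calls at the haplotype level so that calls that still differ after normalization can be credited when they describe the same underlying sequence.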
The benchmarking framework was piloted as part of the second PrecisionFDA challenge. The so-called Truth challenge, which was conducted in 2016, offered participants an opportunity to test their variant calling pipelines on a previously uncharacterized sample, called HG002, and then publish the results on PrecisionFDA for comparison to NIST's Genome in a Bottle Consortium truth dataset. It was one of two challenges intended to help the US Food and Drug Administration better understand which questions are important for assessing the reproducibility and accuracy of NGS tests, and to obtain better benchmarking datasets for NGS test development and validation.
According to the paper, the team used several approaches to standardize the variant benchmarking process, including reconciling existing methods for comparing call sets to assess the accuracy of variant and genotype calls. They also adopted a common binary classification form, counting true positives, false positives, and false negatives, to represent primary performance metrics, and standardized how those metrics are calculated so that results can be compared across methods. Lastly, they developed a framework for stratifying performance metrics by variant type and genome context.
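As a concrete illustration of the binary classification approach, the snippet below computes the standard metrics from true positive, false positive, and false negative counts; the formulas are the conventional definitions, and the counts shown are invented for the example:

```python
# Illustrative calculation of standardized performance metrics from
# binary classification counts; the GA4GH tools compute these per
# variant type and per stratification region.

def performance_metrics(tp, fp, fn):
    recall = tp / (tp + fn)        # fraction of truth variants recovered
    precision = tp / (tp + fp)     # fraction of query calls that are correct
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}

# e.g. 3.2M true positives, 10k false positives, 50k false negatives
print(performance_metrics(tp=3_200_000, fp=10_000, fn=50_000))
```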
The paper includes a description of available reference materials from more established sources such as GIAB and Illumina's Platinum Genomes, as well as a new synthetic-diploid reference dataset created from long-read assemblies of two haploid cell lines. It also provides guidance on interpreting variant benchmarking results and on the limitations of the high-confidence calls and regions used as "truth" sets.
To compare a set of query variants against an existing truth set using the framework, researchers need four inputs: a truth callset in VCF format, a set of high-confidence regions in BED format, the query callset in VCF format, and a reference genome sequence in FASTA format. Optionally, stratification regions can be supplied to break down variant calling performance by genomic context or to restrict comparisons to a subset of the genome, such as regions captured by targeted sequencing.
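As a sketch of how those inputs fit together, the snippet below assembles a command line for hap.py, the Illumina-developed comparison engine used as a reference implementation for the GA4GH framework. The file names are placeholders, and the flags follow hap.py's commonly documented interface, so the tool's repository should be consulted for the current options:

```python
# Hypothetical invocation showing how the framework's inputs map onto
# hap.py's command line; file names are placeholders.
import subprocess

cmd = [
    "hap.py",
    "truth.vcf.gz",             # truth callset (VCF)
    "query.vcf.gz",             # query callset (VCF)
    "-f", "confident.bed",      # high-confidence regions (BED)
    "-r", "reference.fasta",    # reference genome (FASTA)
    "-o", "benchmark_out",      # output prefix for metrics tables
    # optional: stratify metrics by genome context, e.g. homopolymers
    "--stratification", "strata.tsv",
]
subprocess.run(cmd, check=True)
```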
In terms of future work, the researchers note in Nature Biotechnology that there is still a need for reference materials and benchmarking tools for structural variants that account for factors such as stringencies for breakpoint matching and size predictions. Similar tools will need to be developed to address the unique features of somatic variants, including assessing the accuracy of variant allele frequency estimates. Other areas for development include modifying benchmarking strategies to address changes in the linear representation of the human genome.
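To make the breakpoint-matching problem concrete, the sketch below, which is purely illustrative and not from the paper, shows one way a structural variant benchmarking tool might apply such stringencies when deciding whether a called deletion matches a truth deletion; the tolerance values are arbitrary examples:

```python
# Illustrative (not from the paper): matching a called deletion to a
# truth deletion using tolerances for breakpoint position and size.

def sv_match(truth, call, bp_tolerance=100, min_size_ratio=0.7):
    """truth and call are (chrom, start, end) tuples for deletions."""
    if truth[0] != call[0]:
        return False                       # different chromosomes
    if abs(truth[1] - call[1]) > bp_tolerance:
        return False                       # start breakpoints too far apart
    if abs(truth[2] - call[2]) > bp_tolerance:
        return False                       # end breakpoints too far apart
    sizes = sorted([truth[2] - truth[1], call[2] - call[1]])
    return sizes[0] / sizes[1] >= min_size_ratio  # similar predicted sizes

print(sv_match(("chr1", 10_000, 12_000), ("chr1", 10_040, 11_980)))  # True
```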
The team also notes that the PrecisionFDA challenge results "should be considered only as an initial evaluation, with the rich data set resulting from the challenge inviting further exploration."