NEW YORK (GenomeWeb) – Picking an accurate and cost-effective pipeline for analyzing sequence data is critical as sequencing becomes standard practice in research and clinical contexts. To evaluate the robustness of existing variant detection pipelines and offer insights into the sort of compute resources required to run these tools, researchers from Spain's National Centre for Genomic Analysis-Centre for Genomic Regulation (CNAG-CRG) and elsewhere recently benchmarked different combinations of read aligners and variant callers for whole-genome and whole-exome sequencing.
As explained in a recent Human Mutation paper that describes the study, the researchers tested six combinations of two read aligners (BWA-MEM and GEM3) and three variant callers (FreeBayes, the Genome Analysis Toolkit HaplotypeCaller, and SAMtools) on whole-genome and whole-exome sequencing data from the NA12878 human sample. They also reported "substantial" differences in computing costs. For example, the GEM3 alignment software, which was developed at CNAG, was four times faster than the BWA-MEM aligner, requiring just 60 central processing unit hours to complete a WGS alignment compared to nearly 300 CPU hours required by BWA-MEM.
The motivation for the study came from the CNAG-CRG researchers' participation in a number of whole-genome and whole-exome sequencing studies both internally and as part of European consortia, Sergi Beltran, a bioinformatics analysis group leader at CNAG-CRG and one of the authors on the paper, told GenomeWeb. As part of these projects, he and others have had to assess various open-source algorithms and software to try and identify an optimal combination of tools that would offer the most accurate results in the shortest amount of time.
"We need to be able to sequence very fast but we also need to be able to analyze very fast and accurately," Beltran said. "This is going to become more and more important in future … if we want to make the transition into clinics and medical diagnostics."
Previous attempts to compare combinations of informatics tools include an effort by researchers from the University of Texas, Austin and Yonsei University in Seoul. They published a paper in Scientific Reports last year that described their testing of combinations of three aligners and four variant callers, including several used in the current study, on 12 datasets from the National Institute of Standards and Technology.
However, the CNAG-CRG researchers believe that there are differences between theirs and other similar comparison studies that have been done in the past. For example, in addition to comparing the output of the different tools to the reference datasets, they also compared the performance of the combinations of tools to each other to assess how well different algorithms work together, Beltran said. Furthermore, they used both whole-genome and whole-exome sequence for the comparisons and assessed sequence data collected at different time points, something which other studies have not done, he told GenomeWeb.
The researchers also compared the performance of their pipeline combinations on different genome segments partitioned based on criteria from the Genome in a Bottle consortium, another assessment that other comparison studies have not looked at in great detail, according to Beltran.
Furthermore, "the fact that we included the benchmarking on the processing times and the actual CPU time adds something that other benchmarks haven't usually provided," Steve Laurie, a postdoctoral researcher at CNAG-CRG and one of the authors on the paper, told GenomeWeb.
In terms of the choice of tools for the comparison, Beltran said that the researchers selected BWA-MEM because of its widespread use and chose to compare it with their own in-house solution, GEM3, to see if GEM3's results were as good and how much faster it could map the reads. Similarly, the researchers used the GATK haplotype caller and SamTools for the comparison because of their widespread use, and FreeBayes because of claims about its variant calling efficiency and speed. For the study, which was done earlier this year, the researchers used the most recent iterations of the tools that were available but noted that newer versions have been released since then, so it is possible that updated versions of the tools might perform better if the comparison was repeated.
For the study, the researchers used BWA-MEM and GEM3 to map raw WGS and WES reads to the GRCh37 human reference genome and then used the resulting alignment files as inputs to three variant calling algorithms. They then compared the outputs of these variant callers to a publicly available set of reference calls from the GIAB for the NA12878 sample. They chose to focus only on SNVs and indels found in the NA12878 human sample and not to tackle things like structural variation which call for a different set of approaches and tools, Beltran said. There is also no reference dataset for structural variants though GIAB said earlier this month that it has begun working on a first set of high-quality structural variants for human genomes. That would be a reference dataset that "we could test against in the future," Laurie said.
According to some of the results reported in the paper, when the researchers compared the concordance of the pipeline combinations on the data, all the combinations of tools performed well in the 70 percent of the genome deemed reliably callable by the GIAB consortium. Here, the pipelines returned almost identical results for SNVs with some differences when the pipelines were used to analyze small insertions and deletions. In 20 percent of the genome, where variant calling is more challenging, the pipelines were still able to call a relatively high number of the same SNVs but there were significant differences across pipelines in indel calls. The final 10 percent of the genome could not be mapped so no variants were called in this region. Sensitivity and specificity numbers for SNV calling was in the 99 percent range across all pipelines, while indel calling, which was harder for all the pipelines, had sensitivity ranging from about 80 percent to 99 percent and specificity values ranging from about 78 percent to 99 percent. Those lower numbers are partly due to the technical difficulty of identifying these variants but also due to a possible bias toward a particular combination of aligner and caller in generation of the NIST reference data set, the researchers note in the paper. Tools like FreeBayes, for example, were not available when the reference dataset was created, Laurie noted. Also, though the GIAB consortium used input from fourteen different workflows there was still a bias towards BWA as an aligner and some version of GATK as a caller in the pipelines, he added.
Mapping metrics from the alignments generated by BWA-MEM and GEM3 indicated that both aligners perform similarly. Both tools were able to align over 99 percent of the reads although BWA-MEM mapped a higher proportion of reads with higher quality than GEM3 in some cases. However, the researchers reported differences in the speed of the tools. For example, the GEM3 alignment software took only six minutes to fully map the WES reads and 120 minutes to map the WGS reads. In contrast, BWA-MEM required 16 minutes to complete its WES mapping and 570 minutes to map the full genome.
For the variant callers, Free Bayes returned results the fastest, requiring just over 40 minutes to process the WES sample and well under 200 minutes for the whole genome sample. In contrast, the GATK needed nearly 300 minutes for the WES samples and over 2,000 minutes for the WGS samples. SAMtools, meanwhile, required about 200 minutes for the WES and over 3,000 minutes for the WGS.
Ultimately, these results suggest that there is an optimal pipeline for sequence analysis. The choice depends on what sort of analysis the researcher wants to do and what IT resources are available to them. However, the study's authors offered some general guidelines based on their findings. For example, researchers who want fast results would perhaps be best served by using GEM3 and FreeBayes, Beltran said. Based on their tests, that combination would be "the fastest option while still giving you very good results," particularly for SNVs.
Overall, "we hope that this [study] will show that there is still room for improvement both in terms of accuracy of variant calling and in terms of reducing the time that these tools take to do their job," Beltran said. Moreover, it points researchers to alternative algorithms for their analysis pipelines. For example, "we show [that] BWA-MEM is not the only tool available for doing good mapping," he said. "GEM3 [does] a very good job … and actually can do it much faster than BWA-MEM."