CHARLOTTE, NC (GenomeWeb) — Laboratories with clinically validated next-gen sequencing assays are having trouble detecting certain complex variants, according to a recent interlaboratory study led by researchers at Invitae, suggesting that better standards are needed to ensure the quality of NGS-based genetic testing.
At the American College of Medical Genetics and Genomics annual meeting here yesterday, Steve Lincoln, a bioinformatician at Invitae who led the effort, presented the results of the study, which involved seven laboratories with 10 workflows, almost all involving Illumina instrumentation.
The collaborators also published their findings last November in a bioRxiv preprint. Besides Invitae, they included the National Institute of Standards and Technology, Rady Children's Institute for Genomic Medicine, Mayo Clinic, the Joint Initiative for Metrology in Biology, the University of California San Francisco, the University of Washington, the Institute of Cancer Research in the UK, and the Peter MacCallum Cancer Centre in Australia.
According to Lincoln, many pathogenic variants that have been found in patient samples can be difficult to detect by NGS, including large indels, copy number variants, variants in homopolymers, and structural variants.
Many labs do not validate their NGS assays with sufficient numbers of these types of variants, in part because patient samples that contain such variants and can serve as positive controls are hard to come by.
As an alternative, the researchers decided to work with a synthetic reference sample representing different kinds of technically challenging variants. To design what they called their "Frankencontrol," they selected 24 variants, many of them pathogenic, that Invitae had previously found in clinical tests performed at its lab. The variants fall in seven commonly tested cancer genes: BRCA1, BRCA2, CDKN2A, MLH1, MSH2, MSH6, and PMS2.
They included 17 technically challenging variants, among them small, mid-sized, and large indels; deletions in short tandem repeats; a tandem repeat expansion; homopolymer-associated variants; deletion/insertion (delins) variants; variants in a segmental duplication; a variant in a GC-rich region; single nucleotide variants near indels; an SNV where the genome and transcript references differed; and benign SNVs.
SeraCare synthesized plasmids containing these variants, which were spiked into genomic DNA from a well-characterized cell line in amounts that would make them appear to be heterozygous.
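As a rough illustration of the spike-in arithmetic, and not a description of SeraCare's actual protocol, the expected variant allele fraction can be estimated from the number of plasmid copies added per cell-line genome. The sketch below assumes the cell line carries two reference copies of the locus and that plasmid and genomic copies are captured and sequenced equally well; the function names are hypothetical.

```python
def expected_vaf(plasmid_copies: float, genomic_copies: float = 2.0) -> float:
    """Expected variant allele fraction after spiking plasmid into genomic DNA.

    plasmid_copies: variant-bearing plasmid copies added per genome (assumption).
    genomic_copies: reference copies of the locus per genome (2 for a diploid locus).
    """
    if plasmid_copies < 0 or genomic_copies <= 0:
        raise ValueError("copy numbers must be non-negative (genomic > 0)")
    return plasmid_copies / (plasmid_copies + genomic_copies)


def plasmid_copies_for_vaf(target_vaf: float, genomic_copies: float = 2.0) -> float:
    """Plasmid copies per genome needed to reach a target allele fraction."""
    if not 0 <= target_vaf < 1:
        raise ValueError("target_vaf must be in [0, 1)")
    # Solve target = p / (p + g) for p.
    return target_vaf * genomic_copies / (1 - target_vaf)
```

Under these assumptions, a heterozygous-looking 50 percent allele fraction corresponds to two plasmid copies per diploid genome, because the plasmid alleles are added on top of the two reference copies already present rather than replacing one of them.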
This reference sample, along with the gene names, was then provided to the participating laboratories. Of the 10 workflows they used, eight involved an Illumina sequencing platform, combined with various target capture methods; one used Illumina whole-genome sequencing; and one employed the Thermo Fisher Ion Torrent platform with AmpliSeq target amplification. All workflows used different bioinformatics pipelines, and all but two of them had been clinically validated.
All 10 workflows were able to detect "easy" SNVs and small indels, with the exception of one small indel that was missed by one lab. However, only 10 of the 17 challenging variants were detected by all workflows, and only three workflows picked up all 17. Among those three was Invitae's, which had originally detected all of these variants in patient samples.
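Tallies of this kind (variants detected by every workflow, workflows that detected every variant) can be derived from a simple table of which workflow called which variant. The following is a minimal sketch using made-up placeholder data, not the study's results; the function name and the workflow and variant labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def summarize_detection(
    calls: Dict[str, Set[str]], variants: List[str]
) -> Tuple[List[str], List[str], Dict[str, float]]:
    """Summarize which expected variants each workflow detected.

    calls: workflow name -> set of variant IDs that workflow reported.
    variants: the expected variant IDs in the reference sample.
    """
    expected = set(variants)
    # Variants that every workflow reported.
    found_by_all = [v for v in variants if all(v in c for c in calls.values())]
    # Workflows that reported every expected variant.
    complete = [w for w, c in calls.items() if expected <= c]
    # Fraction of expected variants each workflow reported.
    sensitivity = {w: len(c & expected) / len(expected) for w, c in calls.items()}
    return found_by_all, complete, sensitivity


# Toy example only: four placeholder variants, three placeholder workflows.
toy_variants = ["v1", "v2", "v3", "v4"]
toy_calls = {
    "workflow_A": {"v1", "v2", "v3", "v4"},
    "workflow_B": {"v1", "v2", "v3"},
    "workflow_C": {"v1", "v3"},
}
found_by_all, complete, sensitivity = summarize_detection(toy_calls, toy_variants)
```

Reporting both views matters because a workflow can have high overall sensitivity while still missing an entire class of variants, which a single summary number would hide.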
Many of the variants were actually present in the raw NGS data but were missed because of the bioinformatics approach, Lincoln noted, and two workflows that used Illumina-provided bioinformatics pipelines performed worse than others that used non-vendor bioinformatics.
Also, the Ion Torrent workflow failed to detect a number of variants because the AmpliSeq method was unable to amplify several alleles. Lincoln explained that AmpliSeq, which was optimized for formalin-fixed samples, relies on small amplicons and covers each target with only one primer pair, which led to the failures. In addition, the Ion Torrent platform missed the homopolymer-associated variants, a known problem for this platform, though one of these variants was also missed by three of the Illumina workflows.
The Illumina whole-genome sequencing workflow from Rady Children's Hospital performed "surprisingly well," Lincoln said, missing only three variants, despite the fact that it had the lowest coverage of all the assays.
The results suggested that clinical laboratories might overlook the issue of difficult-to-analyze variants in their NGS assay validations, Lincoln said, and the researchers called for better standard materials that labs can use. SeraCare now sells the synthetic reference samples used in the study, and others may develop additional standard materials.
The next step, Lincoln said, is to expand the repertoire of "hard" variants in the standard material, for example, by adding variants that mimic mosaicism or that represent other types of CNVs.