NEW YORK (GenomeWeb) – Despite the proliferation of next-generation sequencing-based clinical tests, there are currently limited standards and methods by which to quantitatively assess the performance of a given assay. Biases introduced during amplification, library preparation, and the sequencing itself can affect assays' sensitivity, and researchers have demonstrated that there can be extensive variability between the same assays run by different laboratories.
To provide a way of assessing the performance and reliability of an assay, researchers from the Garvan Institute of Medical Research in Sydney, Australia, have developed a technique that uses synthetic DNA spike-ins to quantify an assay's ability to detect specific variations.
The researchers described their synthetic DNA standards, which they called sequins, in a study published today in Nature Methods. In an accompanying study also published in Nature Methods, the group described RNA-based sequins.
Tim Mercer, senior author of both studies and laboratory head of transcriptomic research at the Garvan Institute, told GenomeWeb that his lab had been focusing on new gene discovery and lncRNA research and wanted a control in their experiments to know whether they were really discovering new genes. "We started making spike-in controls that represented mutations," he said.
The key to the technique is that the researchers designed the synthetic molecules in the reverse direction, from 3' to 5', so that they could be included in the sequencing experiment without interfering with the sequencing of the actual molecule of interest. In addition, Mercer said, the team also designed a synthetic reference molecule to which the spike-in reads align. The sequins can be designed to represent different variations. For instance, Mercer said, in the case where a researcher is running a cancer gene panel and wants to look for specific actionable targets, synthetic spike-ins can be designed to represent those variants. The spike-ins act as a control. If the sequencing detects the synthetic variants, then it should also be able to detect those same variants in the real sample, which adds confidence both to variant calls and to conclusions that a variant is absent.
The sequin reads do not interfere with the native molecule because "it is a mirror image," Mercer said, so the sequin reads will align only to the synthetic genome and real reads will align only to the human reference.
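The mirror-image idea can be illustrated with a small Python sketch. It assumes, as described above, that a sequin is a natural sequence written in reverse order rather than as its reverse complement. The example fragment and the function names are hypothetical and do not come from the study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def mirror(seq: str) -> str:
    """Return the sequence written in reverse order (a 'mirror image')."""
    return seq[::-1]

def reverse_complement(seq: str) -> str:
    """Return the opposite strand of the same locus."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

# Hypothetical fragment, chosen only for illustration.
natural = "ATGGCCATTGTAATGGGCCGC"
sequin = mirror(natural)

# The mirror image keeps the same base composition as the natural sequence...
same_composition = Counter(natural) == Counter(sequin)
# ...but it is neither the natural sequence nor its reverse complement.
distinct = sequin != natural and sequin != reverse_complement(natural)
print(sequin, same_composition, distinct)
```

Because reversal alone preserves properties such as nucleotide composition while producing a different sequence, reads from such a molecule would be expected to match a correspondingly reversed reference rather than the natural one.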
Marc Salit, group leader of the genome-scale measurements group at the National Institute of Standards and Technology, who spearheaded NIST's Genome in a Bottle consortium to design NGS reference materials, told GenomeWeb that the study represents "leading-edge work" that "pushes forward the portfolio of tools that will build evidence of the reliability and validity of NGS assays for clinical applications."
He said that the sequin method could be complementary to the Genome in a Bottle consortium's work. The GIAB consortium is developing reference genomes that researchers can use to validate NGS methods by seeing how well their technology matches with a "truth set." However, Salit said, those reference materials all represent healthy genomes. The sequin method is different in that spike-ins can be designed to represent disease-related variants. In addition, the Genome in a Bottle reference materials are not designed to act as internal controls every time an assay is run, but instead to help labs validate their overall methods.
In the study, the researchers first designed an 11-megabase artificial chromosome and inverted the sequence so that it read 3' to 5'. Next, they encoded a representation of common genetic variants, including 223 SNVs and 176 indels, as well as surrounding sequences.
In total, the researchers designed 36 pairs of approximately 1-kb sequins, which represented 167 homozygous and 245 heterozygous variants.
To validate these sequins, the researchers ran a paired-end NGS library, incorporating them into the library prep. Mercer said that the sequins can be treated like regular DNA in setting up the library prep and sequencing reactions, and they do not interfere with the process. The only difference, he said, is that they should not align to the human reference, but to the synthetic reference. And indeed, in the initial validation, the researchers noted that the sequins aligned to the synthetic chromosome while the human reads aligned to the human reference.
They then compared variant calling of synthetic sequin variants with variant calling of true variants sequenced as part of Illumina's Platinum Genome Project to comprehensively sequence the NA12878 genome.
At an average per-base coverage of 43x, the researchers identified 95 percent and 99 percent of synthetic and human heterozygous SNVs, respectively, and 99 percent of both synthetic and human homozygous SNVs. They also detected 95 percent and 93 percent of synthetic and human heterozygous indels, respectively, and 100 percent and 98 percent of synthetic and human homozygous indels. Three of the synthetic indels were false positives.
"This demonstrates that variants represented by sequins perform analogously to bona fide variants within the well-characterized NA12878 genome, verifying their suitability as an internal positive control set for genome sequencing," the authors wrote.
Next, the researchers wanted to test sequins' ability to act as controls for detecting somatic mutations at varying allele frequencies. By combining sequin pairs at different concentrations, at ratios ranging from 1:1 to 1:4,096, the team hypothesized that they could establish a reference scale for measuring allele frequency.
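As a rough illustration of such a reference scale, the Python sketch below converts mixing ratios into allele frequencies. It rests on two assumptions that are not stated in the article: that a ratio of 1:n means one part variant sequin to n parts reference sequin, and that the series proceeds in twofold steps (4,096 is 2 to the 12th power). The function name is hypothetical.

```python
from fractions import Fraction

def allele_frequency(parts_reference: int) -> Fraction:
    """Variant allele frequency for a 1:parts_reference variant-to-reference mix."""
    return Fraction(1, 1 + parts_reference)

# Assumed twofold dilution series: 1:1, 1:2, 1:4, ..., 1:4096.
ladder = [2 ** step for step in range(13)]

for parts in ladder:
    freq = allele_frequency(parts)
    print(f"1:{parts:<5} -> {float(freq):.5%}")
```

Under these assumptions the scale would run from an allele frequency of one half at 1:1 down to 1 in 4,097 at the most dilute step.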
When they sequenced to more than 25,000-fold coverage, the researchers were able to identify all but two synthetic SNVs at the lowest concentration. However, sequencing at 100x coverage was not deep enough to detect most variants at frequencies less than 12.5 percent.
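The dependence on sequencing depth can be sketched with a simple binomial model in Python. This is not the authors' analysis. It assumes that each read covering a site independently carries the variant with probability equal to the allele frequency, it ignores sequencing errors, and it uses a hypothetical threshold of five supporting reads for a call.

```python
from math import comb

def expected_variant_reads(depth: int, allele_freq: float) -> float:
    """Average number of variant-supporting reads at a site."""
    return depth * allele_freq

def detection_probability(depth: int, allele_freq: float, min_reads: int) -> float:
    """P(at least min_reads variant-supporting reads) under a binomial model."""
    p_fewer = sum(
        comb(depth, k) * allele_freq**k * (1 - allele_freq) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer

MIN_READS = 5  # hypothetical calling threshold, not taken from the study

# Compare a shallow and a deep run at a rare allele (1 part in 4,097).
rare = 1 / 4097
for depth in (100, 25_000):
    print(
        depth,
        round(expected_variant_reads(depth, rare), 2),
        round(detection_probability(depth, rare, MIN_READS), 3),
    )
```

In this simplified model, a variant present in roughly 1 of every 4,097 molecules is expected to appear in about six reads at 25,000-fold coverage but in far fewer than one read at 100-fold coverage, which is consistent in spirit with the depth dependence reported above.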
The researchers next designed sequin standards to represent structural variants, demonstrating that they could resolve synthetic deletions and inversions at 20x and 30x coverage, respectively.
"We've been focusing on mimicking complex regions of the tumor genome," Mercer said. For example, he said, the group can design a sequin that represents the HER2 gene and put it in the library at a higher concentration than expected. That gives the assay a standard by which they can base analysis of the HER2 gene in a patient sample.
Mercer said that while the two papers in Nature Methods "establish the underlying concept," the next step is to apply them and "start tackling the harder regions of the genome" that are difficult to analyze with short-read sequencing technology. In the study, the authors noted that even designing sequins for these regions was currently challenging.
Salit, too, said that one drawback to the method is that regions of the genome that are difficult to sequence are also difficult to synthesize. "Where you really want to have confidence in your sequencing results and be able to detect challenging variants in a challenging context, those are also the regions for which it is hard to make synthetic molecules," he said, though he noted that the researchers are working to solve this problem.
Other researchers have also used synthetic spike-ins for internal validation. For instance, Salit said, researchers at the Frederick National Laboratory for Cancer Research published a study in the Journal of Molecular Diagnostics earlier this year on plasmid-based controls. SeraCare also markets reference materials to run in specific clinical NGS assays.
Salit said that those plasmid-based approaches are "targeted for clinical applications here and now," to represent a specific known clinically relevant variant. The sequin approach, meanwhile, is "more aspirational," and designed to be "more generally representative" of many different types of variants.
Salit added that he and the Genome in a Bottle Consortium would be interested in potentially working with Mercer's group "to see how we can use these materials in the Genome in a Bottle portfolio."
Mercer said that aside from continuing to develop the sequin technology, he planned to make the methods available for free to the academic community. "As a concept it can be applied in many directions," he said, including for oncology-based assays as well as metagenomics research and even immunology work to validate specific immune receptors.
The technology is not yet "mature enough to be a commercial product," he added, but if companies down the road were interested in using it, he said there could be the potential for a licensing agreement.