NEW YORK (GenomeWeb) – Researchers from Stanford University have developed a targeted sequencing approach for microsatellites that makes use of the CRISPR-Cas9 system to selectively fragment DNA.
In a study published in Nature Communications last month, the team demonstrated the ability of the technique to target and sequence more than 2,000 short tandem repeats (STRs), genomic regions that can be used for human identification. The method selectively fragments the DNA to keep the STRs intact and uses primer probes incorporated into Illumina flow cells that target the STR loci for sequencing.
Hanlee Ji, an associate professor of oncology at Stanford University, said in an interview that the group has filed a patent on the method and hopes to commercialize it, adding that it could have applications in both forensics and oncology. While it is still early, he said, commercialization could involve licensing the technology to a company or partnering with clinical laboratories. For forensics purposes, he said, the technology would likely be developed into a product that could be run out of individual forensics laboratories.
The 13 STRs that are included in the Combined DNA Index System (CODIS) for human identification for forensic purposes are traditionally analyzed via PCR and capillary electrophoresis. Genotyping of the STRs is based on size differences, but the method has a number of limitations. For instance, PCR amplification can introduce errors, which can make it difficult to differentiate between similar alleles. In addition, CE-based genotyping is limited in the number of STRs that can be analyzed at once due to challenges of multiplexing.
In recent years, a number of groups have sought to develop next-generation sequencing-based approaches, including targeted methods that analyze STRs, SNPs, or mitochondrial DNA.
While developers of these methods have shown their potential, including their ability to offer greater resolution and higher throughput, they, too, have limitations. Two limitations the Stanford group sought to get around were random fragmentation, which can cut through STR loci, and PCR amplification, which can introduce errors.
Melissa Gymrek, an assistant professor at the University of California, San Diego, whose research focuses on developing computational tools for analyzing STRs, said that genotyping STRs by NGS is hard due to the "low number of informative reads, since only reads that entirely span a repeat region are useful." While some have attempted targeted sequencing approaches, "these are either too low-throughput or introduce too many errors during the PCR amplification process," she said. However, by using CRISPR, the researchers get "rid of that problematic amplification step while maintaining high throughput," she said. It's a "clever method."
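Gymrek's point about informative reads can be illustrated with a short check. The sketch below is not taken from the study; it assumes half-open alignment coordinates and a hypothetical `min_flank` parameter that requires a few bases of unique sequence on each side of the repeat.

```python
def spans_repeat(read_start: int, read_end: int,
                 str_start: int, str_end: int,
                 min_flank: int = 5) -> bool:
    """Return True if an aligned read covers the whole repeat plus flanks.

    Coordinates are half-open [start, end). min_flank is an illustrative
    assumption, not a value reported in the article.
    """
    covers_left = read_start <= str_start - min_flank
    covers_right = read_end >= str_end + min_flank
    return covers_left and covers_right


# A read that starts or ends inside the repeat cannot be used to count
# repeat units, so it is treated as uninformative.
reads = [(100, 250), (130, 250), (100, 182)]
informative = [r for r in reads if spans_repeat(r[0], r[1], 120, 180)]
```

Only the first of the three example reads covers both flanks of a repeat located at positions 120 to 180.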
Ji said the method involves two key steps. First, multiplexed CRISPR-Cas9 is used to "identify the genomic region of interest and physically cut that out," he said. The second step involves a "targeting primer that recognizes the sequence adjacent to the microsatellite." Primer-specific target annealing then captures the sequence across the molecule without the need for PCR amplification. After paired-end sequencing, the first read spans the STR region, while the second read includes the primer sequences, which act as an index for the specific STR.
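The indexing role of the second read can be sketched in a few lines. This is a simplified illustration, not the group's pipeline: the primer sequences and locus names are invented, and the code assumes that read 2 begins with the targeting primer and matches it exactly, whereas a real workflow would likely tolerate sequencing errors.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical primer table (primer sequence -> STR locus name).
# Both the sequences and the names are made up for illustration.
PRIMER_TO_LOCUS: Dict[str, str] = {
    "ACGTTGCAAGCTTGACCTGA": "STR_A",
    "TTGACCGGTAACCGTTAGCA": "STR_B",
}


def assign_locus(read2: str,
                 primers: Dict[str, str] = PRIMER_TO_LOCUS) -> Optional[str]:
    """Return the locus whose primer read 2 starts with, or None.

    Assumes an exact prefix match, which is a simplification.
    """
    for primer, locus in primers.items():
        if read2.startswith(primer):
            return locus
    return None


def group_read1_by_locus(
        pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Bin read 1 sequences (which span the STR) by the locus read 2 indicates."""
    grouped: Dict[str, List[str]] = {}
    for read1, read2 in pairs:
        locus = assign_locus(read2)
        if locus is not None:  # pairs with no recognized primer are dropped
            grouped.setdefault(locus, []).append(read1)
    return grouped
```

Grouping reads this way means each STR can be genotyped from its own set of read 1 sequences.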
In order to generate STR-seq assays, the researchers first focused on known STRs with documented SNPs. They then narrowed down the list further by selecting those that could be completely covered by a 150-base pair read as well as those that were located within 100 base pairs of a SNP that occurs with high frequency among different populations. Next, they designed targeting primers for each DNA strand that would encompass the STR region and the SNP.
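The two selection criteria described above can be expressed as a simple filter. The sketch below is only an approximation of the idea: the `Candidate` record and its field names are hypothetical, the SNP positions are assumed to have been pre-filtered to high-frequency variants, and the read-length test ignores the flanking sequence a real read would also need to cover.

```python
from dataclasses import dataclass, field
from typing import List

READ_LENGTH = 150       # bp, read length cited in the article
MAX_SNP_DISTANCE = 100  # bp, STR-to-SNP distance cited in the article


@dataclass
class Candidate:
    """Hypothetical record for a candidate STR (half-open coordinates)."""
    name: str
    str_start: int
    str_end: int
    snp_positions: List[int] = field(default_factory=list)


def distance_to_str(c: Candidate, snp_pos: int) -> int:
    """Distance in bp from a SNP to the nearest base of the STR (0 if inside)."""
    if snp_pos < c.str_start:
        return c.str_start - snp_pos
    if snp_pos >= c.str_end:
        return snp_pos - c.str_end + 1
    return 0


def passes_filters(c: Candidate) -> bool:
    """Keep STRs that fit in one read and have a nearby SNP."""
    fits_in_read = (c.str_end - c.str_start) <= READ_LENGTH
    has_near_snp = any(distance_to_str(c, p) <= MAX_SNP_DISTANCE
                       for p in c.snp_positions)
    return fits_in_read and has_near_snp
```

For example, a 40-bp repeat with a SNP 61 bp downstream would pass, while a 200-bp repeat or a repeat whose nearest SNP is 261 bp away would not.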
The team designed two STR-seq assays — one that targeted 700 STRs from a set of well-characterized samples, and another that targeted 2,370 loci containing 964 known STRs and 1,406 candidate STRs. Next, they designed a set of guide RNAs that would fragment DNA either upstream or downstream of the STRs in both assays.
They validated the accuracy of the assays by comparing the first assay to CE-based genotyping on nine DNA samples. Overall, the STR-seq assay was highly concordant with CE genotyping, with more than 95 percent of the STR calls from the STR-seq assay agreeing with the CE genotypes. Discordance most often occurred when the microsatellites were longer than the read length or when the STRs had indels in their flanking regions.
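A concordance figure of this kind is the fraction of shared calls on which two methods agree. The following is a minimal sketch of that calculation, assuming genotypes are stored as pairs of repeat counts keyed by a hypothetical `(sample, locus)` tuple; the article does not describe how the authors computed their figure.

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]   # repeat counts for the two alleles
CallKey = Tuple[str, str]    # (sample, locus); an illustrative convention


def normalize(genotype: Genotype) -> Genotype:
    """Sort alleles so that (10, 12) and (12, 10) compare equal."""
    a, b = sorted(genotype)
    return (a, b)


def concordance(seq_calls: Dict[CallKey, Genotype],
                ce_calls: Dict[CallKey, Genotype]) -> float:
    """Fraction of calls present in both sets that have identical genotypes."""
    shared = seq_calls.keys() & ce_calls.keys()
    if not shared:
        raise ValueError("no calls in common")
    agree = sum(normalize(seq_calls[k]) == normalize(ce_calls[k])
                for k in shared)
    return agree / len(shared)
```

Calls made by only one of the two methods are excluded from the denominator in this sketch, which is one of several possible conventions.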
Researchers also analyzed a family trio with both assays. For the first assay, they found that 98.5 percent of the genotypes were concordant with the known maternal and paternal inheritance. For the second assay, they found 96.2 percent concordance.
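Checking a trio for consistent inheritance amounts to asking whether one of the child's alleles could have come from the mother and the other from the father. The sketch below shows that test under the assumption that each genotype is a pair of repeat counts; it does not model mutation or allele dropout, and the function names are illustrative.

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]


def mendelian_consistent(child: Genotype, mother: Genotype,
                         father: Genotype) -> bool:
    """True if one child allele is found in each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def trio_concordance(child: Dict[str, Genotype],
                     mother: Dict[str, Genotype],
                     father: Dict[str, Genotype]) -> float:
    """Fraction of loci genotyped in all three members that are consistent."""
    loci = child.keys() & mother.keys() & father.keys()
    if not loci:
        raise ValueError("no loci genotyped in all three family members")
    consistent = sum(mendelian_consistent(child[l], mother[l], father[l])
                     for l in loci)
    return consistent / len(loci)
```

A child genotype of (10, 12) is consistent with a (10, 11) mother and a (12, 13) father, whereas a child genotype of (10, 10) is not.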
In addition, the researchers looked at the accuracy of the SNPs called in their two assays, finding that both were more than 95 percent concordant.
Next, they wanted to quantify whether using the CRISPR-Cas9 system to fragment the DNA offered advantages over random fragmentation, so they tested the same targeted STR assay using both methods. They found that the CRISPR-Cas9 method increased the percentage of on-target reads to 56 percent from 8.7 percent with random fragmentation. In addition, the CRISPR-Cas9 method increased the number of reads that spanned the entire STR region to 17.1 percent from 5.3 percent.
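Expressed as fold changes, the percentages reported above work out as follows. This is simple arithmetic on the article's figures; the variable names are illustrative.

```python
# Percentages reported for the same targeted STR assay.
on_target_crispr, on_target_random = 56.0, 8.7
spanning_crispr, spanning_random = 17.1, 5.3

# Fold improvement of CRISPR-Cas9 over random fragmentation.
on_target_fold = on_target_crispr / on_target_random
spanning_fold = spanning_crispr / spanning_random

print(f"on-target reads: {on_target_fold:.1f}x")      # 6.4x
print(f"STR-spanning reads: {spanning_fold:.1f}x")    # 3.2x
```

In other words, targeted fragmentation yielded roughly six times as many on-target reads and about three times as many reads spanning the full STR.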
The researchers also tested whether their method could detect an individual's DNA within a mixture. They created DNA mixtures containing decreasing fractions of the sample to be detected, from 25 percent down to 0.1 percent, and found that in a mixture of five samples, they could detect informative haplotypes at a fraction of 0.1 percent.
Gymrek added that the researchers addressed two important applications with their method: STR/SNP phasing and genotyping mixtures of samples. "Specifically designing probes in a smart way to physically link STRs and SNPs make both problems much more tractable," she said.
Ji said that his group is now using this method for two different projects. In one project, they are using the STR-seq assay, which includes all of the STRs that are part of CODIS, to study forensic samples that include DNA from multiple individuals.
Aside from forensics, he said the group is applying the technique to oncology. "Microsatellite instability is frequent in colon, stomach, uterine, and other cancers," Ji said. "And tumors with microsatellite instability tend to respond to immune checkpoint inhibitors, so we are applying this approach to tumors and generating microsatellite instability profiles with much better sensitivity and specificity than before."
In addition, Ji said, his group is working to improve the method so that it can work with degraded and smaller amounts of DNA — typical of what would be found in forensic or formalin-fixed paraffin-embedded samples.