NEW YORK (GenomeWeb) – Researchers from Stanford University's Genome Technology Center have designed a targeted resequencing method that modifies the surface of an Illumina flow cell in situ to capture genomic regions of interest.
The researchers first detailed the method, known as oligonucleotide-selective sequencing, or OS-seq, in a 2011 Nature Biotechnology study, but have since improved upon it, making it "automated, programmable, and streamlined for production applications," according to Hanlee Ji, an assistant professor at Stanford University School of Medicine and senior author of the study. They published the updated version this month in Nucleic Acids Research.
The team originally demonstrated that the method could sequence up to 344 cancer-related genes in a HapMap sample and a colorectal cancer tumor/normal sample pair. In the recent Nucleic Acids Research study, however, the team covered up to 1,421 genes and, by optimizing the method, increased the on-target sequencing data two-fold.
Essentially, the OS-seq method uses an Illumina flow cell to serve as both a capture device and as its normal support device in the sequencing workflow. In the updated version, the researchers also use the Illumina cBot fluidics device to automate much of the workflow.
Ji said that Illumina has not given the group feedback, and the company declined to comment on the method for this article. Stanford holds several patents surrounding the technology, and one company — Finnish startup Blueprint Genetics — has already licensed the technology for use in clinical gene panels for inherited disease.
OS-seq relies on hybridization of a genomic library to a target-specific primer probe that is located on the flow cell. Using the primer probe, a subsequent polymerase extension extends the specific genomic target, with all the steps occurring on the flow cell.
The use of the cBot further automates the process of modifying and preparing the flow cell, incorporating the primer probes, and selecting the library.
The team also constructed a bioinformatics pipeline so users could program primer probe design. The pipeline, which they have made available for download, also optimizes the placement of primer probes.
To program the flow cell for targeting, the team tweaked the XML script for the cBot to enable hybridization and extension of the target oligonucleotides onto the flow cell primer lawn and the capture of the sequencing library by overnight hybridization. The modification process also enables the extension of the captured library and standard Illumina cluster generation.
The library preparation is "significantly simpler" than standard protocols, Ji said. "There is no need to do a size exclusion" and "we are close to doing this non-amplified," he added. "Because of its simplicity, it can be automated more readily."
The researchers designed several different assays, including a 29-gene panel that targets the genes involved in Ras/Raf/MAPK signal transduction, as well as 313- and 1,421-gene panels that target the exons of other cancer-related genes. A fourth assay was designed to demonstrate that probes could be tiled across a large contiguous region that includes both SNPs and non-coding regions. Assay 4 covered two regions encompassing 47 genes: a 0.2-mb interval that spanned the TIPARP gene and flanking regions, and a 1.5-mb region covering a portion of chromosome 18 that is frequently deleted in colon cancer. A fifth assay targeted novel breakpoint sequences from rearrangements in a gastric cancer, as well as the gene exons from Assay 1.
The researchers used two methods for designing the primer probes in each of the assays. For the larger panels, Assays 2, 3, and 4, they used a programmable microarray, while for the smaller Assays 1 and 5, they used traditional column-synthesized oligonucleotides.
The team found that the updated OS-seq method improved the assays' specificity. For instance, in the original study, only around 47 percent of the 11,742 primer probes captured target sequence. By contrast, 98.1 percent, 92.6 percent, and 95.5 percent of the primer probes in Assays 2, 3, and 4, respectively, captured target sequences. In addition, the assays each consisted of significantly more primer probes: 19,532 probes in Assay 2, 90,000 probes in Assay 3, and 17,548 probes in Assay 4.
"Overall, these results demonstrated a high capture efficiency and uniformity among the different pools; this was a significant improvement over our previous efforts," the authors wrote.
To test the protocol's accuracy in variant calling, the researchers applied the assays to previously sequenced and well-characterized diploid genomes. Assay 1 covered 99 percent of the bases at 30x or greater coverage and identified 88 out of 89 previously identified SNVs. The discordant variant was present in the sequence data but had been discarded by the variant caller because it had a low mapping quality.
The researchers applied the large panels, Assays 2 and 3, to an individual who had previously been analyzed with three different exome capture methods. Compared to the exome data, SNV concordance was 96.6 percent and 97.4 percent for Assays 2 and 3, respectively.
In general, the nonconcordant variants were from target genes that have sequence motifs in other gene families, the authors reported.
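The concordance figures above can be read as a simple set comparison. The following is a minimal sketch, not the authors' pipeline: it assumes concordance means the share of previously identified SNVs that also appear in the OS-seq call set, and the function name, variant tuples, and toy data are all hypothetical. Under that same assumption, 88 of 89 SNVs in Assay 1 works out to about 98.9 percent.

```python
def snv_concordance(called, reference):
    """Return (fraction of reference SNVs also called, sorted list of missed SNVs).

    Assumption: the denominator is the reference set; the article does not
    state how the study defined concordance.
    """
    called, reference = set(called), set(reference)
    if not reference:
        raise ValueError("reference set is empty")
    shared = called & reference
    return len(shared) / len(reference), sorted(reference - called)


# Toy data (hypothetical): variants keyed as (chromosome, position, ref, alt).
reference = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr3", 10, "T", "C"),
]
called = [
    ("chr1", 100, "A", "G"),
    ("chr2", 75, "G", "A"),
    ("chr3", 10, "T", "C"),
    ("chr4", 5, "A", "T"),  # extra call, not in the reference set
]

rate, missed = snv_concordance(called, reference)
print(f"{rate:.1%} concordant; missed: {missed}")
```

Listing the missed variants alongside the rate makes it easier to check whether discordant calls cluster in particular genes, as the authors reported for gene families with shared sequence motifs.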
Another advantage of this protocol compared to other targeted sequencing approaches, said Ji, is that assays can easily be expanded. "If you want to expand an assay, you just throw in the raw oligonucleotides," he said. "That offers flexibility of design," he added, without complicated steps. "Just mix [additional oligonucleotides] together if you want to add more features."
In the study, the team demonstrated this feature by combining Assay 1 and Assay 2 to create a pool of free oligonucleotides. In the combined assay, 68.1 percent of the mapped reads were on target for the selected genomic regions, and average fold coverage was 440x, with 96 percent of the targeted bases covered more than 30 times. Of the 1,205 SNVs called, 96 percent were concordant with previous exome analysis.
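The three metrics reported for the combined assay (on-target rate, average fold coverage, and the share of targeted bases above a depth threshold) can be sketched as follows. This is an illustration under stated assumptions only: reads are reduced to a single mapped position, targets are half-open intervals, and the function names and toy values are hypothetical rather than taken from the study.

```python
def on_target_fraction(reads, targets):
    """Share of mapped reads whose position falls inside any target interval.

    reads:   list of (chromosome, position)
    targets: list of (chromosome, start, end), half-open [start, end)
    """
    hits = sum(
        any(c == tc and ts <= p < te for tc, ts, te in targets)
        for c, p in reads
    )
    return hits / len(reads)


def summarize_depth(depths, min_depth=30):
    """Return (mean depth, fraction of bases with depth >= min_depth)."""
    mean = sum(depths) / len(depths)
    breadth = sum(d >= min_depth for d in depths) / len(depths)
    return mean, breadth


# Toy data (hypothetical).
targets = [("chr1", 100, 200)]
reads = [("chr1", 120), ("chr1", 199), ("chr1", 200), ("chr2", 150)]
depths = [10, 30, 50, 70]  # per-base depth across the targeted bases

on_target = on_target_fraction(reads, targets)
mean_depth, breadth = summarize_depth(depths)
print(f"on target: {on_target:.1%}, mean depth: {mean_depth:.0f}x, "
      f">=30x: {breadth:.1%}")
```

The linear scan over targets is fine for a toy example; a real panel with tens of thousands of probes would call for an interval index instead.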
Next, the team wanted to test the ability of the method to detect low-frequency variants. They created mixtures with variants present at 20 percent, 10 percent, and 5 percent frequency. At 5 percent frequency, 99.2 percent of the 240 known variants were detected when covered at least 100x.
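The link between coverage and sensitivity at low allele frequencies can be illustrated with a simple binomial model. This is a back-of-the-envelope sketch and not the variant-calling procedure used in the study: it assumes each read independently carries the variant with probability equal to the allele frequency, ignores sequencing error, and uses a hypothetical minimum number of supporting reads as the calling threshold.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_alt_reads):
    """P(at least min_alt_reads variant reads) under Binomial(depth, allele_fraction).

    min_alt_reads is a hypothetical calling threshold, not one from the study.
    """
    miss = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - miss


# Compare a 5 percent variant at two depths and two illustrative thresholds.
for depth in (100, 400):
    for threshold in (1, 3):
        p = detection_probability(depth, 0.05, threshold)
        print(f"depth {depth}x, >= {threshold} variant reads: {p:.4f}")
```

Under this model, raising the supporting-read threshold lowers sensitivity at a fixed depth, while deeper coverage recovers it, which is consistent with the study restricting the 5 percent comparison to positions covered at 100x or more.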
With Assay 4, the team sought to demonstrate the ability to sequence over a long contiguous stretch of a target region as a way of genotyping candidate variants of interest. Assay 4 consisted of a 1.5-mb region on chromosome 18 as well as a 0.2-mb region on chromosome 3. The researchers analyzed normal DNA samples from four individuals to test this assay. Average coverage of the 1.5-mb stretch was 68x and the average percentage of detected SNPs was 98.6 percent. The 0.2-mb stretch had a higher average coverage of 163x because it contained a higher density of primer probes. Concordance with a SNP chip was 97 percent. The main problem with Assay 4 was that it struggled in repetitive regions, leaving 18 regions greater than 0.5 kb with no sequence coverage.
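Gaps like the uncovered regions described for Assay 4 can be located by scanning per-base depth for runs of zeros. The sketch below is one way to do that, assuming depth is available as a simple list with one value per base; the function name, the 500-base cutoff as a default, and the toy depth profile are illustrative choices, not details from the study.

```python
def uncovered_runs(depths, min_length=500):
    """Return half-open (start, end) index pairs for zero-depth runs longer than min_length."""
    runs, start = [], None
    for i, d in enumerate(depths):
        if d == 0 and start is None:
            start = i  # a zero-depth run begins here
        elif d != 0 and start is not None:
            if i - start > min_length:
                runs.append((start, i))
            start = None  # the run has ended
    # Handle a run that extends to the end of the region.
    if start is not None and len(depths) - start > min_length:
        runs.append((start, len(depths)))
    return runs


# Toy depth profile (hypothetical): one long gap, one short gap, one trailing gap.
depths = [5] * 10 + [0] * 600 + [8] * 10 + [0] * 100 + [3] * 5 + [0] * 501
gaps = uncovered_runs(depths)
print(gaps)
```

Only the 600-base and 501-base gaps are reported here; the 100-base gap falls below the cutoff.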
Finally, the researchers demonstrated that OS-seq assays could be designed to detect structural variations. They designed Assay 5 to target 66 putative breakpoints from rearrangements identified from the whole-genome sequencing of two matched primary and metastatic tumor sites. Using the approach, the team validated eight somatic rearrangements, including two inversions and six deletions, that were present in at least one of the two tumor samples but absent from the normal tissue.
Ji said his team is now using the method in population studies to look at "specific classes of germline variants." OS-seq is well-suited for such large-scale targeted studies, he said, because the "library preparation can be readily adapted to 96-well plates," enabling "thousands of libraries within the course of several weeks." He also plans to use OS-seq to sequence tumor/normal pairs and to do "clonal evolution analysis with deeper sequencing."
In addition, Ji said that further optimization of the method will eventually enable targeted single molecule sequencing, an approach he said his lab is developing.