By Julia Karow
Researchers at Stanford University have developed a targeted sequencing method that allows them to directly capture and sequence DNA on an Illumina flow cell.
According to the developers, the workflow is easier and faster than those of existing capture methods; the approach achieves a high on-target rate and high variant-calling accuracy; and it could potentially be scaled to the entire exome. In a proof-of-principle study published online in Nature Biotechnology this week, they used the new method, called oligonucleotide-selective sequencing, or OS-Seq, to sequence up to 344 cancer genes in a HapMap sample and in a colorectal cancer tumor/normal sample pair.
Stanford has licensed the method to an undisclosed company for diagnostic use, and several firms have expressed an interest in licensing it for research use, the developers said. It is unclear, however, if Illumina plans to commercialize the method for its sequencing systems. Laura Trotter, Illumina's director of corporate marketing, told In Sequence that the company "has not made any public statements with respect to enabling targeted enrichment on its sequencing flowcells."
Hanlee Ji, an assistant professor at Stanford University School of Medicine and a senior associate director at the Stanford Genome Technology Center, whose group developed the method, said that in his experience, existing exome capture methods have "extremely complicated workflows," so he and his colleagues set out to develop "something that is more efficient and worked better."
For their method, they turned Illumina flow cells into capture devices that also serve as the support for sequencing, just as in standard Illumina sequencing. Ji said that instead of an Illumina flow cell, the method could use any solid-phase support carrying a lawn of two kinds of primers.
The concept of using the same support for both capture and sequencing is not new: Helicos BioSciences published a study this summer (CSN 8/9/2011) that used the same principle to sequence the BRCA1 gene. But according to Ji, this is the first time it has been used to sequence multiple genes in parallel.
The researchers first hybridize target-specific oligonucleotides to the flow cell's primer lawn and extend the primers with DNA polymerase, producing target-specific primer probes attached to the solid support. These probes then capture target DNA from a single-adaptor genomic DNA library, and the captured fragments are fixed on the flow cell in a second DNA polymerase extension reaction. Bridge amplification, cluster generation, and sequencing then follow, using standard Illumina protocols.
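To make the order of the two polymerase extensions more concrete, here is a toy string model in Python. It is a sketch under simplifying assumptions, not the authors' protocol: strands are written 5' to 3', annealing is reduced to an exact match of a primer's last few bases, and every sequence (lawn primer, probe, adaptor, genomic flanks), the adaptor's position on the fragment, and the helper names `revcomp` and `extend` are hypothetical.

```python
from typing import Optional

# Toy model of the two extension steps. Strands are 5'->3' strings, and
# "annealing" is simplified to an exact match between a primer's 3'-most
# `anchor` bases and the template. All sequences below are hypothetical.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend(primer: str, template: str, anchor: int) -> Optional[str]:
    """Anneal the 3' end of `primer` to `template` and extend it.

    Returns None if the primer finds no binding site. Otherwise the polymerase
    copies (as the complement) all template bases lying 5' of the binding site.
    """
    site = revcomp(primer[-anchor:])
    pos = template.find(site)
    if pos == -1:
        return None
    return primer + revcomp(template[:pos])


# Hypothetical sequences, for illustration only
LAWN_PRIMER = "AAAAACCCCC"   # immobilized flow-cell primer
PROBE_SEQ = "ACCTTGACTTAC"   # target-specific part of the primer probe
ADAPTOR = "TGCATGCATG"       # the library's single adaptor (assumed at the 5' end)
UPSTREAM = "CCGATAGGCT"      # genomic sequence 5' of the probe site
DOWNSTREAM = "TTAGCCATAG"    # genomic sequence 3' of the probe site

# Step 1: a target-specific oligo anneals to the lawn primer, which is extended
# into an immobilized primer probe (lawn primer + target-specific sequence).
oligo = revcomp(PROBE_SEQ) + revcomp(LAWN_PRIMER)
primer_probe = extend(LAWN_PRIMER, oligo, anchor=8)

# Step 2: a library fragment carrying the target anneals to the primer probe;
# a second extension copies the adjacent genomic sequence and the adaptor.
on_target = ADAPTOR + UPSTREAM + revcomp(PROBE_SEQ) + DOWNSTREAM
off_target = ADAPTOR + "GATTACAGATTACAGATTAC"
captured = extend(primer_probe, on_target, anchor=8)
missed = extend(primer_probe, off_target, anchor=8)

print(primer_probe)
print(captured)
print(missed)
```

In this model the captured strand starts with the lawn primer and probe sequence, continues with the genomic sequence that lay 5' of the probe site on the fragment, and ends with the adaptor complement, the kind of end a second lawn primer would need for bridge amplification. A fragment lacking the probe site is simply not extended.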
The workflow, which takes about one day, is "fundamentally different" from in-solution capture methods from Agilent or NimbleGen, Ji said, because the targets are not just captured by hybridization but are secured by extending the opposite strand via DNA polymerase.
For their proof-of-principle study, the researchers first designed 366 oligonucleotide primer probes to capture the exons of 10 cancer genes, totaling 31 kilobases. To demonstrate that the method is scalable, they also designed about 11,800 primer probes to capture the exons of 344 cancer genes, or about 960 kilobases.
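As a rough back-of-the-envelope check on the two designs, dividing the target size by the number of probes gives the average amount of target sequence per probe. This sketch uses only the rounded figures quoted above; the helper name is hypothetical, and the result is an average, not a probe spacing reported by the authors.

```python
def bases_per_probe(target_kb: float, n_probes: int) -> float:
    """Average target bases per primer probe (hypothetical helper)."""
    return target_kb * 1000 / n_probes


small = bases_per_probe(31, 366)      # 10-gene assay: 31 kb, 366 probes
large = bases_per_probe(960, 11_800)  # 344-gene assay: ~960 kb, ~11,800 probes
print(f"10-gene assay: {small:.1f} bp per probe")    # 84.7
print(f"344-gene assay: {large:.1f} bp per probe")   # 81.4
```

Both designs work out to roughly one probe per 80 to 85 bases of target, so, on these rounded numbers, the larger panel scales up probe count at a similar density.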
For the larger experiment, they synthesized the oligos on a programmable microarray. Being able to generate large enough numbers of oligonucleotides is currently the main limitation of the method, Ji said. He and his colleagues have already increased the number of probes to 20,000 and have designed probes to capture the entire exome, "so if we have the capacity to synthesize [enough] oligonucleotides, we can do exome sequencing with this approach," he said.
In their study, they used both the 10-gene and the 344-gene assays to sequence DNA from a HapMap Yoruba individual, performing paired-end sequencing on the Illumina GAIIx. They also sequenced the 344 genes in a colorectal cancer tumor/normal pair. For the larger assay, they used barcoding to multiplex several samples, allowing them to run each sample on the equivalent of 1.3 lanes.
For the 344-gene assay, more than 93 percent of the reads were on target, and up to 96 percent of the exon sequence was covered with at least one read. Almost 96 percent of the primer probes resulted in at least one sequence read, and 54 percent of the primer probes had a capture yield that was within a tenfold range.
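The capture metrics above are simple ratios, sketched below in Python with toy inputs rather than data from the study. The function names are hypothetical, and the article does not define "within a tenfold range"; this sketch assumes it means the largest share of probes whose yields all fit inside a single window from some yield y up to 10y.

```python
from bisect import bisect_right
from typing import Sequence


def on_target_rate(on_target_reads: int, total_reads: int) -> float:
    """Fraction of reads that map to the targeted regions."""
    return on_target_reads / total_reads


def fraction_bases_covered(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of targeted bases with at least `min_depth` reads."""
    return sum(d >= min_depth for d in depths) / len(depths)


def fraction_probes_with_reads(yields: Sequence[int]) -> float:
    """Fraction of primer probes that produced at least one read."""
    return sum(y >= 1 for y in yields) / len(yields)


def max_fraction_within_fold(yields: Sequence[int], fold: float = 10.0) -> float:
    """Largest fraction of probes whose yields fit in one window [y, fold * y].

    Assumed interpretation of "within a tenfold range". Zero-yield probes count
    in the denominator but can never fall inside a window.
    """
    ys = sorted(y for y in yields if y > 0)
    best = 0
    for i, low in enumerate(ys):
        j = bisect_right(ys, low * fold)  # index just past the last yield <= fold * low
        best = max(best, j - i)
    return best / len(yields)


# Toy per-probe read counts (illustrative only)
toy_yields = [0, 1, 5, 8, 12, 40, 200]
print(on_target_rate(930, 1000))
print(fraction_bases_covered([0, 3, 1, 0, 7]))
print(fraction_probes_with_reads(toy_yields))
print(max_fraction_within_fold(toy_yields))
```

Sorting the yields lets each candidate window be checked with one binary search, so the uniformity measure stays cheap even for tens of thousands of probes.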
For the HapMap sample, the researchers called 985 high-quality single-nucleotide variants in the 344 genes, of which almost 96 percent had previously been reported in a whole-genome sequencing study of the same sample. Also, the assay detected about 95 percent of previously reported SNPs.
For the tumor/normal pair, they identified 871 single-nucleotide variants in the normal sample and 727 in the tumor sample. Of these, 99.8 percent of the normal-sample calls and 99.5 percent of the tumor-sample calls were concordant with an Affymetrix SNP 6.0 array.
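The validation figures for the HapMap sample and the tumor/normal pair measure different things: the share of new calls already in a reference set, the share of reference variants recovered, and genotype agreement with an array at sites both methods assay. The Python sketch below computes each on toy data. The variant encoding, the helper names, and the assumption that genotypes are written with alleles in a fixed order are all hypothetical; the article does not say exactly how the study computed these numbers.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str]  # (chromosome, position, alternate allele)
Site = Tuple[str, int]          # (chromosome, position)


def fraction_previously_reported(calls: Set[Variant], known: Set[Variant]) -> float:
    """Share of new calls that already appear in a reference call set."""
    return len(calls & known) / len(calls)


def fraction_known_detected(calls: Set[Variant], known: Set[Variant]) -> float:
    """Share of reference variants in the targets that the assay recovered."""
    return len(calls & known) / len(known)


def genotype_concordance(seq_gt: Dict[Site, str], array_gt: Dict[Site, str]) -> float:
    """Fraction of sites genotyped by both methods where the genotypes agree."""
    shared = seq_gt.keys() & array_gt.keys()
    agree = sum(seq_gt[site] == array_gt[site] for site in shared)
    return agree / len(shared)


# Toy data (illustrative only)
calls = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G"), ("chr3", 7, "C")}
known = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G"),
         ("chr5", 9, "T"), ("chr1", 999, "C")}
seq_gt = {("chr1", 100): "AG", ("chr1", 250): "TT", ("chr2", 40): "CG", ("chr3", 7): "AC"}
array_gt = {("chr1", 100): "AG", ("chr1", 250): "CT", ("chr2", 40): "CG", ("chr9", 1): "AA"}

print(fraction_previously_reported(calls, known))
print(fraction_known_detected(calls, known))
print(genotype_concordance(seq_gt, array_gt))
```

Note that the first two ratios share a numerator but use different denominators, which is why a call set can score well on one and less well on the other.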
In the cancer sample, they also identified and validated a pathogenic nonsense mutation in a gene that is known to be mutated in many colorectal cancers.
According to Ji, the workflow of the new method is very robust and "dramatically simplified" compared to current sequence capture approaches. Except for the DNA library preparation, it all takes place on the flow cell, a closed microfluidic system, and it can be completed in one day instead of several. And because every target is covered from two directions, the variant calling accuracy is "very high."
Also, unlike RainDance's multiplex PCR approach, it does not require expensive hardware. "All that you fundamentally need for this approach is just oligonucleotides," Ji said.
Ji and his colleagues have since increased the amount of sequence they capture for each target, and have multiplexed more than 20 samples per lane. They are also now adapting the method to the Illumina HiSeq to take advantage of the greater capacity of that instrument, and they are working on increasing the amount of target sequence obtained per lane.
According to Olivier Harismendy, an assistant adjunct professor at the University of California, San Diego, who has worked with several targeted sequencing methods, OS-Seq is a "very elegant approach," though it currently suffers from some of the "imperfections" of early-stage technologies. While its specificity is great, the uniformity with which targets are covered is currently "suboptimal," compared to other methods. "Fixing this in OS-Seq will require a significant amount of work, but it is work needed to balance the cost of sequencing vs. capture," he said in an e-mail message.
Ji's group is now using the method for a variety of applications, including validating variants from whole-genome sequencing, structural variant and breakpoint identification, and targeted deep resequencing of cancer genomes. "This is becoming a base technology for all of our cancer genetics questions, and also, increasingly, we are using this for our clinical translational studies," Ji said, adding that it will likely become the default platform for most studies.
In their paper, the authors write that OS-Seq is "particularly useful for translational studies and clinical diagnostics by enabling high-throughput analysis of candidate genes and identification of clinically actionable target regions."
But according to Harismendy, there are still several hurdles to overcome before OS-Seq becomes useful for clinical research. One is that a company needs to further develop and commercialize the method. Another is that clinical researchers are currently "leaning towards faster sequencers," such as the Ion Torrent PGM and the Illumina MiSeq, which will likely be used initially with PCR amplicons, he said. It is unclear right now, he said, whether the new method will be compatible with the MiSeq flow cell, or how it could be adapted for the PGM.
Have topics you'd like to see covered in In Sequence? Contact the editor at jkarow [at] genomeweb [.] com.