University of Washington researchers have designed a sequencing method using complementary duplex DNA tags attached to both strands of a DNA molecule, which they reported can yield less than one error per billion nucleotides sequenced.
The group described its "duplex sequencing" approach online in the Proceedings of the National Academy of Sciences earlier this month. The method, developed to work with Illumina sequencing, involves tagging both DNA strands with random, but complementary, double-stranded primers, which allow researchers to then match together sequenced DNA fragments and compare them. Mutations on only one strand and not the other can be discounted as errors.
The researchers reported that they have calculated a theoretical error rate for the approach of less than one artifact or error per gigabase — a dramatic improvement over the 1 percent error rate they recorded with the Illumina HiSeq 2000. "It should be feasible to identify and correct nearly all forms of sequencing errors by comparing the sequencing of individual tagged amplicons derived from one half of a double-stranded complex with those of the other half of the same molecule," the group wrote.
Testing the approach in bacteriophage and mitochondrial DNA, the researchers found that their duplex sequencing yielded mutation numbers that closely matched those they expected and fell many orders of magnitude closer to known or suspected mutation rates than conventional sequencing methods.
The group, led by UW professor Lawrence Loeb, developed the method as a way to sequence cancer cells more sensitively, Michael Schmitt, the paper's first author and a member of Loeb's lab, told In Sequence. He said the team is hoping to use the approach to test the mutator phenotype hypothesis, which proposes that the mutation rate in the early stages of tumor development is much greater than that of normal somatic cells, enabling the rapid accumulation of further mutations.
The initial goal of the new method was to identify sub-clonal mutations currently masked by the error rate of standard sequencing methods. According to Schmitt, the researchers hope to measure whether mutation rate can indicate a cancer's severity or resistance to chemotherapy.
"Right now it's hard to get below a level of about one percent of cells having a mutation just because the error rate of sequencing is around one percent," he said. "A clinically significant tumor will generally consist of at least one billion cells, and one percent of a billion is ten million, so you could have subpopulations of millions of cells you wouldn't even know were there [using standard sequencing methods.]"
As described in the PNAS paper, the duplex sequencing method uses 24-nucleotide-long tags split into two 12-nucleotide additions to each end of the DNA molecule, which the group calls a "duplex tag." The tags are incorporated into standard Illumina sequencing adapters.
After tagging strands, the researchers PCR amplified them, yielding "families" of molecules marked by a common tag. By grouping molecules that share a tag, the team could then compare sequences among them and eliminate any that didn't show at least three duplicates with at least 90 percent sharing the same sequence. This step filtered out random errors introduced during PCR or sequencing, the researchers reported.
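The family-filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the group's software: the function names and the `(tag, sequence)` input format are hypothetical, reads in a family are assumed to be the same length, and the 90 percent rule is interpreted here as a per-position check that masks disagreeing bases with "N", which the article does not specify.

```python
from collections import Counter, defaultdict

MIN_FAMILY_SIZE = 3   # at least three duplicates sharing a tag
MIN_AGREEMENT = 0.9   # at least 90 percent of reads must agree


def group_by_tag(reads):
    """Group (tag, sequence) pairs into families keyed by tag."""
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)
    return families


def family_consensus(seqs):
    """Return a consensus for one tag family, or None if it is too small.

    Assumption: agreement is checked position by position, and positions
    below the threshold are masked with 'N'.
    """
    if len(seqs) < MIN_FAMILY_SIZE:
        return None
    consensus = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(seqs) >= MIN_AGREEMENT else "N")
    return "".join(consensus)


# Toy usage: one family of three reads, one family of a single read.
reads = [("AAAC", "ACGT"), ("AAAC", "ACGT"), ("AAAC", "ACGT"), ("GGGT", "ACGA")]
families = group_by_tag(reads)
results = {tag: family_consensus(seqs) for tag, seqs in families.items()}
```

In this toy example the three-read family yields a consensus and the single-read family is dropped, mirroring the minimum-duplicate cutoff.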
After this, single strands could be matched to their mates by looking for complementary tags. Any base that did not match perfectly between the two strands was then discarded as an error, narrowing the mutation count to only those mutations mirrored on the complementary strand.
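The strand-matching step might look something like the following Python sketch. It rests on several assumptions not stated in the article: each strand's consensus is already expressed in the same reference orientation, the two consensus sequences are of equal length, and the mate of a tag is found by swapping its two halves. That pairing rule, along with all function names, is hypothetical and chosen only for illustration.

```python
def mate_tag(tag):
    """Hypothetical pairing rule: swap the two halves of the tag."""
    half = len(tag) // 2
    return tag[half:] + tag[:half]


def duplex_consensus(seq_a, seq_b):
    """Keep a base only where both strand consensuses agree; else 'N'."""
    return "".join(
        a if a == b and a != "N" else "N" for a, b in zip(seq_a, seq_b)
    )


def pair_strands(strand_consensus):
    """Pair per-tag consensus sequences with their mates.

    strand_consensus maps tag -> consensus sequence. Each pair is
    reported once, under the lexicographically smaller tag. Tags whose
    two halves are identical cannot be told apart from their mates
    and are skipped.
    """
    duplexes = {}
    for tag, seq in strand_consensus.items():
        partner = mate_tag(tag)
        if partner in strand_consensus and tag < partner:
            duplexes[tag] = duplex_consensus(seq, strand_consensus[partner])
    return duplexes


# Toy usage: the first two tags are mates; the third has no partner.
strands = {"AACC": "ACGT", "CCAA": "ACTT", "GGTT": "AAAA"}
duplexes = pair_strands(strands)
```

Here the third position differs between the two strands and is masked, while the unpaired family is discarded entirely.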
The researchers tested the duplex sequencing method on both human mitochondrial DNA and samples of M13mp2 DNA, a bacteriophage used as a mutational target that has a well-established mutation frequency.
"With the M13, we wanted to start with a DNA where we thought we had a pretty good idea of what the random mutation frequency should be to know if we're getting the right number or not," Schmitt said.
"By our method, it agreed perfectly within experimental error," he said. M13 has an established base substitution frequency of 3.0 x 10-6 and duplex sequencing measured a "nearly-identical" rate of 2.5 x 10-6, the researchers reported.
Using standard methods on the HiSeq, however, the error rate was 3.8 x 10^-3, "more than 1,000-fold higher than the true mutation frequency of M13mp2 DNA," which means that more than 99.9 percent of the mutations identified by standard sequencing would be incorrect.
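The arithmetic behind these two figures can be checked directly. The short Python calculation below uses only the frequencies reported in the article; the variable names are illustrative, and it assumes that every apparent mutation above the established M13mp2 frequency is an artifact.

```python
# Frequencies reported in the article (mutations per nucleotide).
true_freq = 3.0e-6      # established M13mp2 base substitution frequency
standard_freq = 3.8e-3  # apparent frequency with standard HiSeq sequencing

# How many times higher the apparent frequency is than the true one.
fold_excess = standard_freq / true_freq

# Share of apparent mutations that would be artifacts, assuming all
# calls above the true frequency are errors.
false_fraction = 1 - true_freq / standard_freq

print(f"{fold_excess:.0f}-fold excess")       # 1267-fold excess
print(f"{false_fraction:.2%} of calls false")  # 99.92% of calls false
```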
Schmitt said the group also tested an intermediate approach, based on the Safe-SeqS method published last year by researchers at Johns Hopkins University (CSN 6/1/2011), against the duplex sequencing protocol.
According to Schmitt, the UW team had previously used a modified version of Safe-SeqS, with random primers attached to only a single strand of DNA, and found that it could reduce errors about 20-fold. However, it also yielded a biased spectrum of errors weighted toward guanine-to-thymine mutations.
"No matter what we measured we couldn't get below" 10-4, Schmitt said. "And seeing the bias, we thought it was likely due to DNA damage."
In its PNAS paper, the UW team reported that the single-stranded approach based on the Hopkins method reduced errors by approximately 99 percent relative to standard sequencing, but still yielded a 10-fold higher mutation rate than the known rate for M13 DNA.
The researchers also measured the spectrum of mutations, and found that duplex sequencing was in "excellent agreement with the literature values," while the single strand approach showed a large excess of G-A, C-T, G-T, and C-A mutations relative to the reference literature.
After inducing DNA damage using hydrogen peroxide, the group found that artifacts, or erroneous mutations, did not increase with the duplex sequencing method, but did with the single-strand approach.
Following these experiments, the team applied duplex sequencing to human mitochondrial DNA. "We were able to get a value of 3.5 x 10^-5, which was much lower than you get with normal sequencing and matches up very well with estimates people have made with indirect methods," Schmitt said.
According to Schmitt, the method adds little in terms of labor or time considerations in preparing samples for sequencing. However, because duplex sequencing requires redundant sequencing to create several duplicates of two DNA strands, throughput is decreased.
"Since you have the sequence duplicates from both strands, that decreases how much information you can get from a single sequencing run," Schmitt said. "You basically have to have six molecules sequenced to get the one duplex sequence back in the end."
"We can still get millions of high quality nucleotides from a single HiSeq lane. But the caveat is really that you need a lot more sequencing capacity," he explained. "If you want to sequence an entire human genome with this to 1000-fold depth, that's just not practical right now. It could mean thousands of runs."
Schmitt said the researchers could optimize the PCR conditions to reduce the number of duplicates being sequenced, thereby increasing throughput slightly. While the group's minimum cutoff for inclusion was three tag-sharing duplicates, the initial experiments ranged closer to 15 duplicates, "so there's room to move there," he said.
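A rough sense of that headroom comes from counting reads per duplex sequence. The Python sketch below assumes the duplicate counts apply to each of the two strands separately, which is consistent with Schmitt's "six molecules" figure at the three-duplicate minimum but is not spelled out for the 15-duplicate case; the function name is hypothetical.

```python
def reads_per_duplex(family_size, strands=2):
    """Raw reads consumed to produce one duplex consensus sequence."""
    return family_size * strands


minimum_reads = reads_per_duplex(3)    # 6 reads at the three-duplicate cutoff
observed_reads = reads_per_duplex(15)  # 30 reads at roughly 15 duplicates

# Under these assumptions, trimming families from 15 toward 3 duplicates
# would cut the reads needed per duplex sequence by up to this factor.
potential_gain = observed_reads / minimum_reads
```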
Additionally, he said that targeting smaller regions of the genome might also be a good way to use the method efficiently as the group moves toward its goal of applying duplex sequencing to investigating sub-clonal cancer mutations.
Isaac Kinde, a researcher on the Johns Hopkins team behind Safe-SeqS, said he and his colleagues have also experimented with approaches that compare two strands of DNA to define true mutations.
"These strategies will definitely reduce error rates to near negligible levels," he said in an e-mail to In Sequence.
According to Schmitt, UW's Center for Commercialization is in conversation with potential partners about making the team's custom duplex adapters and data analysis software commercially available, but he noted that this process is in its early stages. He declined to mention any companies by name.