COLD SPRING HARBOR, NY — Researchers at Pacific BioSciences have developed a new sequencing mode called "strobe sequencing" that enables them to increase the effective read length of their single-molecule real-time technology beyond 3 kilobases.
In a poster at last week's Biology of Genomes meeting at Cold Spring Harbor Laboratory, PacBio Chief Technology Officer Steve Turner showed that by turning the lasers off and on during a run, he and his colleagues were able to double the span of a read. Instead of a single, uninterrupted read of approximately 3.2 kilobases, the result is a gapped read made up of around 20 short "strobed" sequence reads interspersed with unsequenced stretches.
Uninterrupted reads are limited to approximately 3,000 bases due to laser-induced photochemistry that damages the polymerases, he explained. However, when the lights are turned off, the polymerases keep incorporating nucleotides without incurring damage, and the available read length is "essentially put on hold, so you can take whatever read length you have and distribute it" over a longer stretch of DNA, he said.
Researchers could, for example, create the equivalent of a mate-pair read, with a sequence of more than a kilobase at each end and a long stretch of unsequenced bases in between. Alternatively, they could cover a long piece of DNA with many short stitches of sequence. The length of these stitches and the spacing between them can be varied without the need for different DNA libraries, Turner noted.
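To make the trade-off concrete, here is a minimal, idealized sketch of how a fixed budget of illuminated bases could be laid out along a template in the two modes Turner describes. It assumes synthesis runs at a constant speed, so every dark gap spans exactly the same number of bases, which the next paragraph notes is not quite true. The names `StrobeProtocol`, `strobe_layout`, and `footprint`, the 160-base strobe and gap lengths, and the `max_footprint` cap (a stand-in for polymerase lifetime) are all hypothetical illustrations, not PacBio software or settings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # half-open [start, end) in template coordinates


@dataclass
class StrobeProtocol:
    on_bases: int     # bases observed per lit period (one "strobe")
    off_bases: int    # bases synthesized, unobserved, per dark period
    read_budget: int  # total lit bases before photodamage ends the read


def strobe_layout(p: StrobeProtocol,
                  max_footprint: Optional[int] = None) -> List[Segment]:
    """Idealized layout of sequenced strobes along the template.

    Assumes a constant synthesis rate, so every dark gap spans exactly
    off_bases. max_footprint is a hypothetical cap for polymerase lifetime.
    """
    if p.on_bases <= 0 or p.off_bases < 0 or p.read_budget < 0:
        raise ValueError("on_bases must be positive; other values non-negative")
    segments: List[Segment] = []
    pos, remaining = 0, p.read_budget
    while remaining > 0:
        end = pos + min(p.on_bases, remaining)
        if max_footprint is not None and end > max_footprint:
            break  # polymerase would stop before this strobe finishes
        segments.append((pos, end))
        remaining -= end - pos
        pos = end + p.off_bases  # skip over the dark stretch
    return segments


def footprint(segments: List[Segment]) -> int:
    """Template length spanned from the first sequenced base to the last."""
    return segments[-1][1] - segments[0][0] if segments else 0


# Many short stitches: 20 strobes of 160 bases separated by 160-base gaps.
stitched = strobe_layout(StrobeProtocol(on_bases=160, off_bases=160, read_budget=3200))
# Mate-pair-like: 1.5 kb at each end of a 10 kb dark insert.
mate_pair = strobe_layout(StrobeProtocol(on_bases=1500, off_bases=10000, read_budget=3000))
print(len(stitched), footprint(stitched))  # 20 6240
print(mate_pair)                           # [(0, 1500), (11500, 13000)]
```

In the stitched case the polymerase is observed for the same 3,200 bases as a continuous read but spans about 6.2 kilobases of template, consistent with the roughly twofold increase reported in the poster. Only the on/off lengths change between the two modes, which mirrors Turner's point that no new library is needed.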
The size of the "dark" inserts is somewhat uncertain because of "subtle fluctuations" in the speed of DNA synthesis, he said, but the relative uncertainty shrinks as the inserts get longer. For example, the coefficient of variation of the insert size is 20 percent for 400-base inserts but drops to 10 percent for 1,600-base inserts.
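The two reported figures fit a simple pattern: quadrupling the insert length halves the coefficient of variation, which is what one would expect if the insert size behaves like a sum of many independent per-base fluctuations (CV proportional to one over the square root of length). The sketch below assumes that scaling; it is our reading of the two numbers, not a model PacBio has published, and `predicted_cv` and `insert_sd` are hypothetical names. It also shows that while the relative uncertainty falls, the absolute uncertainty in bases still grows.

```python
import math


def predicted_cv(insert_bases: float,
                 ref_bases: float = 400.0,
                 ref_cv: float = 0.20) -> float:
    """CV of dark-insert size, assuming CV falls as 1/sqrt(insert length).

    Calibrated to the reported 20 percent CV at 400 bases. The 1/sqrt
    scaling is an assumption, not a published model.
    """
    return ref_cv * math.sqrt(ref_bases / insert_bases)


def insert_sd(insert_bases: float) -> float:
    """Standard deviation of the insert size in bases, same assumption."""
    return predicted_cv(insert_bases) * insert_bases


for n in (400, 1600, 6400):
    print(n, round(predicted_cv(n), 3), round(insert_sd(n)))
# 400 0.2 80
# 1600 0.1 160
# 6400 0.05 320
```

Calibrated only on the 400-base point, the assumed scaling reproduces the reported 10 percent at 1,600 bases; the 6,400-base row is an extrapolation, not a measurement.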
The new method can significantly extend the "footprint" of DNA covered in a single run, he said. "The only real limitation of how far you can go is the lifetime of the polymerase itself."
So far, PacBio researchers have shown they can sequence "out to thousands of bases" using strobe sequencing. Previously, they showed that inside zero-mode waveguides, where the sequencing reactions take place, the polymerase can synthesize up to 25 kilobases of DNA (see In Sequence 2/5/2009). In solution, Turner said, the enzyme can synthesize as much as 100 kilobases.
The long, gapped reads that strobe sequencing provides will be especially useful for resolving complex repetitive regions of the genome that are difficult or impossible to resolve with conventional mate pairs, he said.
Having such reads "is clearly of great value because you have linking information," in particular for de novo assemblies, said Gabor Marth, an assistant professor of biology at Boston College whose work focuses in part on the computational challenges of next-generation sequence data.
"Definitely in cases where you are using long reads as an addition to short reads, there is clear benefit from this," he said, adding that they would also be useful in projects that start with a long-read assembly.
For example, "strobe sequencing" reads could help sort out the different copies within long segmental duplications, which can be longer than 40 kilobases and sometimes contain a "hellish, nested structure of often near-identical copies of segments," he said.
They would not necessarily need to cover the entire segmental duplication, he noted, as long as part of the read can be aligned to unique sequence.
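Marth's condition can be phrased as a simple interval test: a gapped read is useful for placing copies if at least one of its strobes carries enough bases of unique sequence to align unambiguously. The toy check below assumes the read's placement on the reference is already known (as in a simulation) and that repeat intervals do not overlap; the 100-base `min_unique` threshold and the function names are hypothetical, and real aligners and assemblers make this decision in far more sophisticated ways.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) reference coordinates


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def unique_bases(segment: Interval, repeats: List[Interval]) -> int:
    """Bases of one strobe lying outside every repeat interval.

    Assumes the repeat intervals do not overlap one another.
    """
    covered = sum(overlap(segment, r) for r in repeats)
    return (segment[1] - segment[0]) - covered


def can_anchor(segments: List[Interval], repeats: List[Interval],
               min_unique: int = 100) -> bool:
    """True if at least one strobe has min_unique bases of unique sequence."""
    return any(unique_bases(s, repeats) >= min_unique for s in segments)


# A hypothetical 40 kb segmental duplication.
dup = [(1_000, 41_000)]
# Read whose first strobe straddles the duplication's right edge.
edge_read = [(40_000, 41_500), (51_500, 53_000)]
# Read lying entirely inside the duplication.
inner_read = [(2_000, 3_500), (13_500, 15_000)]
print(can_anchor(edge_read, dup), can_anchor(inner_read, dup))  # True False
```

The first read never covers the whole duplication, yet its 500 unique bases (plus its second strobe, entirely in unique sequence) tie its remaining strobes to a specific copy.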
But besides its benefits, "strobe sequencing" also highlights the fact that PacBio's uninterrupted sequence reads are limited in length.
According to Andy Watson, vice president for Advanced Genomics Systems at Life Technologies, it "demonstrates that the high laser power is damaging the sample." PacBio needs powerful lasers to excite and detect individual fluorescent dye molecules, he explained. The single-molecule sequencer Invitrogen is developing, on the other hand, will rely on single quantum dots, which he said require "significantly less laser power" for excitation.
Watson also pointed out that "strobe sequencing" will likely decrease the throughput of PacBio's sequencer, since the instrument can collect sequence data only while the lasers are on. The reads will probably also suffer from a high error rate because they "would likely not have gone through any repeat reading process," he said.
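Watson's throughput argument reduces to a duty-cycle calculation: if the polymerase synthesizes at the same speed with the lasers on or off, only the lit fraction of synthesized bases is observed, so using up a given photodamage budget takes proportionally longer. The sketch below assumes exactly that; `RATE` is a hypothetical placeholder rather than a PacBio specification, and the ratio it prints does not depend on it.

```python
def duty_cycle(on_bases: int, off_bases: int) -> float:
    """Fraction of synthesized bases that are actually observed."""
    return on_bases / (on_bases + off_bases)


def hours_to_use_budget(read_budget: int, on_bases: int, off_bases: int,
                        bases_per_second: float) -> float:
    """Wall-clock hours for one polymerase to use its lit-read budget.

    Assumes synthesis runs at the same hypothetical constant rate whether
    the lasers are on or off.
    """
    synthesized = read_budget / duty_cycle(on_bases, off_bases)
    return synthesized / bases_per_second / 3600


RATE = 1.0  # hypothetical bases per second, not a PacBio specification
continuous = hours_to_use_budget(3200, 3200, 0, RATE)
strobed = hours_to_use_budget(3200, 160, 160, RATE)
print(round(strobed / continuous, 2))  # 2.0
```

With equal on and off periods, each reaction site yields sequence at half its continuous rate, so a run of fixed length would collect roughly half as many bases, all else being equal.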
Strobe sequencing will be available with the first commercial release of PacBio's SMRT sequencer, scheduled for the second half of next year.
The company is currently working with outside collaborators under an early-access program, PacBio Vice President of Marketing Martha Trela told In Sequence this week, but is not disclosing details about the program or its participants.