Pacific Biosciences has launched new sequencing kits that increase average read lengths to more than 4,300 bases and increase throughput to between 200 and 250 megabases per SMRT cell.
The gains in read length and throughput come from new chemistry, including a modified enzyme, and from an increase in movie time from 90 minutes to 120 minutes.
The new chemistry, dubbed XL, is available to customers now, and the company is currently in the process of rolling out upgrades for its installed base.
PacBio is offering two options for the XL kits. In the first, customers purchase the XL binding kit and continue to use the C2 sequencing kit. The XL binding kit uses a modified phi29 enzyme that carries more mutations than the phi29 enzyme used in the C1 and C2 kits, Jonas Korlach, PacBio's chief scientific officer, explained at the American Society of Human Genetics conference in San Francisco last week.
The additional mutations affect the polymerase in two ways, Korlach said. First, they make the enzyme faster: it sequences around 3 bases per second, compared with around 2.5 bases per second with the C2 binding kit. Second, they make the polymerase more resistant to photo damage.
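Because the polymerase reads at a roughly steady pace, the faster enzyme and the longer movie compound. The sketch below multiplies the quoted average speeds by the movie lengths to estimate how many bases one enzyme would read if it ran uninterrupted for a whole movie. This is a back-of-envelope figure, not a PacBio specification: it assumes a constant average rate, ignores pauses, start-up delays, and photo damage, and individual enzymes vary in speed, so it is not a hard limit on any single read. The helper name is illustrative.

```python
def uninterrupted_read_length(bases_per_second: float, movie_minutes: float) -> float:
    """Bases read by one polymerase moving at a constant average speed
    for an entire movie (illustrative; ignores pauses and photo damage)."""
    return bases_per_second * movie_minutes * 60  # convert minutes to seconds


# Speeds and movie lengths as quoted in the article.
c2 = uninterrupted_read_length(2.5, 90)   # C2 binding kit, 90-minute movie
xl = uninterrupted_read_length(3.0, 120)  # XL binding kit, 120-minute movie
print(f"C2: {c2:,.0f} bases  XL: {xl:,.0f} bases  ratio: {xl / c2:.2f}")
# prints "C2: 13,500 bases  XL: 21,600 bases  ratio: 1.60"
```

The reported XL average of more than 4,300 bases sits well below this figure, consistent with Korlach's point that photo damage is one of the major constraints on read length.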
The XL binding kit's accuracy is similar to that of the C2 chemistry: 86 percent single-pass accuracy and 99.999 percent consensus accuracy, compared with 87 percent and 99.999 percent, respectively, for C2.
The second option is to purchase both the XL binding kit and the XL sequencing kit. This combination yields reads averaging more than 5 kilobase pairs, but at the cost of lower accuracy: single-pass accuracy is 83 percent and consensus accuracy is 99.98 percent, Korlach said, although he noted that the company is working on software optimizations to improve this.
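Consensus accuracy far exceeds single-pass accuracy because several observations of the same base are combined. A minimal way to see the effect is a majority vote under a deliberately simplified model: each pass is either right or wrong, passes err independently, and all wrong passes agree with one another (the worst case for a vote). This is not PacBio's consensus algorithm, which must also handle insertions and deletions, and it is not meant to reproduce the quoted figures; it only illustrates why a few points of single-pass accuracy matter once many passes are combined. The function name is hypothetical.

```python
from math import comb


def majority_vote_error(single_pass_accuracy: float, passes: int) -> float:
    """Probability that a majority vote over `passes` reads of one base is wrong.

    Simplified model: passes err independently with probability
    1 - single_pass_accuracy, and all wrong passes agree on the same wrong base.
    """
    if passes < 1 or passes % 2 == 0:
        raise ValueError("use an odd number of passes so the vote cannot tie")
    p_err = 1.0 - single_pass_accuracy
    majority = passes // 2 + 1
    # Sum the binomial probabilities of `majority` or more erroneous passes.
    return sum(
        comb(passes, k) * p_err**k * (1.0 - p_err) ** (passes - k)
        for k in range(majority, passes + 1)
    )


for accuracy in (0.86, 0.83):  # XL binding/C2 vs. XL binding/XL sequencing
    row = ", ".join(
        f"{n} passes: {majority_vote_error(accuracy, n):.1e}" for n in (1, 5, 11, 21)
    )
    print(f"{accuracy:.0%} single-pass -> {row}")
```

Under this model the error falls with every added pair of passes, and the 83 percent reads stay behind the 86 percent reads at any fixed number of passes.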
As a result, PacBio recommends this option only for certain applications, such as scaffolding, Korlach said.
The new kits also increase throughput. Running two consecutive 55-minute movies generates between 200 and 250 megabases of data per SMRT cell, Korlach said. Alternatively, customers can run a single 120-minute movie, which generates around 160 megabases. With the C2 chemistry, throughput was around 100 to 120 megabases from two movies, or 80 megabases from a single 90-minute movie.
Korlach said that in the first half of next year, the company plans to offer hardware upgrades that will increase throughput to 500 megabases per SMRT cell. The doubling comes from being able to interrogate all 150,000 zero-mode waveguides on a SMRT cell rather than 75,000, he explained.
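Throughput per SMRT cell is, roughly, the number of zero-mode waveguides (ZMWs) read out multiplied by the average yield per ZMW. The sketch below assumes that average yield stays fixed when the upgrade doubles the interrogated ZMWs from 75,000 to 150,000; the article does not say whether per-ZMW yield will change. The per-ZMW figure averages over every interrogated ZMW, including any that produce no usable read. Both helper names are illustrative.

```python
def per_zmw_yield_bases(throughput_mb: float, zmws: int) -> float:
    """Average bases per interrogated ZMW for a given per-cell throughput."""
    return throughput_mb * 1_000_000 / zmws


def projected_throughput_mb(current_mb: float, current_zmws: int, new_zmws: int) -> float:
    """Scale throughput linearly with ZMW count, holding per-ZMW yield constant."""
    return current_mb * new_zmws / current_zmws


for current in (200, 250):  # XL range from two 55-minute movies
    print(
        f"{current} Mb over 75,000 ZMWs = {per_zmw_yield_bases(current, 75_000):,.0f} bases/ZMW"
        f" -> {projected_throughput_mb(current, 75_000, 150_000):.0f} Mb over 150,000 ZMWs"
    )
```

At the top of the current range this linear scaling reproduces the 500-megabase figure Korlach quoted.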
So far, Mike Schatz's group at Cold Spring Harbor Laboratory has tested the XL binding/C2 sequencing combination on the rice genome. This summer, using an error-correction and hybrid assembly method his lab developed earlier this year (IS 1/24/2012), his team generated reads with a mean length of 3,290 bases and a maximum length of more than 24 kilobase pairs. Raw coverage of the genome was around 10-fold, and coverage after error correction was 6.2-fold.
This error-correction method yields a slightly shorter average read length than the company's specification of 4,300 bases, which refers to the average mapped read length relative to a reference genome, Korlach said.
Korlach added that another way to measure read length is the "base-weighted average." By that measure, 50 percent of the rice genome data was contained in reads longer than 4,800 bases.
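A mean and a base-weighted statistic answer different questions: the mean counts every read equally, while a base-weighted measure asks how long the reads are that hold most of the sequence. The figure Korlach cites, the length above which reads contain half of all bases, matches what is usually called the read N50, the same statistic used for assembly contiguity. The sketch assumes that interpretation, since the article does not define "base-weighted average" precisely, and also shows a length-weighted mean (sum of squared lengths over total length) as a second base-weighted summary. The read lengths and helper names are hypothetical.

```python
from typing import Sequence


def mean_length(lengths: Sequence[int]) -> float:
    """Ordinary mean: every read counts equally."""
    if not lengths:
        raise ValueError("no reads")
    return sum(lengths) / len(lengths)


def length_weighted_mean(lengths: Sequence[int]) -> float:
    """Expected length of the read containing a randomly chosen base."""
    if not lengths:
        raise ValueError("no reads")
    return sum(n * n for n in lengths) / sum(lengths)


def n50(lengths: Sequence[int]) -> int:
    """Largest length L such that reads of length >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("no reads")
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):  # accumulate from the longest read down
        running += n
        if 2 * running >= total:
            return n
    raise AssertionError("unreachable for non-empty input")


reads = [500, 1_000, 2_000, 3_000, 4_000, 6_000]  # hypothetical read lengths
print(mean_length(reads))                  # 2750.0
print(n50(reads))                          # 4000
print(round(length_weighted_mean(reads)))  # 4015
```

In this toy set, half the bases sit in reads of 4,000 bases or longer even though the mean is only 2,750, which is why a long-read dataset can look better by a base-weighted measure than by its mean.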
Schatz's team also experimented with several assembly approaches. Using only PacBio long reads error-corrected with Illumina MiSeq data, the team assembled the genome with an N50 of 13 kilobase pairs. That was about the same as the N50 of an assembly generated using a jumping library, Illumina sequencing, and the AllPaths assembly algorithm, he said.
When the team combined error-corrected PacBio data with Illumina fragment reads and Illumina 2-kilobase and 5-kilobase mate-pair libraries, the N50 nearly doubled, to 25 kilobase pairs.
"Adding the extra Illumina data was helpful primarily because the coverage of the long reads was lower than is needed to assemble on their own," Schatz explained via email.
"We are working towards doubling the coverage with PacBio long reads, and then I expect that assembling just the error-corrected PacBio long reads will lead to the best assembly," he added.
Schatz said his team is now working on optimizing sample preparation and assembly methods for the rice genome. Because the genome has also been sequenced using a BAC-by-BAC approach and Sanger sequencing, the team can evaluate the accuracy of both the reads and the assembly, he said. Eventually, the goal is to use PacBio to "improve on our draft assembly of the very large and complex wheat genome," he said.
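Schatz's plan to double long-read coverage comes down to sequencing more SMRT cells, since coverage is total bases sequenced divided by genome size. The sketch below estimates how many cells a given raw coverage would take at the XL per-cell yields quoted earlier. The rice genome size is a parameter, and the 400-megabase value in the example is a round placeholder for illustration, not a figure from the article. The estimate also ignores sequence lost during error correction, which in the rice run reduced coverage from about 10-fold raw to 6.2-fold. The helper name is hypothetical.

```python
import math


def cells_needed(target_coverage: float, genome_size_mb: float, mb_per_cell: float) -> int:
    """SMRT cells needed so that total bases / genome size reaches the target (rounded up)."""
    return math.ceil(target_coverage * genome_size_mb / mb_per_cell)


GENOME_MB = 400  # hypothetical round placeholder for the rice genome size
for per_cell in (200, 250):  # XL yield range per SMRT cell from the article
    print(f"{per_cell} Mb/cell: 10x needs {cells_needed(10, GENOME_MB, per_cell)} cells, "
          f"20x needs {cells_needed(20, GENOME_MB, per_cell)} cells")

# Fraction of raw coverage that survived error correction in the rice run (6.2x of 10x).
print(f"retained after correction: {6.2 / 10:.0%}")
```

Because coverage scales linearly with sequenced bases, doubling it doubles the number of cells at a fixed per-cell yield; if roughly the same fraction is lost in correction, the corrected coverage doubles as well.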
Looking ahead, Korlach said PacBio researchers are working on a strategy that would enable average read lengths of more than 9 kilobase pairs and throughput of 1 gigabase per SMRT cell.
The strategy relies on being able to eliminate photo damage, which is one of the major constraints on read length, said Korlach.
Currently, read length is limited by occasional interactions between the polymerase and the fluorescent dye. "Every once in a while, the fluorescent dye, which emits light to give a signal, interacts with the polymerase. That energy gets transmitted to the polymerase and zaps it, which stops the sequencing," Korlach said.
Whether the dye zaps the polymerase depends in part on how close the two are. "The strategy is then if you could separate the dye from the polymerase, you could minimize that effect," Korlach said.
Korlach said the company is now testing a version of the enzyme encased in a protective scaffold, which has shown promising results in early studies.
That chemistry is still in the research stage, he said, and the company has no timeline for its commercial availability.