By Julia Karow
Pacific BioSciences this week revealed a number of performance specifications for its first commercial single-molecule real-time DNA sequencer, due to be released during the second half of 2010, as well as a roadmap for expected improvements to the system through 2014.
While several competing sequencing platforms are focusing on large numbers of reads and low cost per base, PacBio is initially emphasizing the comparatively long read lengths, short run time, low cost per experiment, and different applications that its single-molecule analysis system will provide.
Following the launch next year, the company projects that over the subsequent two years, through reagent and software changes, the system's read length, throughput, and speed will increase, while the cost per base will fall. In addition, applications other than DNA sequencing will become available during that time. In 2013, PacBio plans to start testing an upgraded version of its system with an improved sensor capable of monitoring more than ten times the number of reads and faster reactions. That system will also support additional applications.
"The inherent economies and capabilities of third-gen [sequencing] will expand the current market," PacBio CEO Hugh Martin told In Sequence. "It's not necessarily going to be at the expense of, say, Illumina, but it's going to be in addition" to existing second-generation sequencing systems.
PacBio plans to formally announce specs for its commercial system at the Advances in Genome Biology and Technology conference in February, but provided some numbers for In Sequence this week. The first version will run arrays, or "sets," with 80,000 zero-mode waveguides each, tiny reaction chambers in which single DNA molecules are synthesized.
This represents an almost 30-fold increase over the firm's current prototype, which has 3,000 zero-mode waveguides per chip, according to Martin. At the moment, about a third of the ZMWs can be loaded with one active DNA polymerase enzyme each, which can be used to obtain DNA sequence data.
A so-called SMRT cell, filled with a batch of sequencing reagents, allows users to analyze either one or two zero-mode waveguide sets, both loaded with the same DNA library. A SMRT cell with reagents will have a list price of $99.
Running two sets instead of one per SMRT cell increases the run time, since they are analyzed sequentially, but requires no additional reagents, so the cost per base is cut in half. Future versions of the system will be able to run even more sets per cell. "Our pricing strategy, at this time, is that we are going to probably fix that $100 per-run cost," Martin said. "Over time, that price is going to remain the same, but the amount of sequence that you will get for that $100 will go up tremendously."
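As a rough illustration of the halving described above, the following Python sketch divides a fixed per-cell reagent price by the total yield of the sets that share it. The $99 price is the quoted list price; the per-set yield (`ASSUMED_MEGABASES_PER_SET`) is a purely hypothetical placeholder, not a PacBio figure, and the function name is illustrative only.

```python
# Sketch: reagent cost per megabase when one SMRT cell's reagents serve several sets.
SMRT_CELL_PRICE_USD = 99.0  # list price quoted for one SMRT cell with reagents

# Hypothetical placeholder yield per set, NOT a PacBio specification.
ASSUMED_MEGABASES_PER_SET = 10.0


def cost_per_megabase(sets_per_cell: int, megabases_per_set: float) -> float:
    """Return reagent cost in USD per megabase for one SMRT cell."""
    if sets_per_cell < 1:
        raise ValueError("sets_per_cell must be at least 1")
    if megabases_per_set <= 0:
        raise ValueError("megabases_per_set must be positive")
    # The reagent price is fixed per cell, so extra sets only add yield.
    return SMRT_CELL_PRICE_USD / (sets_per_cell * megabases_per_set)


one_set = cost_per_megabase(1, ASSUMED_MEGABASES_PER_SET)
two_sets = cost_per_megabase(2, ASSUMED_MEGABASES_PER_SET)
print(f"1 set:  ${one_set:.2f} per megabase")
print(f"2 sets: ${two_sets:.2f} per megabase")
```

Whatever yield is assumed, the two-set figure comes out at half the one-set figure, because only the denominator changes.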
The minimum run time is between 10 and 15 minutes, which users can adjust, depending on whether they want to maximize their throughput or read length.
An entire experiment, "from the time that you start sample prep to when you have your data," can be completed in less than 12 hours, Martin said, adding that the company plans to cut this time to four to five hours with optimizations. Sample prep costs "will be comparable or lower than with other technologies," he said, and the company has an ongoing collaboration to automate its sample prep protocols.
Users will also be able to load the instrument with batches of up to 96 SMRT cells, which the instrument can run unattended. Each of these cells can run a different protocol, such as standard sequencing; circular consensus sequencing (see In Sequence 10/14/2008), where a circular substrate is sequenced several times over; or strobe sequencing, where each read is broken up into pieces with "dark" intervals (see In Sequence 5/12/2009).
The order of cells is determined by a built-in scheduler that optimizes the throughput. Sample prep can be "easily" multiplexed for each protocol, according to Martin.
The average read length will initially be "at or beyond" that of the 700 to 1,000 or so bases of Sanger sequencing, "and will continue to increase rapidly over time," according to Martin.
Initially, the system's DNA polymerase will add nucleotides at a speed of between one and three bases per second, which the instrument will record in real time. "This is probably the single biggest discriminator between second-gen and third-gen sequencing — the base-to-base speed," Martin said, noting that existing sequencing platforms require lengthy wash cycles between each base incorporation.
The system's software pipeline has two components. The primary analysis, which is performed in real time on a compute module that comes with the instrument, provides base calls with quality values and a vector that represents temporal and contextual information around the base call. "We have designed significant headroom in that compute infrastructure so that over time, we can dramatically increase the throughput of the machine and continually meet the objective of data in real time," Martin said.
The company will encourage third-party developers to work with the data and develop tools that improve the initial output. To that end, it will organize a developer conference at next year's AGBT meeting, where it will share information about its software and "make sure the developers understand the various opportunities and how they can work with us to build innovation on top of the platform," Martin said.
"I think there are a lot of opportunities for third parties to come in and build tools that will change different characteristics of our products," for example to increase the accuracy of the data, "and we want to encourage that," he added.
The secondary analysis — assembling or mapping the data — is performed off the instrument. PacBio will not sell the required computing equipment, all "standard off-the-shelf blade servers and disk arrays," according to Martin, but will provide recommendations and requirements.
Martin declined to provide an estimate of the throughput of the first commercial instrument yet, saying that it is "dependent on a number of variables which are not finalized."
Assuming that a third of the zero-mode waveguides are loaded with one enzyme that remains active over a 15-minute run, the output per SMRT cell run could theoretically reach up to 72 megabases if two sets run sequentially, provided the polymerase proceeds at 3 bases per second. This would translate to a throughput of up to 140 megabases per hour.
By comparison, Illumina's Genome Analyzer provides up to 33 gigabases of high-quality data in a 9.5-day run, according to specifications on the company's website, translating to approximately 140 megabases per hour as well. The company recently said that several customers have achieved more than 55 gigabases per run and that it is targeting a throughput of 95 gigabases per run by around the end of the year (see In Sequence 11/3/2009).
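The per-hour comparison above can be reproduced with a short Python sketch. The Illumina inputs (33 gigabases, 9.5 days) are the figures cited from the company's specifications. For PacBio, the sketch assumes that the 72-megabase theoretical output corresponds to two sequential 15-minute sets with no overhead between them; that reading is an assumption, since the final run parameters have not been released. The function name is illustrative.

```python
# Sketch: convert a run's total output and duration into megabases per hour.

def megabases_per_hour(total_megabases: float, run_hours: float) -> float:
    """Return average throughput in megabases per hour."""
    if run_hours <= 0:
        raise ValueError("run_hours must be positive")
    return total_megabases / run_hours


# Illumina Genome Analyzer: 33 gigabases over a 9.5-day run.
illumina = megabases_per_hour(33_000, 9.5 * 24)

# PacBio theoretical maximum: 72 megabases per SMRT cell run.
# ASSUMPTION: two sequential 15-minute sets, no loading or switching overhead.
pacbio = megabases_per_hour(72, 2 * 15 / 60)

print(f"Illumina: {illumina:.0f} megabases per hour")
print(f"PacBio:   {pacbio:.0f} megabases per hour")
```

Both values land in the mid-140s, which is consistent with the rounded figure of roughly 140 megabases per hour given for each platform.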
Future Improvements
In 2011 and 2012, PacBio plans to significantly improve the system's speed, read length, and output. Martin said this will require no upgrades of the instrument's hardware, which has been designed to support these performance increases. For example, the sensor system and the data buses are capable of monitoring all 80,000 ZMWs simultaneously and of supporting a higher enzyme speed.
The company has also designed the hardware and software "to make sure that we have very long stability, so that we can have runs that could go for hours with the accuracy and the other alignment characteristics that we need" for long read lengths, Martin said.
Instead of hardware upgrades, the company will provide on a regular basis — at least every six months — kit upgrades that improve speed, read length, and yield, as well as software upgrades.
Eventually, users will be able to run up to four sets per SMRT cell instead of two, doubling the amount of data from the same reagent kits.
The system can also support polymerases operating at up to 10 to 15 bases per second, translating to a several-fold increase in throughput. In addition, the company expects to be able to triple the number of active zero-mode waveguides to 90 percent.
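To give a sense of how these planned improvements might compound, the Python sketch below multiplies the individual gains described above: four sets per cell instead of two, polymerase speeds of 10 to 15 bases per second against the initial maximum of three, and a tripling of active zero-mode waveguides. It assumes the factors are independent and that the run time per set stays fixed, neither of which PacBio has stated, so the result is an illustrative upper bound on output per SMRT cell rather than a company projection.

```python
# Sketch: compound the planned per-cell gains, assuming they are independent.
from math import prod


def combined_gain(factors: dict) -> float:
    """Multiply individual improvement factors into one overall factor."""
    return prod(factors.values())


# ASSUMPTION: baseline is the initial system at its 3 bases-per-second maximum.
low_case = {
    "sets_per_cell": 4 / 2,       # four sets instead of two
    "polymerase_speed": 10 / 3,   # 10 bases per second vs. 3
    "active_zmws": 3.0,           # tripling of active waveguides
}
high_case = dict(low_case, polymerase_speed=15 / 3)  # 15 bases per second vs. 3

print(f"Combined per-cell gain: {combined_gain(low_case):.0f}x "
      f"to {combined_gain(high_case):.0f}x")
```

Because four sets still run sequentially, this factor applies to output per cell; the gain in output per hour would be smaller.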
With regard to read length, "we don't know yet where the upper end will be," Martin said, but it will likely be "tens of thousands of bases."
Further, the company plans to introduce additional applications, including DNA methylation detection, direct RNA sequencing, and, "most likely," analysis of RNA translation into protein. "All of those will be running on the exact same hardware you will be buying in 2010," according to Martin.
In 2013, the company plans to start beta-testing version 2 of its system, designed to monitor a chip with at least a million zero-mode waveguides, more than tenfold the initial number, and possibly to run several of these arrays sequentially. This system will also be able to support enzymes with a speed of up to 50 bases per second, and deliver read lengths of up to tens of thousands of bases. The commercial release of version 2, which will likely also support additional biological applications, such as monitoring protein-protein interactions, is slated for 2014, according to Martin.
The Promise of 'Third-Gen'
Even though it appears that PacBio's initial data throughput and cost per base will likely be no better than those of existing second-generation sequencing systems, the company expects its system and data will provide researchers with new capabilities that none of the existing platforms can offer.
For example, Illumina's Genome Analyzer and Life Tech's SOLiD system, Martin said, are "trying to out-throughput each other," and are driving the cost per base down as a consequence, while improving read length only marginally. "They are not adding any other value other than reducing the cost per base — no more information or higher quality," he claimed.
PacBio, he said, will change that. "Just like 454 changed the game completely from capillary electrophoresis, I think that PacBio is going to change how people think about sequencing," he said. In particular, the company will provide long reads, along with short sample prep and run times, and low costs per experiment.
Cancer research is one area that he said will benefit from long reads, which allow researchers to map complex rearrangements. Short run times will be "extremely important" for infectious disease applications that require a quick turnaround, he said. Also, because PacBio's per-experiment costs are lower than those of other next-gen platforms, he said he believes it will be useful for clinical diagnostics applications.
The platform will also allow for additional applications beyond DNA sequencing, such as direct methylation sequencing, direct RNA sequencing, translation analysis, and analysis of protein-protein interactions, although the initial focus will be on DNA sequencing. These applications are in part enabled by the nature of PacBio's real-time data, which provides kinetic information.
Martin declined to say how many commercial instruments PacBio currently has running in-house, saying that the company is not releasing details of its development process.
He said the firm currently has 12 prototypes, which it has used to develop its chemistry and the final hardware design, as well as for collaborations with a handful of research groups.
He added that the company has raised enough funding — most recently a $68 million round that closed this summer — to complete the commercialization of the instrument and is "absolutely on track" for shipping instruments to customers during the second half of 2010.