By Julia Karow
This story was originally published March 4.
Researchers from Life Technologies said earlier this month that by reducing the size of the template-carrying beads used on the SOLiD platform, the instrument could possibly reach an output of 500 gigabases per run in the future.
Life Tech researchers have also developed 50-base/25-base paired-end reads for SOLiD 4 that make use of an engineered ligase, and they plan to increase the read length to 75 bases with the SOLiD 4 hq update. They have also been working on new encoding schemes for the SOLiD platform that they say will further improve data accuracy.
Life Technologies announced the SOLiD 4 system in late January (see In Sequence 2/2/2010). The initial system, available this quarter, will be able to generate more than 1.4 billion reads, or 100 gigabases of mappable data per run. The 4 hq upgrade package, available later this year, will triple the output to 300 gigabases per run.
During a company workshop at the Advances in Genome Biology and Technology conference this month, Kevin McKernan, Life Tech's vice president of genetic analysis R&D, said that the current version of SOLiD 4 uses 1-micrometer beads that enable an output of up to 107 gigabases per run, using 2x50-base mate-pair reads.
This goes along with improved software for bead detection and color calling, as well as a better slide chemistry that supports higher bead densities, company officials told In Sequence.
Reducing the bead size further, McKernan explained, will raise the throughput again. For example, 750-nanometer beads, to be used with SOLiD 4 hq, are expected to increase the yield per run to more than 200 gigabases with 2x50-base reads, and to more than 300 gigabases with 2x75-base reads.
Company researchers are also experimenting with 500-nanometer beads, he said, which could help generate more than 300 gigabases per run with 2x50-base mate-pair reads, and 500 gigabases with 2x75-base reads. With larger flow cells, the throughput could even go up to 850 gigabases, McKernan suggested.
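The quoted figures can be compared against a rough back-of-envelope model. The sketch below is not a company calculation. It assumes, which the article does not state, that the number of beads per slide scales with the inverse square of bead diameter and that output scales linearly with the total bases read per bead. The function name estimate_output_gb is hypothetical, and the reported 107-gigabase figure for 1-micrometer beads with 2x50-base reads serves as the baseline.

```python
# Back-of-envelope throughput model. Both scaling rules are assumptions,
# not statements from Life Technologies.
BASELINE_GB = 107.0          # reported: 1-micrometer beads, 2x50-base reads
BASELINE_BEAD_NM = 1000.0
BASELINE_BASES = 2 * 50      # total bases read per bead at baseline


def estimate_output_gb(bead_nm, bases_per_tag, tags=2):
    """Hypothetical estimate of gigabases per run."""
    # Assumption: beads per slide grow with the inverse square of bead diameter.
    density_gain = (BASELINE_BEAD_NM / bead_nm) ** 2
    # Assumption: output grows linearly with total bases read per bead.
    length_gain = (bases_per_tag * tags) / BASELINE_BASES
    return BASELINE_GB * density_gain * length_gain


if __name__ == "__main__":
    for bead_nm, bases in [(1000, 50), (750, 50), (750, 75), (500, 50), (500, 75)]:
        gb = estimate_output_gb(bead_nm, bases)
        print(f"{bead_nm} nm beads, 2x{bases}-base reads: about {gb:.0f} Gb")
```

Under these assumptions the model gives roughly 190 and 285 gigabases for 750-nanometer beads, near the quoted "more than 200" and "more than 300." For 500-nanometer beads it gives about 428 and 642 gigabases, well above the quoted 300 and 500, so the simple inverse-square assumption evidently does not hold at that size.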
"In the next few years, we are probably going to see all the platforms starting to reach the diffraction limit" of light, he said. "We are getting close to that now."
With SOLiD 4, Life Tech is also introducing paired-end reads, in addition to existing fragment reads and mate-paired reads. Paired-end reads will consist of a 50-base forward read and a 25-base reverse read, a format enabled by an engineered ligase that can proceed along DNA in the direction opposite to that of natural ligase. Later, Life Tech plans to introduce paired-end reads with 75-base forward and 35-base reverse reads.
Improvements in read length, coverage bias, and accuracy for SOLiD 4 have also been enabled by the company's new probes, called "total precision," or ToP, reagents, which McKernan said increase the yield from each sequencing cycle and cover certain regions of the genome better, such as GC-rich regions.
While the old "opti chemistry" probes were synthesized in batches, each of the 1,024 ToP probes is synthesized individually, allowing for better quality control and balancing of their concentrations.
"With the newer chemistry, we believe we can push [the read length] to 120 base pairs," McKernan said, though it is unclear yet whether the company will actually commercialize reads of that length.
Starting with SOLiD 4 hq, Life Tech plans to increase the maximum read length to 75 bases. Part of the reason why the company has not yet introduced these longer reads, McKernan said, is that customers said they did not want them, given the additional run time and cost. However, "now, there are a few more applications that want this [read] length."
To improve the data accuracy further, Life Tech researchers have also been developing new probe chemistries with encoding schemes that offer an alternative to the current two-base encoding in "color space." McKernan likened these schemes to bank checks, where customers write the dollar amount both numerically and in words, enabling the bank to check the two against each other.
In one such scheme, called "2+1" sequencing, McKernan explained, sequence is first generated using two-base encoding in "color space" as usual. Then, an extra primer is added and every fifth base is sequenced in "base space." As a result, "there are three-color measurements for every 5th base and two-color measurements on every base in between," he said. Combined, the data will allow for a reference-free assembly, for example, according to a company official.
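The check-writing analogy can be illustrated with a minimal sketch. The two-base code below follows the commonly described SOLiD color scheme, in which each color (0 to 3) is the XOR of the two-bit codes of adjacent bases. The checkpoint logic is only an illustrative assumption about how directly read bases could cross-check color data: the article does not describe the actual "2+1" chemistry or analysis software, and find_inconsistent_checkpoints is a hypothetical helper.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """Two-base encoding: one color (0-3) per pair of adjacent bases."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Translate colors back to bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


def find_inconsistent_checkpoints(first_base, colors, checkpoints):
    """Return 0-based positions where decoded and directly read bases disagree.

    `checkpoints` maps position -> base read in base space; here it is a
    hypothetical stand-in for the every-fifth-base reads of "2+1" sequencing.
    """
    decoded = decode_colors(first_base, colors)
    return [pos for pos, base in sorted(checkpoints.items()) if decoded[pos] != base]


if __name__ == "__main__":
    true_seq = "ATGGCATCAGTC"
    colors = encode_colors(true_seq)
    # Every fifth base (0-based positions 4 and 9) read directly in base space.
    checkpoints = {pos: true_seq[pos] for pos in range(4, len(true_seq), 5)}

    miscalled = list(colors)
    miscalled[6] = 0  # simulate one wrong color call between the checkpoints

    print(decode_colors("A", miscalled))
    print(find_inconsistent_checkpoints("A", miscalled, checkpoints))
```

In this toy example a single miscalled color corrupts every base decoded after it, and the disagreement at the second checkpoint localizes the error to the stretch between the two directly read bases.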
An alternative coding scheme, called five-base encoding, might further increase the data accuracy.
Company researchers have also started automating the sample prep workflow, in particular the emulsion PCR, and Life Tech in January introduced the $46,000 EZ Bead system, which allows users to perform the emulsion PCR in about 8 hours.
The system consists of three units, which were on display in Life Tech's AGBT suite: a $6,000 EZ Bead emulsifier, a $15,000 EZ Bead amplifier, and a $25,000 EZ Bead enricher. Together, the three units reduce the hands-on time by about 90 percent, to less than one hour, and the overall workflow by 80 percent, to about 8 hours, according to company officials.
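As a quick cross-check of these figures, the short sketch below sums the three unit prices and works backward from the stated percentage reductions to the implied duration of the earlier workflow. The implied "before" times are derived here, not quoted by the company, and time_before is a hypothetical helper. Because hands-on time is given as "less than one hour," the derived value is an upper bound.

```python
unit_prices_usd = {"emulsifier": 6_000, "amplifier": 15_000, "enricher": 25_000}


def time_before(time_after_hours, reduction_percent):
    """Hours before a reduction, given the hours after it and the percent cut."""
    return time_after_hours * 100 / (100 - reduction_percent)


total_price_usd = sum(unit_prices_usd.values())
hands_on_before = time_before(1, 90)   # "about 90 percent, to less than one hour"
workflow_before = time_before(8, 80)   # "80 percent, to about 8 hours"

if __name__ == "__main__":
    print(f"Total system price: ${total_price_usd:,}")
    print(f"Implied earlier hands-on time: up to about {hands_on_before:.0f} hours")
    print(f"Implied earlier workflow: about {workflow_before:.0f} hours")
```

The unit prices add up to the $46,000 quoted for the complete system, and the percentages imply an earlier workflow of roughly 40 hours with up to about 10 hours of hands-on time.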
In addition to improving the SOLiD platform, McKernan and his colleagues are also exploring other sequencing concepts, such as sequencing single molecules with ligase, he said, which could increase data accuracy over polymerase-based single-molecule sequencing. He stressed that this work is at an early stage but said that "it's something to think about."