454/Roche, Illumina, and Applied Biosystems expect the performance specs of their next-generation sequencing systems to improve significantly this year, according to talks by vendors and early-access users of the upgrades at the Advances in Genome Biology and Technology meeting in Marco Island, Fla., this month.
Over the next few months, users of all three systems can expect improvements such as longer reads, new paired-end libraries, shortened run times, higher throughput, and better accuracy, resulting from upgrades in the hardware, biochemistry, and software.
Roche’s 454 Life Sciences subsidiary has been working on long paired-end “jump” libraries for its Genome Sequencer FLX that span distances greater than 15 kilobases.
These libraries are the “missing piece” in 454’s portfolio of library preps and will serve to replace fosmids, the large clones used in Sanger sequencing, according to Michael Egholm, 454’s vice president of research and development, who spoke at the AGBT meeting.
Recently, the company was able to generate an E. coli paired-end library with mate pairs spanning approximately 16.5 kilobases, and has sequenced tags of approximately 175 bases on either end.
Using data from these paired reads, in addition to single-read shotgun data and “standard” 3-kilobase paired-end reads, company researchers assembled the E. coli genome de novo into a single scaffold, an “elusive goal” that 454 has pursued “for the last couple of years,” according to Egholm.
He said the company hopes the new paired-end reads will help with the de novo assembly of other genomes. 454 plans to make the technology available to collaborators “within the next month or two,” he added.
Egholm said the company is also developing new reagent kits that allow scientists to double the number of sequencing cycles per run from 100 to 200. As a result, the system can obtain “extra-long reads,” or XLRs, of more than 400 bases.
In internal R&D runs, the company has obtained read lengths of 400 to 500 bases, Egholm said. Early-access users have reported a broad distribution of read lengths, which reached “well above” 500 bases, with an average of 375 bases, he added.
The read quality is good, he reported. Using a quality-scoring algorithm developed at the Broad Institute for trimming the reads, researchers at 454 determined that reads up to 385 bases in length have a quality score of Q20, meaning 99 percent accuracy.
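The link between a Q20 score and 99 percent accuracy follows from the standard Phred definition, Q = -10 log10(p), where p is the probability that a base call is wrong. The short Python sketch below illustrates that conversion; the function names are illustrative and are not taken from 454's or the Broad Institute's software.

```python
import math

def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def phred_to_accuracy(q):
    """Per-base accuracy implied by Phred score q."""
    return 1.0 - phred_to_error_probability(q)

def error_probability_to_phred(p):
    """Phred score corresponding to error probability p."""
    return -10 * math.log10(p)

# Q20 corresponds to one expected error in 100 bases.
for q in (10, 20, 30):
    print(f"Q{q}: error probability {phred_to_error_probability(q):g}, "
          f"accuracy {phred_to_accuracy(q):.1%}")
```

Each additional 10 points on the scale cuts the expected error rate by a factor of ten.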
454 has also decreased the cycle time from 49 seconds to 35 seconds or less, so despite the doubling of cycles, the total run time will rise only to a maximum of 10 hours from the current 8 hours.
454 has also developed a new picotiter plate with 3.4 million wells, or more than twice as many as before. This new plate will initially increase the number of reads per run to 1 million from the current 400,000, but may potentially enable 2 million reads per run, according to Egholm.
As a consequence, the system will produce approximately 500 million bases per run, up from the current 100 million bases, and may eventually reach one gigabase per run, he said.
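As a rough consistency check, the output and read-count figures quoted by Egholm can be divided to give the average read length each scenario implies. The sketch below uses only numbers from the article; the scenario labels and the function name are illustrative, and the result is an implied average rather than a specification.

```python
def implied_mean_read_length(bases_per_run, reads_per_run):
    """Average read length (bases) implied by total output and read count."""
    return bases_per_run / reads_per_run

# (bases per run, reads per run) as quoted in the article.
scenarios = {
    "current plate": (100_000_000, 400_000),
    "new plate, initial": (500_000_000, 1_000_000),
    "new plate, potential": (1_000_000_000, 2_000_000),
}

for label, (bases, reads) in scenarios.items():
    length = implied_mean_read_length(bases, reads)
    print(f"{label}: about {length:.0f} bases per read")
```

The projected figures work out to about 500 bases per read, the upper end of the 400 to 500 bases the company reported for internal runs, so the 500-million-base figure assumes the longer reads as well as the denser plate.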
But according to Egholm, increasing the number of reads will only be possible if the company can reduce signal crosstalk between adjacent wells because the wells are closer together than before. Crosstalk, he said, is “the principal limitation of the 454 sequencing system,” and the cause of homopolymer errors, the main error type of 454’s instrument.
In order to reduce crosstalk, 454 has coated the inside of the wells with a layer of metal, leaving a small aperture at the bottom that permits light to enter. Together with chemistry improvements, this upgrade decreases crosstalk by an order of magnitude, Egholm said.
454 has also found a new way to adjust the amount of apyrase prior to each run. Apyrase is an enzyme that breaks down unused nucleoside triphosphates before a new nucleotide is added.
The new kits require no hardware changes, Egholm pointed out.
David Bentley, Illumina’s chief scientist, said at the conference that internally, the company is now obtaining 3.3 gigabases of data from a single paired-end run on its “classic” Genome Analyzer, up from 2.3 gigabases it specified last October.
Illumina is currently beta-testing hardware, biochemistry, and software improvements for its Genome Analyzer and plans to release them this spring.
Among these changes is a plan to replace the 1-megapixel CCD camera of its old system with a 4-megapixel camera that can image the same area on the slide faster and decrease cycle times.
Instead of taking images from 330 tiles per channel on the flow cell, split over 110 rows and 3 columns, the new camera images the same area in only 100 tiles, split over two columns. This decreases the cycle time from 130 minutes to about 90 minutes, and the run time by about 30 percent, from 3.2 days to 2.3 days for a single-read run.
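The run-time figures are consistent with a simple model in which run time is the number of cycles multiplied by the time per cycle. The Python sketch below assumes a 36-cycle run, the read length the Broad researchers report using, and assumes that cycle time dominates the run; neither assumption is stated by Illumina for these particular figures.

```python
MINUTES_PER_DAY = 24 * 60

def run_time_days(cycles, cycle_minutes):
    """Total run time in days if every cycle takes cycle_minutes."""
    return cycles * cycle_minutes / MINUTES_PER_DAY

def fractional_reduction(old, new):
    """Relative decrease going from old to new."""
    return (old - new) / old

ASSUMED_CYCLES = 36  # assumption: a 36-cycle single-read run

old_days = run_time_days(ASSUMED_CYCLES, 130)  # 1-megapixel camera
new_days = run_time_days(ASSUMED_CYCLES, 90)   # 4-megapixel camera

print(f"old run: {old_days:.2f} days, new run: {new_days:.2f} days")
print(f"reduction: {fractional_reduction(130, 90):.0%}")

# Tile layout before the upgrade: 110 rows by 3 columns per channel.
print(f"old tiles per channel: {110 * 3}")
```

Under these assumptions the model gives roughly 3.25 and 2.25 days, in line with the quoted 3.2 and 2.3 days and a reduction of about 30 percent.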
“Speed is not only important for production and time but it does actually improve the reliability,” Bentley remarked. “If we can shorten the run, we can get better performance, and all these factors are interrelated and give you a better robustness.”
Illumina has also made “significant improvements to the optical path,” according to Jim Meldrim, a researcher at the Broad Institute, which is an early-access user of the upgrades. Meldrim talked about the improvements during a workshop organized by Illumina at the AGBT meeting.
“This allows for crisper images with higher resolution” as well as better illumination across the tiles on a slide, he said, and thus leads to a better read quality and more aligned reads across the area of the flow cell.
In addition, Illumina has made changes to “some of the more error-prone parts of the system,” Meldrim said, including a filter wheel and a mode scrambler.
Along with the new 4-megapixel camera, Illumina has re-designed its flow cells, which now have wider channels that cover an area approximately 40 percent bigger than the channels of the old flow cells.
As a result, the Broad researchers are obtaining approximately 1.6 gigabases of aligned sequence data from a single fragment run, and 2.3 gigabases of aligned data from a paired-end run, using 36 cycles. After optimizing the cluster density, this output could be increased, Meldrim noted.
Bentley said that Illumina is also working internally on increasing the cluster density and read length, and improving the accuracy through better protocols, reagents, and algorithms.
At the end of March, Illumina will start shipping modules for paired-end sequencing, enabling customers to sequence with insert sizes of 200 to 600 bases. Two-kilobase inserts, which Illumina has used in sequencing the genome of a HapMap individual, are in development.
ABI has been working on a number of improvements for its SOLiD platform, which will be coming online this spring. Like 454’s, these will not be accompanied by changes to the sequencer’s hardware but will instead focus on the sequencing biochemistry and workflow improvements.
In order to improve the emulsion PCR for the system, the company will replace the Emuls-O-Matic, which it currently uses to generate the emulsion, with a smaller device. Within 5 minutes, this device can generate about 10 billion PCR reactors, which produce 200 to 300 million clonally amplified beads, enough for an entire slide, according to Gina Costa, an ABI researcher who presented the improvements at the AGBT conference. This represents a “far more simplistic workflow” than before, she said. The beads generated in this process are also more uniformly amplified than before, reducing imaging time.
ABI will also improve the system’s accuracy and speed by making changes to the biochemistry. Currently, the company uses two sets of probes, one that is aimed at the fourth and fifth base in the sequence, and another one that interrogates the first and second base. In the future, scientists will only use the second set because it is more efficient to interrogate at those positions and because this increases the fidelity and shortens the time of the ligation reaction, Costa explained.
Testing the new biochemistry on three different genomes with different GC content, company researchers found that errors declined by 30 percent compared to the old biochemistry that used both probe sets.
The new emulsion maker, combined with an optimized emPCR protocol, also results in beads loaded with more template than before. This has enabled the company to reduce the cycle time and the overall run time. Runs now take only 4.5 days instead of 8.5 days for a fragment library, and 8 days instead of 12 to 15 days for a mate-pair library.
The improved bead loading has also enabled the researchers to increase the number of cycles per run, and thus the read length and output per run.
In a collaborative project, ABI has achieved 50-base reads with high signal quality, according to Costa.
ABI has also developed a new protocol for generating paired-end libraries that allows researchers to sequence tags between 50 and 75 bases in length from each side, instead of 25 to 27 bases. “It gives a lot more power and more linkage to the mate pair libraries that you are generating,” Costa said. The length of the paired-end library fragments generated by this method is currently limited to 250 bases, she said. ABI has already used the new approach to sequence 3-kilobase libraries from a human genome.
Kevin McKernan, ABI’s senior director for scientific operations for high throughput discovery, reported at the meeting that ABI has generated 9 gigabases of data internally in a single paired-end run.
According to Costa, ABI has also optimized procedures for pooled amplicon sequencing, for example for long-range PCR sequencing projects, in order to prevent end bias.
Finally, ABI is testing 16 barcodes for the system that will be included in one of the library adaptors. “They are primed from a second primer, much like our mate pairs, so tag length will not be impacted from the incorporation of a barcode,” McKernan told In Sequence by e-mail.