Roche's 454 Life Sciences continues to develop longer reads for its Genome Sequencer FLX system and to improve its sample-prep process, multiplex identifying tags, and paired-end library protocols, according to a company official.
In addition, the company has developed Amplicon Variant Analyzer software for "ultra-deep" amplicon sequencing, an application that it says is increasingly used for clinical sequencing projects, and has further developed its Newbler assembler so that it can assemble large eukaryotic genomes de novo.
Earlier this month at the Advances in Genome Biology and Technology conference in Marco Island, Fla., 454 Vice President of Research and Development Michael Egholm, during a workshop organized by Roche, presented some recent improvements the company has made to the GS FLX.
Egholm said that internally, 454 has started running the GS FLX for 300 instead of 200 sequencing cycles using the Titanium chemistry, thus increasing the average read length to around 660 base pairs, with some reads reaching more than 800 base pairs in one particular run.
Reads up to almost 540 bases in that run had quality values of at least Q20, and reads up to almost 700 bases had quality values of at least Q15, he said.
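Phred quality values such as Q20 and Q15 are on a logarithmic scale: a base with quality Q has an estimated error probability of 10^(-Q/10). Q20 therefore corresponds to about one error in 100 bases, and Q15 to about one in 32. A minimal Python sketch of the conversion (the function names are illustrative, not part of any 454 software):

```python
import math

def phred_to_error_prob(q):
    """Estimated probability that a base with Phred quality q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Inverse conversion: Q = -10 * log10(p)."""
    return -10 * math.log10(p)

for q in (15, 20):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4f}, about 1 error in {1 / p:.0f} bases")
# Q15: error probability 0.0316, about 1 error in 32 bases
# Q20: error probability 0.0100, about 1 error in 100 bases
```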
Egholm added that the longest error-free read the company has obtained to date on the GS FLX is 841 bases, and that sufficient signal remains beyond this point, suggesting that read length has not yet reached its limit.
To obtain the longer reads, Egholm said, the company had to modify its library-preparation protocols so that sufficiently long DNA fragments could be attached to the beads. 454 plans to share these protocols with its customers "soon."
In addition, 454 has developed two new products that simplify the front-end sample preparation. The first, which the company launched when it introduced the Titanium chemistry last fall, is a semi-automatic emulsion-breaking apparatus that decreases the "ergonomic burden" for users processing large-volume emulsions and the hands-on time required for breaking the emulsion, according to a poster 454 presented at the AGBT conference.
The second product, which Egholm said is scheduled to be available to early-access customers in the second quarter, is a "robotic enrichment module" that fits on liquid-handling robots and is designed to automate the enrichment of beads carrying amplified DNA. Egholm said 454 started testing the device internally several weeks ago and found that it requires 10 minutes to set up and runs for two hours to produce beads that are "ready for sequencing."
For multiplexed sequencing, Egholm said, 454 will soon increase the number of multiplex identifying (MID) tags from 12 to 96. In addition, the existing 12 MID tags can be combined in pairs on short templates, allowing up to 144 different samples, or genome regions, to be sequenced in parallel.
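The figure of 144 follows from pairing tags: if each short template carries one of the 12 MIDs at each end, there are 12 × 12 = 144 distinct ordered tag pairs, and each pair can label one sample. The sketch below shows that bookkeeping under stated assumptions; the tag labels, sample names, and the one-tag-per-end layout are hypothetical, not 454's actual MID sequences or demultiplexing software.

```python
from itertools import product

# Hypothetical labels standing in for the 12 existing MIDs (not real MID sequences).
MIDS = [f"MID{i}" for i in range(1, 13)]

# Assumption: each short template carries one MID at each end, so a sample is
# identified by the ordered (5' tag, 3' tag) pair found on its reads.
pairs = list(product(MIDS, repeat=2))
sample_for_pair = {pair: f"sample_{n}" for n, pair in enumerate(pairs, start=1)}

def assign_sample(five_prime_tag, three_prime_tag):
    """Look up which sample a read belongs to from the tags at its two ends."""
    return sample_for_pair.get((five_prime_tag, three_prime_tag), "unassigned")

print(len(pairs))                      # 144
print(assign_sample("MID1", "MID1"))   # sample_1
print(assign_sample("MID2", "MID3"))   # sample_15
```

With 96 tags used singly, the same lookup would simply map one tag to one sample.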
One application of the 454 platform that can be multiplexed, amplicon sequencing (or "ultra-deep" sequencing of PCR-amplified DNA), has "taken off" and appears particularly promising for clinical applications, according to Egholm. The company has also developed the Amplicon Variant Analyzer software to call variants from these data, he said.
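In ultra-deep amplicon sequencing, each position of an amplicon is covered by hundreds or thousands of reads, so a variant carried by only a small fraction of the molecules can still be detected by counting alleles. The following is a simplified frequency-threshold caller, assuming reads are already aligned and reduced to the base each shows at one position. The function name and both cutoffs are hypothetical, and this is not the Amplicon Variant Analyzer's algorithm.

```python
from collections import Counter

def call_variants(bases_at_position, reference_base, min_fraction=0.01, min_reads=10):
    """Report non-reference alleles whose frequency among the reads meets both cutoffs.

    bases_at_position: the base each aligned read shows at one amplicon position.
    min_fraction and min_reads are illustrative values, not 454 defaults.
    Returns (read depth, {allele: fraction}).
    """
    depth = len(bases_at_position)
    calls = {}
    for base, n in Counter(bases_at_position).items():
        if base == reference_base:
            continue
        fraction = n / depth
        if n >= min_reads and fraction >= min_fraction:
            calls[base] = round(fraction, 4)
    return depth, calls

# Made-up pileup of 2,000 reads: 1,950 reference "C", 40 "T" (2%), 10 "A" (0.5%).
pileup = ["C"] * 1950 + ["T"] * 40 + ["A"] * 10
print(call_variants(pileup, reference_base="C"))  # (2000, {'T': 0.02})
```

The 0.5% "A" allele falls below the 1% cutoff here; lowering `min_fraction` trades sensitivity against calling sequencing errors as variants.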
Describing an example of a potential clinical application, Henry Erlich, vice president of discovery research at Roche Molecular Systems, talked about using the GS FLX to sequence HLA genes in high throughput, avoiding the so-called phase ambiguities that Sanger sequencing produces when it cannot determine which variants at different positions lie on the same copy of a gene.
HLA variants are associated with a number of diseases, he said, and HLA typing is especially important for matching donors and recipients for bone marrow transplants. A Roche spokesperson told In Sequence that the company intends to commercialize HLA typing on its GS FLX system but did not provide details.
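The phase problem can be illustrated with two heterozygous sites. Sequencing both gene copies together reveals only the alleles at each site (say A/G and C/T), which is consistent with two different pairs of haplotypes. A read derived from a single molecule that spans both sites shows which alleles travel together. A minimal sketch, with hypothetical sites, reads, and cutoff; it is not Roche's HLA typing workflow.

```python
from collections import Counter
from itertools import product

def candidate_phasings(site_alleles):
    """List every pair of haplotypes consistent with per-site genotypes alone.

    site_alleles holds the two alleles seen at each heterozygous site, e.g. ("A", "G").
    This is all a mixed (Sanger-style) trace of both gene copies reveals.
    """
    pairs = set()
    for hap in product(*site_alleles):
        # The other gene copy must carry the other allele at every heterozygous site.
        other = tuple(b if hap_base == a else a for (a, b), hap_base in zip(site_alleles, hap))
        pairs.add(tuple(sorted(("".join(hap), "".join(other)))))
    return sorted(pairs)

def phase_from_reads(reads, min_fraction=0.2):
    """Return haplotypes supported by single-molecule reads spanning all sites.

    min_fraction is an illustrative cutoff that discards rare, likely erroneous combinations.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return sorted(h for h, n in counts.items() if n / total >= min_fraction)

sites = [("A", "G"), ("C", "T")]
print(candidate_phasings(sites))  # [('AC', 'GT'), ('AT', 'GC')]
reads = ["AC"] * 48 + ["GT"] * 50 + ["AT"] * 2
print(phase_from_reads(reads))    # ['AC', 'GT']
```

With n heterozygous sites, genotype data alone allow 2^(n-1) phasings, which is why the ambiguity grows quickly in highly polymorphic genes such as HLA.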
Long-Read Sweet Spot
But 454 has also been working on what it regards as one of the sweet spots of its comparatively long-read technology: the de novo assembly of genomes.
For instance, the company is about to release a new paired-end protocol — already available to early-access customers — that allows users to generate 300-base paired-end reads with insert sizes between 3 kilobases and 20 kilobases.
These large-insert paired-end libraries are needed in particular for the de novo assembly of eukaryotic genomes, according to Egholm, although they have also been shown to be useful for sequencing and assembling bacterial genomes.
As an example, Egholm showed four bacterial genomes (S. pneumoniae, E. coli, T. thermophilus, and C. jejuni) that 454 researchers were able to assemble into a single scaffold using a single 8-kilobase library.
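In a paired-end library, both reads of a pair come from the same fragment of roughly known length. A pair whose ends land in two different contigs therefore links those contigs and also gives a rough estimate of the gap between them, which is how such libraries order contigs into scaffolds. The sketch below estimates that gap under stated assumptions (both contigs in the same orientation, known fragment start and end coordinates). The function name, coordinates, and example numbers are hypothetical, and this does not describe how Newbler builds scaffolds.

```python
from statistics import median

def estimate_gap(contig_a_length, links, insert_size):
    """Estimate the gap between contig A and contig B from read pairs linking them.

    Assumes A lies before B in the same orientation. Each link is (start_in_a, end_in_b):
    the 0-based position in A where the fragment begins and the position in B where it
    ends. If the fragment length equals insert_size, then
        insert_size = (contig_a_length - start_in_a) + gap + end_in_b
    A negative result suggests the contigs overlap.
    """
    if not links:
        raise ValueError("need at least one linking read pair")
    estimates = [insert_size - (contig_a_length - start_in_a) - end_in_b
                 for start_in_a, end_in_b in links]
    # The median damps fragments that are longer or shorter than the nominal insert size.
    return median(estimates)

links = [(46_000, 3_000), (45_700, 2_900), (46_500, 2_800)]
print(estimate_gap(50_000, links, insert_size=8_000))  # 1000
```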
At another workshop Roche held during the AGBT meeting, Jim Knight, 454's director of bioinformatics development, described how the company has been improving the Newbler assembler to enable the software to deal with more complex eukaryotic genomes. New features are currently in R&D, he said.
For a start, the company has added multi-threaded parallelization to the assembler, which now uses between 16 and 32 gigabytes of main memory on a multi-CPU shared-memory computer. This will enable the assembler to "handle, certainly, the humble genomes, the small plant genomes, and the 1- to 3-gigabase eukaryotic genomes," according to Knight.
Internally, 454 currently uses a $15,000 Dell computer with four dual-core processors and 32 gigabytes of memory to assemble genomes de novo, he said.
The company has also enabled the software to perform an "incremental assembly" where sequence batches are added to existing assemblies to allow for a more efficient use of memory, according to Knight.
At the request of many users, 454 has also added an "each-read-in-one-contig" output, and has developed a graph viewer with a contig graph and a scaffold graph, he said.
In a collaboration with the Broad Institute, 454 tested the new R&D assembler to put together the 42-megabase genome of the fungus Neurospora crassa. Using GS 20 shotgun reads, GS FLX shotgun and paired-end reads, and Sanger reads, the assembler generated 16 large scaffolds that covered almost the entire genome.
And in a collaboration with the Salk Institute, 454 researchers assembled the 120-megabase Arabidopsis genome, using early R&D GS FLX Titanium shotgun reads and paired-end reads from 3-, 10-, and 15-kilobase libraries. The result was an N50 scaffold size of 4.1 megabases and a total scaffold length of 109 megabases. The R&D assembler completed this task within 4 hours on the 8-processor computer using 8 gigabytes of memory, according to Knight.
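N50 scaffold size is the length L such that scaffolds of length L or longer together contain at least half of the total assembled sequence, that is, half of the total scaffold length. A minimal sketch of the calculation; the function name and the example lengths are made up for illustration and are not from the Arabidopsis assembly.

```python
def n50(lengths):
    """Return the N50: the length L such that pieces of length >= L hold at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")

# Made-up scaffold lengths (any unit): total 24, so half is 12; 8 + 6 = 14 crosses it.
print(n50([8, 6, 4, 3, 2, 1]))  # 6
```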
He said that although the software is "still a little bit in flux, we should be able to get, for these more complex genomes, very good N50 scaffold sizes with a view of the scaffold graph and one of the higher-level structures of the genome."
For de novo assemblies, the company currently recommends a combination of shotgun reads and 8-kilobase paired-end reads for bacterial genomes, he said. For larger genomes, it suggests that customers use shotgun reads plus 3-kilobase, 20-kilobase, and, optionally, 8-kilobase paired-end reads.
Knight recommended that users start assembling with a limited amount of sequence data and add more data to the mix as needed "to be able to stop when the assembly resolves and is no longer improving."
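Knight's advice amounts to a simple stopping rule: add data in batches, reassemble, and stop once an assembly metric stops improving. A minimal sketch of that loop; the function name, the 2% threshold, and the toy scores are hypothetical, `assemble_and_score` stands in for actually running an assembler, and N50 is only one metric that could be tracked.

```python
from typing import Callable, List, Optional, Sequence, Tuple

def assemble_until_plateau(
    batches: Sequence[str],
    assemble_and_score: Callable[[List[str]], float],
    min_relative_gain: float = 0.02,
) -> Tuple[int, Optional[float]]:
    """Add sequence batches one at a time, stopping when the assembly stops improving.

    assemble_and_score is a hypothetical stand-in for running the assembler on the
    data used so far and returning a metric where larger is better (e.g. scaffold N50).
    Returns the number of batches worth using and the score they achieved.
    """
    used: List[str] = []
    best: Optional[float] = None
    for batch in batches:
        used.append(batch)
        score = assemble_and_score(used)
        if best is not None and score < best * (1 + min_relative_gain):
            # The latest batch gained less than min_relative_gain: keep the earlier result.
            return len(used) - 1, best
        best = score
    return len(used), best

# Made-up scores that level off after three batches.
toy_scores = {1: 0.8, 2: 2.1, 3: 3.4, 4: 3.45, 5: 3.46}
batches = ["batch1", "batch2", "batch3", "batch4", "batch5"]
print(assemble_until_plateau(batches, lambda used: toy_scores[len(used)]))  # (3, 3.4)
```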
For bacterial artificial chromosomes, 454 recommends sequencing pools of 15 to 20 BACs with shotgun and 3-kilobase or 8-kilobase paired-end reads, according to Knight.