By Julia Karow
To assert its position as a provider of long-read, high-quality next-gen sequencing data, Roche's 454 Life Sciences plans to commercially offer Sanger-length reads for its Genome Sequencer FLX by the end of next month, based on improved sequencing chemistry and an instrument upgrade.
Applications for the long reads, which the company has talked about for more than two years and had originally planned to launch in 2010 (IS 9/29/2009), include de novo assemblies of large and complex genomes, transcriptome sequencing, sequence capture, and shotgun metagenomics.
Following testing with early-access customers, 454 plans to launch the GS FLX+ System commercially by the end of June, both as an upgrade for existing GS FLX instruments and as a new instrument. The upgrade, which will be performed at customers' sites, includes both hardware and software changes and will incur "minimal" costs, according to a company spokesperson.
The new instrument is designed for use with a new reagent kit, called the GS FLX Titanium Sequencing Kit XL+, though it will still work with existing kits. Like all 454 kits, it will produce a distribution of read lengths, with a peak or "modal" read length of 700 base pairs and the longest reads reaching "well over" 1,000 base pairs. In 400 flow cycles, each run will generate on the order of a million high-quality reads, or about 700 megabases of data, in 23 hours. The quality of the reads is "similar to that of capillary sequencing," according to the company.
For comparison, the current GS FLX, using GS FLX Titanium series reagents, produces an average read length of 400 base pairs and about 400 megabases of data in 200 cycles per run, which takes about 10 hours.
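The per-run figures quoted above can be set side by side with a little arithmetic. The following is a minimal Python sketch, not anything supplied by 454: the `RunSpec` class and the `output_ratio` helper are hypothetical names introduced for illustration, and the inputs are the approximate, rounded numbers the company gave, so the results are rough estimates only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSpec:
    """Approximate per-run figures as quoted in the article (hypothetical container)."""
    name: str
    megabases: float   # approximate data output per run, in megabases
    hours: float       # approximate run time, in hours
    flow_cycles: int   # flow cycles per run

    def mb_per_hour(self) -> float:
        # Output divided by run time
        return self.megabases / self.hours

    def mb_per_cycle(self) -> float:
        # Output divided by the number of flow cycles
        return self.megabases / self.flow_cycles


# Figures taken from the text; both are rounded company estimates.
GS_FLX_PLUS = RunSpec("GS FLX+ with XL+ kit", megabases=700, hours=23, flow_cycles=400)
GS_FLX = RunSpec("GS FLX with Titanium reagents", megabases=400, hours=10, flow_cycles=200)


def output_ratio(new: RunSpec, old: RunSpec) -> float:
    """Ratio of per-run output between two configurations."""
    return new.megabases / old.megabases


if __name__ == "__main__":
    for spec in (GS_FLX_PLUS, GS_FLX):
        print(f"{spec.name}: {spec.mb_per_hour():.1f} Mb/hour, "
              f"{spec.mb_per_cycle():.2f} Mb/cycle")
    print(f"Per-run output ratio: {output_ratio(GS_FLX_PLUS, GS_FLX):.2f}x")
```

Taken at face value, these rounded figures put per-run output on the GS FLX+ at about 1.75 times that of the current system, while output per hour of run time works out somewhat lower (roughly 30 versus 40 megabases). The sketch says nothing about cost per base, which depends on pricing the company did not disclose.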
The increased output per run, as well as "sequencing economics," will decrease the cost per base by 40 to 50 percent on the GS FLX+, according to the spokesperson, who declined to provide additional pricing information.
At launch, the XL+ kit will support shotgun sequencing, transcriptome sequencing, and paired-end sequencing and will be compatible with the firm's MID adaptors for sample multiplexing.
According to 454, based on customer feedback, the extra-long reads will be "extremely valuable" for assembling large, complex, and highly polyploid genomes, in particular in combination with a short-read platform. "The long reads span more repeat regions, resulting in assemblies with fewer, longer contigs and scaffolds," the spokesperson told In Sequence. "This is particularly beneficial when combining the extra-long reads with short reads for large genomes to provide the best quality assembly while managing project costs."
In addition, transcriptome sequencing will benefit from the long reads because they cover more exons and splice junctions and extend into untranslated regions, resulting in improved coverage of long transcripts and more accurate gene models.
Sequence capture and shotgun metagenomics projects will also improve with the longer reads, according to the company.
Several customers who have had early access to the long reads have mainly used them for de novo genome assembly projects.
Sequencing Parrots, Devils
Erich Jarvis, a professor of neurobiology at Duke University who studies vocal communication in songbirds, said that the XL+ reads have helped him assemble some regions of the parrot genome that he could not get to otherwise.
He said he first started sequencing the parrot genome two years ago with 454 technology and found that the assembly was lacking some regions he was particularly interested in, primarily promoter regions. Upon analysis, the researchers found that these missing areas were GC-rich or repetitive, and they were also unable to assemble them from Illumina sequencing data.
Reassembling the genome with long 454 reads, which Jarvis said have a peak of 600 to 800 base pairs and include some in excess of 1,000 base pairs, allowed them to obtain the promoter regions of interest as well as regions with a GC content of 60 to 70 percent, but not regions with more than 80 percent GC content. "I think we solved two-thirds of the problems that were giving us gaps in the genome" using the long reads, he said.
Jarvis' lab initially obtained long reads that 454 generated internally but has had a GS FLX+ system in house for over a month now.
He and some colleagues now want to include long 454 reads in several genomes that are being sequenced as part of the Genome 10K project, an international effort that aims to sequence 10,000 vertebrate species (IS 5/18/2010). China's BGI is sequencing the first 100 genomes using only Illumina sequencing, but Jarvis and his colleagues are planning to supplement these short reads for at least three genomes with long 454 reads. For the parrot genome, "we found out that that worked quite well," he said. "Even with just long reads, we are getting assemblies now that are just as good as and some statistically better than the chicken and finch genome assemblies [that used the] Sanger method."
Jarvis said he found that the genome assembly field is currently split with regard to which sequencing technology is favored. "You are either a long-read person or an Illumina short-read person," he said. This is partly for economic reasons, he noted, because 454 sequencing is more expensive. "If they really want to stay in business, they need to get somebody to lower their price," he said. Jarvis said 454 data is now about three times as expensive as Illumina data.
Stephan Schuster, a professor of biochemistry and molecular biology at Penn State University, has also received long-read data from 454, which his group has used for the de novo assembly of the Tasmanian devil genome.
Schuster told In Sequence that he sees the largest impact of the long reads in de novo assemblies, especially for plant genomes, which are often complex. "You get a very good draft assembly in one go, even from a single library," he said. "It is expensive but [enables] very fast turnaround for a project." Schuster did not mention how much more expensive 454 data is compared to other platforms.
Pacific Biosciences' new PacBio RS instrument also offers long reads, a fraction of which reach several kilobases, but Jarvis and Schuster both said that this platform does not yet compete with 454.
Because PacBio's raw error rate is currently about 15 percent and the output per run less than 50 megabases, the data "currently relies on a good assembly of other data in order to be useful," Schuster said, so he sees "little or no overlap" between long-read data from 454 and PacBio at the moment.
"If PacBio is able to enhance their accuracy with their long reads, I think they are going to have a product that's better [than 454's], but if they are not, I think 454 is going to be in the market for a while," said Jarvis. "The only thing I think that's decreasing 454's market is their price."