New sequencing technologies will play a major role in the Department of Energy Joint Genome Institute’s 2009 Community Sequencing Program, announced last week, as the institute transitions from Sanger sequencing to second-generation platforms.
The institute selected 44 projects for next year’s CSP, which accounts for more than half of JGI’s sequencing activities. Sixteen projects are devoted to eukaryotes, 17 to bacteria and archaea, and 11 to metagenomes.
The projects cover a range of organisms, among them BAC sequencing for the loblolly pine genome, the genome of the greater duckweed, the genome of a colony-forming oil-producing green microalga, and a metagenomic analysis of microbes residing inside the giant Pacific shipworm.
JGI’s Production Genomics Facility in Walnut Creek, Calif., plans to generate at least 60 gigabases of sequence data under the 2009 program, compared to 21 gigabases slated to be sequenced as part of this year’s program.
Next year’s program also marks a shift further away from Sanger sequencing towards second-generation sequencing technologies: 38.5 gigabases are scheduled to be produced on Illumina’s Genome Analyzer, 14 gigabases on 454’s Genome Sequencer FLX, and 8.5 gigabases on Applied Biosystems’ 3730xl Sanger platform.
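As a rough check on the platform mix described above, the planned output can be tallied in a few lines of Python. This is only an illustrative sketch: the gigabase figures are the ones quoted in the article, while the dictionary keys and function name are made up for the example and do not reflect any JGI tooling.

```python
# Planned 2009 CSP output per platform, in gigabases (figures quoted in the article).
# The labels and the helper name below are illustrative assumptions.
planned_gb = {
    "Illumina Genome Analyzer": 38.5,
    "454 Genome Sequencer FLX": 14.0,
    "ABI 3730xl (Sanger)": 8.5,
}


def platform_shares(output_gb):
    """Return (total gigabases, percent share per platform rounded to 0.1)."""
    total = sum(output_gb.values())
    shares = {name: round(100 * gb / total, 1) for name, gb in output_gb.items()}
    return total, shares


total_gb, shares = platform_shares(planned_gb)
print(f"Total planned: {total_gb} Gb")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

On these numbers the three allocations sum to 61 gigabases, consistent with the "at least 60" target, and Sanger accounts for roughly one-seventh of the planned output.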
“We used to sequence all our reference genomes to 8X depth with Sanger sequencing,” said Jim Bristow, JGI’s deputy director for programs, adding that the institute is already using all three platforms.
The amount of sequencing next year might in fact be on the order of 100 rather than 60 gigabases, he said, depending on the mix of platforms used. “It’s very hard to put an actual number on” it, Bristow told In Sequence last week. “In the old days, we used to say, ‘We have committed 20 gigabases of sequence to the program,’ and that was easy, because we had one platform. Now … it’s different kinds of platforms for different kinds of projects.”
For example, all de novo bacterial and archaeal genomes under the 2009 program will be sequenced primarily on the institute’s four 454 GS FLX instruments. “They won’t involve Sanger,” Bristow said.
These de novo genome projects require paired-end data, and producing such data from 8-kilobase and 20-kilobase libraries on the 454 platform “has really been a major effort for us,” he said.
In addition, JGI has “put a lot of effort” into developing an apparatus that automates the emulsion-breaking step in the 454 library preparation, according to Bristow. “It used to be quite an involved pushing and pulling on syringes, which is an ergonomic nightmare.”
Whether a small amount of sequencing on Illumina’s Genome Analyzer will still be required to resolve homopolymer sequences in microbial de novo genomes will depend on 454’s pending upgrades for its platform, he said, which involve new reagent kits, picotiter plates, and a new version of its assembly software.
“The new Titanium platform has much better performance” regarding homopolymers, Bristow said. “We are just beginning beta testing now, so I don’t have any [first-hand] information on that score, but the data that 454 presents are quite compelling.”
JGI will also use the 454 platform for EST sequencing projects, he said.
Also included in the 2009 CSP are a number of resequencing projects for organisms where a closely related reference genome sequence is already available. “There, the workhorse will probably be the Illumina platform, because it’s so much cheaper,” according to Bristow.
Illumina’s sequencer will also be used in gene expression tag counting projects, he added.
The institute currently owns two Illumina Genome Analyzer II systems but would like to acquire additional units if funding becomes available.
Bristow said he is currently not considering getting an Applied Biosystems SOLiD platform because it would involve large setup costs and require a separate data management system. “Having to generate a new team to carry out that chemistry, in parallel to what we are doing now, just doesn’t make sense.
“For the moment, we are committed to the Illumina platform,” he said, adding that “as the SOLiD machine gets more mature, that could change.”
The institute’s remaining 50 ABI 3730xls will mostly be used for de novo sequencing of complex eukaryotic genomes, as well as for microbial community sequencing projects. However, as reads on 454’s platform are getting longer, that system may begin to take over some of these applications as well, according to Bristow.
A year ago, JGI’s Production Genomics Facility retired its 36 MegaBace 4500 Sanger sequencers (see In Sequence 3/25/2008), and within the last few months, it decommissioned approximately 20 of its 3730s. “We have taken them out of service because they are so much more expensive to run,” Bristow said.
The recent shift in technology is also apparent from JGI statistics: In the first fiscal quarter of 2008, which started last October, the institute produced 4 gigabases of Sanger data and 3.2 gigabases of 454 data, while in the second fiscal quarter, it generated only 2.3 gigabases of Sanger data and 6.9 gigabases on the 454 platform. JGI is now also producing Illumina sequence data, according to Bristow.
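To make the quarter-over-quarter shift concrete, the Sanger share of the combined Sanger and 454 output can be computed from the figures quoted above. This is a minimal sketch under stated assumptions: the quarter labels and function name are illustrative, and Illumina data are left out because the article gives no volumes for them.

```python
# Gigabases produced per quarter (figures quoted in the article).
# Quarter labels and the helper name are illustrative assumptions;
# Illumina output is omitted because no volumes are reported.
quarterly_gb = {
    "FY2008 Q1": {"sanger": 4.0, "454": 3.2},
    "FY2008 Q2": {"sanger": 2.3, "454": 6.9},
}


def sanger_share(output_gb):
    """Return Sanger's percent share of combined Sanger + 454 output."""
    total = output_gb["sanger"] + output_gb["454"]
    return round(100 * output_gb["sanger"] / total, 1)


sanger_pct = {quarter: sanger_share(gb) for quarter, gb in quarterly_gb.items()}
for quarter, pct in sanger_pct.items():
    print(f"{quarter}: Sanger share {pct}%")
```

By this measure, Sanger falls from a little over half of the combined output in the first quarter to about a quarter in the second.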
The CSP will make up between 50 and 60 percent of JGI’s overall sequencing activities next year. The program, which was created in 2004, takes on sequencing projects suggested by the scientific community and chosen by a panel of experts. For the 2009 round, JGI received almost 150 proposals.
The remainder of JGI’s sequencing capacity will be split between pending projects under DOE’s microbial sequencing project, which was absorbed into the CSP last year; JGI’s Genomic Encyclopedia for Bacteria and Archaea pilot project (see In Sequence 4/24/2007); and projects for the three DOE Bioenergy Research Centers, which were announced a year ago (see In Sequence 7/3/2007). Over the next few years, bioenergy research center projects will likely grow to about half of JGI’s sequencing activity, according to Bristow.
Approximately 45 percent of JGI’s $52 million annual budget is devoted to sequencing, including labor and materials, according to a JGI spokesperson.
About 20 percent goes towards informatics, but that fraction will likely grow. “Over the next few years, the analysis part of what we are doing is going to become a bigger and bigger part of what we do,” Bristow said. “Sequencing is going to continue to get cheaper, but the analysis is going to get harder.”
In anticipation of growing amounts of sequence data and computational requirements, JGI is currently replacing its compute cluster, and has made “major investments” in computing storage, he said, though he did not elaborate on JGI’s IT infrastructure.
“People talk about drinking from a fire hose,” said Bristow. “This is a little more like drinking from Niagara Falls.”