This article was originally published June 15.
The Department of Energy's Joint Genome Institute has already churned out over 40 trillion bases of sequence data this year, using a variety of next-generation sequencers for applications including de novo whole genome sequencing, RNA-seq, ChIP-seq, methylation sequencing, and even some single cell sequencing.
While the majority of its work is currently in whole genome sequencing, the institute is increasingly looking beyond just sequencing genomes to doing more functional genomics work like RNA-seq and epigenetic sequencing, as well as honing single-cell sequencing techniques.
"At some point, most important genomes will be sequenced," Len Pennacchio, JGI's deputy director of genomic technologies, told In Sequence.
Currently, the center is equipped with eight Illumina HiSeq 2000 instruments, two Pacific Biosciences RS machines, and four MiSeqs. Additionally, it has a number of decommissioned instruments, including two of Roche's 454 GS FLX machines and five Genome Analyzers.
The bulk of its sequencing is done on the HiSeq instruments. The PacBio machines are used primarily for genome finishing (one machine is devoted entirely to that task) and for epigenetic sequencing. The MiSeqs are used for technical validation and 16S ribosomal RNA sequencing.
So far this year, JGI's sequencers have produced around 43 terabases of sequence data, putting the institute on track to surpass its annual goal of 47 terabases.
About 30 percent of its work is done for other DOE bioenergy research institutes that are studying organisms for their energy-generating potential. Around 50 percent of the sequencing is done under its community sequencing program, an annual call for "exciting, energy-related samples," Pennacchio said. In November, it chose 41 projects out of 152 submitted for its 2012 program (IS 11/8/2011). Ten percent of JGI's sequencing throughput is for internal research, and the remaining 10 percent goes to other DOE-funded programs like the International Cooperative Biodiversity Groups program and the Low Dose Radiation Research program.
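As a rough illustration of what those shares mean in absolute terms, the Python sketch below splits the 47-terabase annual goal by program and computes progress toward it. It assumes, which the figures above do not state, that the percentages apply uniformly to the whole year's output; the resulting numbers are estimates, not JGI-reported figures, and the function name is hypothetical.

```python
# Illustrative arithmetic only. Assumption: the program shares reported in
# the article apply uniformly to the full 47-terabase annual goal.
ANNUAL_GOAL_TB = 47.0
SEQUENCED_SO_FAR_TB = 43.0  # "around 43 terabases" at time of writing

# Approximate shares of JGI sequencing throughput, as reported.
SHARES = {
    "DOE bioenergy research institutes": 0.30,
    "Community sequencing program": 0.50,
    "Internal research": 0.10,
    "Other DOE-funded programs": 0.10,
}


def terabases_by_program(goal_tb: float, shares: dict) -> dict:
    """Split a throughput total across programs according to their shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {name: share * goal_tb for name, share in shares.items()}


if __name__ == "__main__":
    for name, tb in terabases_by_program(ANNUAL_GOAL_TB, SHARES).items():
        print(f"{name}: ~{tb:.1f} Tb")
    print(f"Progress toward annual goal: {SEQUENCED_SO_FAR_TB / ANNUAL_GOAL_TB:.0%}")
```

Under that assumption the community sequencing program alone would account for roughly 23.5 terabases, and 43 terabases is about 91 percent of the annual goal.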
De Novo Assembly, Genome Finishing
The JGI team is using a hybrid approach to de novo whole genome sequencing. Initial sequencing is done on the HiSeq and then the PacBio's long reads are used to produce better assemblies. "One PacBio machine is devoted to finishing genomes," Pennacchio said.
He said that while Illumina has gotten better and better and is producing very high-quality draft genomes, adding in sequencing on the PacBio is complementary and helps "produce as perfect a genome as you can."
Previously, the JGI team had used Roche's 454 GS FLX, but it switched to the PacBio because of the PacBio's longer reads and because, as a single-molecule technology, it introduces less bias. The 454 machines have since been retired, Pennacchio said.
While JGI researchers will continue to sequence and assemble larger plant genomes with this hybrid approach, they are increasingly testing the PacBio for de novo assembly of smaller microbial and fungal genomes, Pennacchio said.
The PacBio is especially useful for microbial sequencing because many microbial genomes are GC-rich, some approaching 80 percent guanine and cytosine. GC-rich sequence is prone to the amplification bias seen with the Illumina machines, whose workflow includes amplification steps; because the PacBio sequences single molecules, it does not share that bias, Pennacchio said.
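GC content is simply the fraction of bases in a sequence that are G or C. The minimal Python sketch below computes it overall and over sliding windows to flag GC-rich stretches of the kind that tend to be underrepresented after amplification. The window size, step, and 0.65 cutoff are arbitrary illustrative defaults, not JGI parameters, and the function names are hypothetical.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_rich_windows(seq: str, window: int = 100, step: int = 50,
                    cutoff: float = 0.65) -> List[Tuple[int, float]]:
    """Return (start, gc) for each window whose GC fraction exceeds cutoff.

    window, step, and cutoff are illustrative defaults, not JGI settings.
    """
    hits = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if gc > cutoff:
            hits.append((start, gc))
    return hits


if __name__ == "__main__":
    # Toy sequence: an AT-rich stretch followed by a GC-rich stretch.
    toy = "AT" * 100 + "GGCGCCGCGT" * 20
    print(f"overall GC: {gc_fraction(toy):.2f}")
    for start, gc in gc_rich_windows(toy):
        print(start, round(gc, 2))
```

Note that a genome-wide average can hide local extremes: the toy sequence is only 45 percent GC overall, yet its second half is 90 percent GC.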
Epigenetics and Single Cells
JGI is also developing other sequencing applications, including epigenetic techniques such as methylation sequencing, ChIA-PET, and ChIP-seq, as well as single-cell sequencing.
In addition, it has taken on a significant amount of RNA-seq and gene annotation work.
The JGI team has been developing epigenetic sequencing techniques on the PacBio instrument. Using the PacBio for methylation sequencing enables direct detection of methyl groups on DNA bases, Pennacchio explained, which can lead to a better understanding of their role in gene regulation.
PacBio published a proof-of-principle demonstration of its methylation sequencing strategy in Nature Methods (IS 5/11/2010). The technique relies on polymerase kinetics: when the polymerase encounters a methylated base in the DNA template, there is a measurable delay between fluorescence pulses.
In the paper, the PacBio researchers demonstrated that in synthetic DNA they could distinguish between 5-methylcytosines, 5-hydroxymethylcytosines, and N6-methyladenines.
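The kinetic signal is usually described in terms of the interpulse duration (IPD), the time between successive fluorescence pulses. The minimal Python sketch below shows the basic idea under stated assumptions: compare the mean IPD at each template position in native DNA with the mean IPD at the same position in an unmodified control, and flag positions where the ratio is high. The IPD values are made-up toy numbers, the 2.0 threshold is hypothetical, and this is not PacBio's software.

```python
from statistics import mean
from typing import Dict, List


def ipd_ratios(native: Dict[int, List[float]],
               control: Dict[int, List[float]]) -> Dict[int, float]:
    """Ratio of mean IPD in native DNA to mean IPD in an unmodified control,
    for every template position observed in both data sets."""
    return {
        pos: mean(native[pos]) / mean(control[pos])
        for pos in native
        if pos in control and native[pos] and control[pos]
    }


def flag_modified(ratios: Dict[int, float], threshold: float = 2.0) -> List[int]:
    """Positions whose IPD ratio exceeds a hypothetical threshold."""
    return sorted(pos for pos, r in ratios.items() if r > threshold)


if __name__ == "__main__":
    # Toy IPD values in seconds (made up for illustration), keyed by position.
    native = {10: [0.30, 0.35, 0.28], 11: [1.20, 1.50, 1.35], 12: [0.40, 0.38, 0.42]}
    control = {10: [0.31, 0.29, 0.33], 11: [0.33, 0.30, 0.36], 12: [0.41, 0.39, 0.40]}
    ratios = ipd_ratios(native, control)
    print(flag_modified(ratios))  # prints [11]: the polymerase pauses there
```

The sketch treats each position in isolation. In practice the kinetic signature of a modification can extend to neighboring positions, and real analyses model that sequence context and the statistical noise across reads.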
Rex Malmstrom, who heads the micro-scale applications group at JGI and has been working on epigenetic sequencing applications, said that being able to sequence the different epigenetic modifications directly, instead of relying on indirect approaches such as bisulfite conversion or immunoprecipitation, is critical for studying many types of epigenetic modifications.
For instance, he said, the N6-methyladenines are not detectable by other sequencing methods. There's "no chemical conversion you can use to look at that," he said.
Other functional genomics work includes RNA-seq experiments to "better annotate genomes," Pennacchio said. Sequencing can also be used to assign functions to genes directly. Researchers construct a library of mutants with transposons inserted at many positions throughout the genome, grow the organism under a selective condition, such as media lacking a nutrient it would otherwise take up, and then sequence the insertion sites that remain in the surviving population.
"Wherever an insertion event is allowed, this implies that the gene is not necessary; where it's not allowed, the gene is required," Pennacchio said.
In that way, researchers can work out which genes in organisms such as algae are important for bioenergy production.
This work is frequently done on the MiSeq, Pennacchio said, because it offers a quick turnaround and high throughput is not needed.
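The insertion-tolerance logic Pennacchio describes can be sketched in a few lines of Python. This is a minimal illustration, assuming the sequencing data have already been reduced to a count of distinct insertion sites recovered per gene after selection; the gene names, counts, and the min_insertions cutoff are all hypothetical.

```python
from typing import Dict, List, Tuple


def classify_genes(insertions_per_gene: Dict[str, int],
                   min_insertions: int = 1) -> Tuple[List[str], List[str]]:
    """Split genes into (likely required, likely dispensable).

    A gene that tolerated insertions in surviving cells is treated as
    dispensable under the growth condition; a gene with no (or too few)
    recovered insertions is treated as likely required. min_insertions is
    a hypothetical cutoff.
    """
    required = sorted(g for g, n in insertions_per_gene.items() if n < min_insertions)
    dispensable = sorted(g for g, n in insertions_per_gene.items() if n >= min_insertions)
    return required, dispensable


if __name__ == "__main__":
    # Hypothetical counts of distinct insertion sites recovered per gene.
    counts = {"geneA": 0, "geneB": 14, "geneC": 2, "geneD": 0}
    required, dispensable = classify_genes(counts)
    print("likely required:", required)
    print("likely dispensable:", dispensable)
```

A single cutoff is the simplest possible rule. Real analyses typically also account for gene length and sequencing depth, since a short gene may lack insertions by chance rather than because it is required.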
Finally, JGI has recently started working on single-cell sequencing techniques. This program is still in the early stages of development, but Pennacchio said it will be especially useful for sequencing unculturable organisms and rare species from environmental samples.
Single-cell sequencing also has an advantage over metagenomic sequencing for assembly. "With metagenomics, things have to be just right for assembly," said Pennacchio. Additionally, with individual cells, "you know all the genes are from the one bug," he added.
The technique can also be useful for plants, for instance, to look at the genome or transcriptome of just one anatomical feature, said Pennacchio.
Most of the single-cell sequencing work is done on the HiSeq. Often, 16S rRNA sequencing is first done on an environmental sample using the MiSeq to "pick out who's who," and then individual cells are sequenced on the HiSeq.
The PacBio is not used for single cells. The whole-genome amplification step in single-cell sequencing tends to introduce chimeras, so the HiSeq's shorter reads are actually preferable: PacBio's long reads could span the full length of a chimera. The higher throughput of the HiSeq also helps correct for those chimeras, Pennacchio added.
Improving the single-cell amplification step is another active area of R&D at JGI. Amplification tends to introduce significant bias, amplifying some regions of the genome much more efficiently than others, so that only a fraction of the genome is represented in the sequence data.
Pennacchio said that sequencing one cell often yields between 30 and 50 percent of the genome, while sequencing a second cell yields a different 30 to 50 percent. Experiments testing single-cell sequencing on Escherichia coli have found that about five cells are needed to recover the entire genome.
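To see why several cells are needed, consider a deliberately simplified model, which is an assumption and not JGI's analysis: each cell independently recovers a random fraction p of the genome. The expected fraction covered by at least one of n cells is then 1 - (1 - p)^n. The Python sketch below tabulates that for the 30 to 50 percent range quoted above.

```python
def expected_combined_coverage(p: float, n_cells: int) -> float:
    """Expected genome fraction covered by at least one of n_cells,
    assuming (for illustration) that each cell independently covers a
    random fraction p of the genome."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    return 1.0 - (1.0 - p) ** n_cells


if __name__ == "__main__":
    # Per-cell recovery spanning the 30 to 50 percent range in the text.
    for p in (0.3, 0.4, 0.5):
        row = [f"{expected_combined_coverage(p, n):.0%}" for n in range(1, 6)]
        print(f"p={p}: " + ", ".join(row))
```

Under this model, five cells give an expected coverage of about 83 percent (p = 0.3) to 97 percent (p = 0.5). The model is only a rough guide: real amplification bias is not random, since some regions are consistently harder to amplify than others.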