The Department of Energy’s Joint Genome Institute, in Walnut Creek, Calif., announced last month that it would embark on a series of “microbial marathons” in which it will sequence substantial numbers of microbes in an effort to push microbial genomics forward. These genomes represent an enormous and still mostly untapped resource with huge potential payoffs for medicine, agriculture, and environmental remediation.
Dan Rokhsar, director of bioinformatics at JGI, says that as the human genome project enters its endgame, sequencing centers are moving on to other challenges. Advances in sequencing and assembly mean that the bulk of a typical microbial genome can now be sequenced in a day and a half. “The last 5 percent or so is a laborious process involving lots of hand work, but the 95 percent high quality draft sequence produced by automated methods provides a large amount of information at low cost,” he explained.
To demonstrate the concept, JGI earlier this year sequenced Enterococcus faecium, an antibiotic-resistant pathogen whose prevalence in surgical wards is a growing worry in the hospital community. “Sequencing E. faecium got us thinking how we could use our sequencing capacity to answer a lot of interesting problems in microbial biology, and at least provide the platform for other people to use these genome sequences,” said Rokhsar.
JGI then devoted the month of October to sequencing microbial genomes en masse. Fifteen were completed, and the results were so encouraging that JGI now plans to do three of these “microbial marathons” per year. Rokhsar explains that the long-term goal is to provide information that is both detailed and comprehensive. For example, a biologist would be able to ask if a copy of a particular enzyme is present in a microbe, then scan through all microbial genomes and tabulate which microbes have it and which don’t, and finally correlate this information with the environments they live in.
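The kind of query Rokhsar describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genome names, environments, and enzyme lists below are invented toy data, and `tabulate_enzyme` is a hypothetical helper, not part of any JGI tool.

```python
from collections import defaultdict

# Hypothetical toy data: genome name -> (environment, set of annotated enzymes).
GENOMES = {
    "microbe_A": ("soil", {"nitrogenase", "catalase"}),
    "microbe_B": ("marine", {"catalase"}),
    "microbe_C": ("soil", {"nitrogenase"}),
    "microbe_D": ("marine", {"rubisco"}),
}


def tabulate_enzyme(genomes, enzyme):
    """Count, per environment, how many genomes have or lack the enzyme.

    Returns {environment: [count_with, count_without]}.
    """
    table = defaultdict(lambda: [0, 0])
    for environment, enzymes in genomes.values():
        column = 0 if enzyme in enzymes else 1
        table[environment][column] += 1
    return dict(table)


if __name__ == "__main__":
    for environment, (present, absent) in tabulate_enzyme(GENOMES, "nitrogenase").items():
        print(f"{environment}: present in {present}, absent from {absent}")
```

With many sequenced genomes in place of the four toy entries, a table like this is the starting point for correlating a gene's presence with the environments its hosts live in.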
Gene sequences are of limited use without functional and other supporting information, so a tandem analytic effort to provide the needed “annotations” is being mounted by a group at Oak Ridge National Laboratory. Frank Larimer leads a small group of annotators who do the workups using enterprise-level workstations plus a 1000-processor IBM Eagle supercomputer to run three types of computationally intensive routines: BLAST, Hidden Markov Model/Pfam database, and NCBI COGs database searches.
“The basic idea is to use different tools to get different perspectives on the data,” Larimer explained. “We’re aiming ultimately at archival annotation which will be deposited in GenBank, so we’re concerned about getting things as accurate as we can, but in the draft version we recognize that the body of knowledge changes continually, sometimes in a large-scale way,” he added.
For example, a recent change in the implementation of the gene-calling routine resulted in the elimination of 400 genes from the Synechococcus genome, bringing that organism’s complement down to 2,400. Larimer explained that wholesale elimination of genes is not unusual during the early stages of annotating a new organism.
Intricate microbial biology may also turn up during the annotation process, such as genes that possibly overlap and are possibly expressed via alternate frameshift reads. Distinguishing what’s a gene and what’s not can be tricky in such an environment, which is where the three-perspective method shows its value. “We arbitrate amongst those calls to get the best possible choice,” said Larimer.
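The article does not say how the Oak Ridge group arbitrates among the three kinds of evidence. One simple possibility, shown here purely as an assumption, is to keep a candidate gene only when at least two of the three searches support it. The function name, the evidence keys, and the threshold are all hypothetical.

```python
def arbitrate(calls, min_support=2):
    """Keep candidate genes supported by at least `min_support` evidence sources.

    `calls` maps a candidate gene ID to a dict of booleans, one per
    evidence source (assumed keys: "blast", "pfam", "cog").
    This majority-vote rule is an illustrative assumption, not the
    procedure used by the annotators described in the article.
    """
    kept = []
    for gene_id, evidence in calls.items():
        # True counts as 1 and False as 0, so sum() gives the support count.
        if sum(evidence.values()) >= min_support:
            kept.append(gene_id)
    return kept


if __name__ == "__main__":
    candidates = {
        "orf_001": {"blast": True, "pfam": True, "cog": False},
        "orf_002": {"blast": False, "pfam": False, "cog": True},
        "orf_003": {"blast": True, "pfam": True, "cog": True},
    }
    print(arbitrate(candidates))
```

Raising `min_support` to 3 would make the rule stricter and discard more candidates, which is one way a change in gene-calling criteria can remove many genes at once.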
Beyond keeping annotations of known genes accurate and up to date lies the challenge of genes that are completely unknown. “On average 25 percent of the 2,000 to 3,000 genes in a new microbe have never been seen before.” Since perhaps only 1 percent of microbes have yet been described, a prodigious amount of annotating clearly lies ahead.
Another hurdle is commensal communities, such as the consortia of fungi and bacteria in plant root environments that form the interface between root and soil. Many, probably most, of these bugs can live only in association with one another and cannot be obtained in pure cultures, so new approaches to getting at their DNA will have to be developed.
There is so much territory waiting to be discovered in the microbial universe that competition between public and private sectors does not appear to be the issue it has been in the human genome. Yuri Nikolsky, vice president of business development at Integrated Genomics of Chicago, whose customers include Dow Chemical, Dow Agra, Cargill and Maxygen, said: “It’s very helpful for us to have JGI as a source of genomes. Because automated annotation depends on multiple comparisons, the more genomes we have the better.”