NEW YORK (GenomeWeb) – The drop in next-generation sequencing costs has opened new markets and applications over the last several years, as researchers tackle ever more complex genomes and as genome sequencing has become a commoditized tool used alongside other genomics technologies.
One area in particular that has seen a recent surge is plant genomics. Researchers are now using sequencing, gene expression analysis, phylogenetic mapping tools, and genetic engineering to figure out how to make better biofuels.
Cellulosic plants in general have the potential to make good biofuel feedstocks because they are typically perennials and have a better biomass-to-energy ratio than, say, corn grown for ethanol.
"There is a lot of investment in understanding how we can use and improve [cellulosic plants] for biofuel feedstocks," Tom Juenger, a professor in the department of integrative biology at the University of Texas, Austin, told GenomeWeb.
However, these plant genomes have been a tough nut to crack: they tend to be large, are often polyploid, and typically contain long stretches of repetitive sequence. Panicum virgatum, or switchgrass, exemplifies both the promise and the challenges of studying the genomes of potential biofuel plants.
Switchgrass holds great promise as a biofuel crop: it is native to North America, grows in a variety of conditions, uses less water than corn, and is more efficient than corn at producing biofuel. However, its genome has proven tricky to sequence. The genome is 1.4 gigabases in size and allotetraploid, meaning it is made up of genomes from two different but closely related species. In addition, switchgrass is outbred, meaning it must cross with another plant to reproduce, which results in a highly heterozygous genome.
The US Department of Energy's Joint Genome Institute has identified switchgrass as one of its 10 "flagship" genomes due to its potential as a biofuel crop, and JGI researchers, in conjunction with Juenger's group at Texas, have led efforts to sequence its genome.
The group has a draft genome, but there is still more work to be done. "It's been my personal nightmare," joked Jeremy Schmutz, who leads the plant program at JGI and the HudsonAlpha Institute for Biotechnology in Huntsville, Alabama, in an interview with GenomeWeb. The team began with Roche's 454 GS FLX platform. The first attempt was "just a bunch of linear 454 data," Schmutz said, and the resulting assembly was highly fragmented. To try to build each subgenome separately, Schmutz said, the researchers built a genetic map from more than 300 offspring of a cross, and used BAC and fosmid end sequencing as well as Illumina paired-end sequencing. "Then we tried to collapse the subgenomes to get a single consensus," he said, resulting in version 1.
Version 1.1 of the assembly has since been uploaded to JGI's Phytozome database and consists of about 1.2 gigabases of sequence arranged in over 300,000 contigs. About 99,000 of those contigs have been assigned to the A or B subgenome. In addition, the researchers have identified over 98,000 loci containing protein-coding transcripts and over 27,000 alternatively spliced transcripts.
When the researchers started trying to build a switchgrass reference, there were no reference genomes for any perennial grass. So in parallel with the switchgrass work, they began looking for a close diploid relative of switchgrass that had a simpler genome but was still "perennial and with the same growth form and biology," Juenger said. "Panicum hallii fits that bill."
P. hallii, also called Hall's panicgrass, is diploid with a genome of about 550 megabases, making it much easier to work with. In addition, unlike switchgrass, it can reproduce through self-fertilization or asexual reproduction. That makes it easier to study gene-by-environment interactions, since researchers can produce genetically identical plants and grow them under different conditions.
Hall's panicgrass shares some features with switchgrass, including two ecotypes, upland and lowland. The lowland ecotype is more productive, yielding more biomass, while the upland ecotype is better able to withstand drought and extreme temperatures.
Interestingly, after sequencing the P. hallii genome, researchers found that it closely resembled one of switchgrass's progenitor subgenomes, although "it is clearly not the subgenome," Juenger said.
Despite the advantages of Hall's panicgrass over switchgrass, sequencing and assembling the genome has not been a simple task.
As with switchgrass, Schmutz's team has taken a "kitchen sink" approach to assembling the P. hallii genome de novo. They started with the 454 instrument and later added 2x250 bp paired-end sequencing on Illumina's MiSeq. They have also generated BAC and fosmid clones; tested Illumina's TruSeq Synthetic Long Reads and Pacific Biosciences' single-molecule, long-read technology; and used various mapping techniques, including SNP-based genetic maps and maps built from population resequencing data.
Version 2 of the P. hallii assembly consists of 554 megabases of sequence arranged in over 30,000 contigs. The researchers have identified over 37,000 loci containing protein-coding transcripts and over 49,000 protein-coding transcripts in total.
In addition, sequencing data from P. hallii have been used to help assemble the switchgrass genome. Because the P. hallii genome is most closely related to one of the switchgrass subgenomes, the data can be used to help sort out the sequence reads belonging to that subgenome and to place them on chromosomes.
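The article does not say how the P. hallii data were used to sort switchgrass reads. As a purely hypothetical illustration of the general idea, one simple approach is to assign a read to a subgenome when a large share of its short subsequences (k-mers) also occur in the diploid relative. In the Python sketch below, the toy reference string, the k of 5 and the 0.5 threshold are all assumptions chosen for readability, not parameters from the JGI work.

```python
# Hypothetical sketch: bin reads by k-mer sharing with a diploid relative.
# The reference sequence, k, and threshold are illustrative assumptions,
# not parameters used by JGI.


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(read: str, ref_kmers: set[str], k: int) -> float:
    """Fraction of a read's distinct k-mers that also occur in the reference."""
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k
        return 0.0
    return len(read_kmers & ref_kmers) / len(read_kmers)


def bin_reads(reads, relative_seq, k=5, threshold=0.5):
    """Split reads into those resembling the relative and all the rest."""
    ref_kmers = kmers(relative_seq, k)
    similar, other = [], []
    for read in reads:
        if shared_fraction(read, ref_kmers, k) >= threshold:
            similar.append(read)
        else:
            other.append(read)
    return similar, other


if __name__ == "__main__":
    relative = "ACGTTGCATGCAAGTCCGATAGCTTAGGCA"  # toy stand-in for P. hallii
    reads = [
        "TGCATGCAAGTCC",     # copied from the toy relative
        "GGGGCCCCTTTTAAAA",  # unrelated sequence
    ]
    similar, other = bin_reads(reads, relative)
    print(similar)  # ['TGCATGCAAGTCC']
    print(other)    # ['GGGGCCCCTTTTAAAA']
```

Real homeologous subgenomes share much of their sequence, so a practical version would need longer k-mers, carefully tuned thresholds and a way to handle reads that match both subgenomes; this sketch only shows the shape of the computation.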
Schmutz added that the researchers have now been able to tease apart switchgrass's two subgenomes, helped in part by the P. hallii sequence data.
They have also generated about 25X coverage of the switchgrass genome with PacBio technology, which Schmutz said is helping to refine mapping. He is also hopeful that PacBio's new higher-throughput instrument, the Sequel, will help with the process.
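Sequencing depth such as the 25X figure above is conventionally defined as total sequenced bases divided by genome size. The Python sketch below works through that arithmetic for the roughly 1.4-gigabase switchgrass genome; the 10-kilobase mean read length is an assumption for illustration, not a figure from JGI. It also shows the idealized Lander-Waterman estimate of the fraction of bases left uncovered, e^(-c) for depth c, which assumes reads land uniformly at random (real data, especially in repetitive plant genomes, fall short of that ideal).

```python
import math

GENOME_SIZE_BP = 1.4e9        # switchgrass genome size cited in the article
TARGET_COVERAGE = 25          # approximate PacBio depth cited in the article
MEAN_READ_LENGTH_BP = 10_000  # assumed mean read length, illustrative only


def bases_needed(genome_size: float, coverage: float) -> float:
    """Total sequenced bases required for a given mean depth."""
    return genome_size * coverage


def reads_needed(genome_size: float, coverage: float, read_length: float) -> int:
    """Number of reads of a given mean length needed to reach that depth."""
    return math.ceil(bases_needed(genome_size, coverage) / read_length)


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases never sequenced."""
    return math.exp(-coverage)


if __name__ == "__main__":
    total = bases_needed(GENOME_SIZE_BP, TARGET_COVERAGE)
    print(f"{total / 1e9:.0f} Gb of sequence")  # 35 Gb of sequence
    print(reads_needed(GENOME_SIZE_BP, TARGET_COVERAGE, MEAN_READ_LENGTH_BP))  # 3500000
    print(f"{fraction_uncovered(TARGET_COVERAGE):.1e}")  # 1.4e-11
```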
The researchers are now trying to identify the genes and regulatory regions that underlie the traits of the upland and lowland ecotypes in Hall's panicgrass. An ideal biofuel crop would combine the productivity and higher biomass of the lowland ecotype with the upland ecotype's ability to withstand drought and larger temperature fluctuations.
To look at genetic differences between these two ecotypes, the researchers crossed upland and lowland plants, made clones of the parents and the resulting progeny, exposed them to different types of stress, and then performed RNA-seq and expression QTL (eQTL) mapping.
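The study's statistical pipeline is not described here, and its models may differ. As a minimal sketch of the basic idea behind expression QTL mapping, the Python code below runs a single-marker scan: for one gene, it regresses expression on genotype at each marker and reports a LOD score, LOD = (n/2) log10(RSS0/RSS1), comparing a model without the marker (RSS0) to one with it (RSS1). The genotypes (coded 0, 1 or 2 copies of one parental allele) and expression values are invented toy data; real eQTL studies also add covariates, permutation-based thresholds and multiple-testing control.

```python
import math


def rss_null(y):
    """Residual sum of squares for an intercept-only (no-QTL) model."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def rss_marker(x, y):
    """Residual sum of squares for y = a + b*x fitted by least squares."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # monomorphic marker carries no information
        return rss_null(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def lod_scores(genotypes, expression):
    """Single-marker LOD score for one gene at every marker."""
    n = len(expression)
    rss0 = rss_null(expression)
    if rss0 == 0:  # constant expression: nothing to map
        return [0.0] * len(genotypes)
    scores = []
    for marker in genotypes:
        # Floor RSS1 to avoid division by zero on a perfect fit.
        rss1 = max(rss_marker(marker, expression), 1e-12)
        scores.append((n / 2) * math.log10(rss0 / rss1))
    return scores


if __name__ == "__main__":
    # Toy mapping population: 8 progeny genotyped at 3 hypothetical markers.
    genotypes = [
        [0, 1, 2, 0, 1, 2, 0, 2],
        [0, 0, 1, 1, 2, 2, 0, 2],
        [1, 0, 1, 0, 1, 0, 1, 0],
    ]
    expression = [2.1, 1.9, 4.2, 3.8, 6.1, 5.9, 2.0, 6.0]
    scores = lod_scores(genotypes, expression)
    best = max(range(len(scores)), key=scores.__getitem__)
    print(best)                   # 1
    print(f"{scores[best]:.1f}")  # 9.2
```

Repeating this scan for every expressed gene and looking for markers that turn up again and again is one way regulatory "hotspots" can surface in eQTL data.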
Results from an initial experiment, published last month in Genome Research, involved data from 63 parental plants (28 upland and 35 lowland) and 29 progeny.
In that initial experiment, the researchers found that gene expression divergence between the upland and lowland ecotypes was driven predominantly by regulatory elements. Since then, Juenger said, they have performed over 400 RNA-seq and eQTL mapping experiments on plants exposed to different environmental conditions.
While the researchers are still analyzing the data, Juenger said that so far they have found that a "small number of locations seem to affect many morphological traits, a handful of physiological traits, and lots of expression divergence." That matters because if researchers want to use transgenics to change traits, "a small number of mutations can result in a very different adaptive strategy," he said.
The next step is to figure out how the findings from Hall's panicgrass translate to switchgrass. Preliminary gene expression work in switchgrass suggests that many more regions underlie the differences between its lowland and upland ecotypes than the handful of hotspots that seem to matter in Hall's panicgrass, Juenger said. So the P. hallii findings are unlikely to translate one-to-one to switchgrass, but they could at least offer clues, he said. Another possibility is that researchers could use transgenics to move genes from P. hallii into switchgrass, introducing upland traits such as drought tolerance into a lowland plant that would keep its biomass advantage over the upland ecotype. However, those experiments have not yet been done.
Aside from switchgrass and Hall's panicgrass, JGI has prioritized eight other plant species for biofuel research: poplar, sorghum, Brachypodium, Chlamydomonas, soybean, foxtail, and Miscanthus. For each of these, JGI researchers, working with other academic teams, are generating de novo genome assemblies and performing functional studies to identify the genes that influence traits of interest, with the goal of harnessing those plants to produce biofuels more efficiently.