NEW YORK (GenomeWeb News) - Some plants, especially angiosperms, have monstrously large genomes, posing a unique set of problems for which researchers conducting plant genomic studies have developed a number of clever solutions.
The average size of an angiosperm genome is about 6.2 gigabases, about twice the size of the human genome, Andrew Leitch, a professor of plant genetics at Queen Mary, University of London, noted, but the range of genome sizes is wide.
For example, Arabidopsis thaliana, the popular model system, has a genome of approximately 0.12 gigabases, while at the far end of the spectrum is the Japanese canopy plant, Paris japonica, with a massive 152-gigabase genome. When researchers in 2010 published a study that used flow cytometry to assess the size of P. japonica's genome, they speculated that it might be the largest eukaryotic genome.
Plants that have been sequenced to date tend to fall on the lower end of the genome size scale. In a review published in Nature Reviews Genetics last year, the University of Minnesota's Peter Morrell, Edward Buckler from Cornell University, and Jeffrey Ross-Ibarra at the University of California, Davis, pointed out that, among crops, cucumber, medicago, rice, and others in that smaller genome size range have been sequenced. But crops like sugarcane, barley, and wheat, with genome sizes ranging from 4 gigabases to 17.1 gigabases, generally had no published sequence. The bread wheat genome, though, has since come out in draft form.
"Wheat is [approximately 17 gigabases], so that's why that is such an ambitious project. And that's a polyploidy — that genome has three iterations of everything — and then on top of that, within the genome [there are] all these repetitive sequences that are amplifying," Leitch said
The size of these plant genomes is due not just to their ploidy, but also to rearrangements that crop up.
"One of the biggest difficulties [is] that we find rearrangements in the genome, which, at least in the species we are working with, mostly Brassica napus [rapeseed], are quite unpredictable," said Rod Snowdon, a group leader at Justus-Liebig-Universität. "We have translocations between the diploid genomes and the polyploidy, and this makes things messy. It's a problem for mapping, it's a problem for genome assembly, it's a problem for resequencing, and so on."
To tackle such large genomes, many researchers, like those sequencing the sugarcane genome, are relying on bacterial artificial chromosomes, or BACs. Others, as the wheat genome researchers did, have turned to flow cytometry to isolate individual chromosomes. But there are other ways to study plants with large genomes.
One such way is to avoid reconstructing the plant genome in the first place and focus instead on a selection of the genome. Leitch uses a "genome skimming" approach, in which he and his colleagues sample a small bit of the genome. Because repeats are so abundant in some plants, he said he doesn't have to look at each one individually.
"It's a bit like saying, 'I don't need to count every tree in an oak forest to know it's an oak forest,'" Leitch said. Researchers can look at a plant genome "like an ecologist would look at a field and try to extrapolate by sampling rather than actually having to roadmap absolutely everything that's there."
Then they can compare what they find to a number of other plants. "Instead of having a deep understanding, you now have a wide understanding," he added.
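The sampling logic Leitch describes can be illustrated with a short Python sketch. It is not his group's pipeline. It assumes each read in a random sample has already been assigned to a repeat class by some upstream step, and the class names, the read counts, and the function name estimate_repeat_fractions are all hypothetical. The share of sampled reads in each class serves as an estimate of the share of the genome that class occupies, with a simple normal-approximation interval around it.

```python
import math
from collections import Counter


def estimate_repeat_fractions(read_labels, z=1.96):
    """Estimate the genome proportion of each class from a random read sample.

    Returns {label: (proportion, lower, upper)} using a normal-approximation
    confidence interval (z=1.96 corresponds to roughly 95%).
    """
    n = len(read_labels)
    if n == 0:
        raise ValueError("need at least one read")
    estimates = {}
    for label, k in Counter(read_labels).items():
        p = k / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        estimates[label] = (p, max(0.0, p - half_width), min(1.0, p + half_width))
    return estimates


# Hypothetical sample of 1,000 reads, already classified upstream.
sample = (
    ["repeat_family_A"] * 450
    + ["repeat_family_B"] * 150
    + ["satellite"] * 50
    + ["unclassified"] * 350
)
fractions = estimate_repeat_fractions(sample)
for label, (p, low, high) in sorted(fractions.items(), key=lambda kv: -kv[1][0]):
    print(f"{label}: {p:.1%} (interval {low:.1%} to {high:.1%})")
```

Because the interval narrows with the square root of the number of reads, abundant repeat families can be characterized from a small fraction of the genome, which is the point of the oak forest analogy.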
In particular, Leitch and his colleagues focus on the context of the genes, such as whether they fall in a heterochromatic or euchromatic region or whether they are methylated or not. Such characteristics of the genome influence the genes and the plant itself. Leitch noted that weeds, growing quickly in the cracks of pavement or taking over a lawn, cannot have large genomes — they wouldn't have time to replicate a large genome.
The properties of genes are "in large part dictated by the characteristics of the genome, which can be assessed without sequencing the entire genome," he said.
Leitch and his colleagues have applied this approach to their study of the genus Nicotiana, which includes the tobacco plant, Nicotiana tabacum. About a third of the members of the genus are polyploids and the rest are not, which has allowed the researchers to pinpoint in time when polyploidy arose and, in turn, study genome evolution in the genus.
Using this sampling approach, he and his lab have examined the genomes of two polyploid Nicotiana species, one of which is N. tabacum, that share a paternal lineage. From this, they found that the paternal genome is degrading faster than the maternal genome. The maternal genome, whose modern-day descendant is N. sylvestris, seems to be the most stable genome within this context, while the paternally derived DNA, from N. tomentosiformis, preferentially loses its repetitive DNA, as the researchers reported in Molecular Biology and Evolution in 2011.
Snowdon, by contrast, has turned to a genotyping-by-sequencing approach for his Brassica napus studies. "[We are trying] to automate more use of genotyping-by-sequencing approaches for mapping where we can also use average read numbers in any given region of the sequenced genome to estimate whether there has been a translocation or not," he said.
"There is possibly a deletion of one chromosome region accompanied by a duplication of its homeolog," he said. "We can use sequencing to at least recognize these rearrangements and try and make more sense of them than we could [with] normal molecular markers."
With this approach, Snowdon and his colleagues are resynthesizing B. napus lines, tracing the species back to its original diploids to uncover alleles that would be beneficial for breeding.
When polyploids form, the first generation often contains unstable pairings of chromosomes, and there are a lot of rearrangements that strongly influence traits, Snowdon said. While most of these changes are not for the better as they lead to loss of fertility for the plant or poor performance, sometimes a good phenotype arises — and those are what his team is interested in.
"What we've got left in our modern Brassica napus is a selection of what was left over from all of these crosses from original speciation, and, of course, these have been selected for positive traits, so … it's a narrow gene pool from ones that worked out," he said.
By working backward, his group is aiming to uncover the regions of the genome that can give rise to positive phenotypes that might have been lost from the modern B. napus gene pool. Then, they can use these resynthesized lines to bring new genes for resistance or heterosis into the modern breeding population.
"But it is very difficult because many of the offspring that we get from the crosses are infertile, and so on," Snowdon said. "If we can find out why that is the case and how to manipulate the genes involved, then we think we should be able to perform a much more targeted integration from related [species], or from the diploid species."
Looking forward, both researchers said that long-read sequencing technology would be helpful for the study of plants with outsized genomes, especially, Snowdon said, in resequencing studies where it is difficult to tell whether a translocation has really occurred.
"If we would have the possibility to accurately sequence to much, much longer reads like a PacBio approach, but without all the errors, then this could really help us to do a much better reassembly during resequencing studies," Snowdon added.
Cold Spring Harbor Laboratory's Michael Schatz has had some success in correcting PacBio reads and combining them with reads generated by another platform to produce de novo genome assemblies. At the 2012 Plant and Animal Genome Conference, he presented his efforts developing that pipeline, which he applied to the Escherichia coli K12 strain. Schatz and his colleagues also showed that their approach could work to assemble the parrot genome as well as the corn transcriptome, as they reported in Nature Biotechnology.
In addition, Moleculo, which was recently purchased by Illumina, offers a technology to help researchers get to long reads. The technology, which was developed last year by Stephen Quake, breaks genomic DNA into pieces that are tagged and then sequenced using short-read technology before being stitched back together as assembled long reads. Illumina is currently offering Moleculo sequencing as a service.
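The tag-then-stitch principle can be shown with a toy Python sketch. It is not Moleculo's actual pipeline, whose details are not given here. It assumes error-free short reads that each carry the tag of the long fragment they came from, and the tags, sequences, and function names are hypothetical. Reads are grouped by tag, then each group is merged greedily by its longest suffix-to-prefix overlaps.

```python
from collections import defaultdict


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def stitch(reads, min_len=3):
    """Greedy merge: repeatedly join the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


def assemble_by_tag(tagged_reads, min_len=3):
    """Group (tag, sequence) pairs by tag and stitch each group separately."""
    groups = defaultdict(list)
    for tag, seq in tagged_reads:
        groups[tag].append(seq)
    return {tag: stitch(seqs, min_len) for tag, seqs in groups.items()}


# Hypothetical short reads from two tagged fragments, arriving interleaved.
tagged_reads = [
    ("T1", "ACGTTGCA"), ("T2", "GGATCC"), ("T1", "TGCAAGGC"),
    ("T2", "TCCTTAG"), ("T1", "AGGCTA"),
]
long_reads = assemble_by_tag(tagged_reads)
print(long_reads)
```

Restricting each merge to reads that share a tag is what keeps the problem small in this sketch: repeats elsewhere in the genome cannot confuse the stitching of a fragment they were never part of.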
At this year's PAG, Mickey Kertesz, the CEO of Moleculo, who was a postdoc in Quake's lab, said that the Moleculo technology could be helpful in assembling complex genomes, like those of plants. Further, Monsanto's Todd Michael said at the conference that he is using the technology to assemble the corn rootworm genome, though that work is still in progress.