At 3.5 gigabases, the sunflower genome is larger than those of maize, mouse, and even human. While its size is certainly an issue, researchers working to sequence the sunflower genome face even larger challenges.
Speaking at the seventh annual US Department of Energy Joint Genome Institute User Meeting held in Walnut Creek, Calif., in March, the University of British Columbia's Loren Rieseberg discussed his group's efforts toward developing a reference genome for sunflower.
A sunflower reference genome could provide insight into plant speciation and might help researchers breed cultivars with improved cellulosic biomass traits for potential biofuel production. However, technical challenges have stumped scientists working to sequence it.
For one, the sunflower genome is big. "As far as I'm aware, there hasn't been a larger genome published to date," Rieseberg said. Beyond that, the sunflower genome is chock-full of repetitive sequences. "It's about 81 percent repetitive, and a full 58 percent of the genome [is] this one LTR retrotransposon family, and [that] has caused us a large number of problems," he said. "But it gets worse: if you actually look at the age distribution of these LTR retrotransposon insertions, the vast majority occurred very recently, within the last 1 million years. And this makes assembly a real, real challenge."
That the repetitive elements are so young has proved to be the biggest obstacle yet. "It turns out that … the age structure of retrotransposon[s] is very predictive of how easy, of how well, a genome will assemble," Rieseberg said, based on his group's experience with sunflower.
To overcome these issues, Rieseberg and his colleagues adopted a multi-pronged approach, which they described in a July 2011 Botany paper. First, the researchers produced a high-density genetic map of the sunflower genome. They then generated a sequence-based physical map, tagged every 5 to 6 kilobases. "And then the idea is we assemble [the] whole genome on shotgun-sequence data," Rieseberg said. "We align the sequence contigs across the genetic and physical map … and we then do local scaffolding, BAC-by-BAC across the genome with mate pairs." Once they have produced a fixed assembly, it will be time to annotate it.
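The alignment of contigs to a genetic map can be pictured with a minimal sketch. This is not the group's pipeline. It assumes, purely for illustration, that each contig carries one or more mapped markers and that a contig belongs to the linkage group most of its markers vote for. The function name, the data layout, and the use of the median map position are all hypothetical choices.

```python
from collections import defaultdict
from statistics import median


def anchor_contigs(marker_positions, contig_markers):
    """Order contigs along linkage groups using the markers they contain.

    marker_positions: marker id -> (linkage_group, position_cM)   (hypothetical layout)
    contig_markers:   contig id -> list of marker ids found on that contig
    Returns (ordered, unplaced): ordered maps each linkage group to its
    contigs sorted by map position; unplaced lists contigs with no mapped marker.
    """
    placed = defaultdict(list)
    unplaced = []
    for contig, markers in contig_markers.items():
        hits = [marker_positions[m] for m in markers if m in marker_positions]
        if not hits:
            # No mapped marker on this contig, so the map cannot place it.
            unplaced.append(contig)
            continue
        # Collect marker positions per linkage group and pick the group
        # supported by the most markers (illustrative majority vote).
        votes = defaultdict(list)
        for group, position_cm in hits:
            votes[group].append(position_cm)
        best_group = max(votes, key=lambda g: len(votes[g]))
        # Use the median position so one stray marker does not move the contig.
        placed[best_group].append((median(votes[best_group]), contig))
    ordered = {g: [c for _, c in sorted(items)] for g, items in placed.items()}
    return ordered, unplaced
```

Contigs left unplaced by a step like this are the ones that would need the physical map or mate-pair scaffolding to find their position.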
At the User Meeting, Rieseberg said his team has thus far produced the genetic and physical maps and has sequenced 96 recombinant inbred lines to 1x depth. "We're still not satisfied with the assembly from the whole-genome shotgun sequence, so we've been developing pipelines for the local scaffolding. But we're still waiting for a really good assembly before we go forward," he said. "Our hope is that in December of this year, we'll have a good genome."
While it has been a laborious process, Rieseberg said that a genome sequence for sunflower is worthwhile. "The assembly is quite challenging, but we do think our strategy using high-density genetic and physical maps for scaffold[ing] of the sequence contigs will work," he added.