By Julia Karow
Using a combination of low-coverage Sanger paired-end and high-throughput 454 single and paired-end sequencing, a team of researchers led by the University of Arizona, the Department of Energy Joint Genome Institute, and 454 Life Sciences has completed a draft of the 760-megabase cassava genome.
The assembly covers about 55 percent of the genome as well as 95 percent of known cassava genes.
Recently, UA researchers won a $1.3 million grant from the Bill & Melinda Gates Foundation to lead an international consortium developing a genome variation database for cassava, which aims to give farmers breeding tools for improving the plant and its disease resistance.
The genome of cassava, or Manihot esculenta, was sequenced in two stages. In a pilot project under JGI's Community Sequencing Program, proposed in 2006, JGI researchers generated just under 1-fold coverage of the genome from more than 700,000 Sanger shotgun reads from plasmid and fosmid libraries, according to Phytozome, a joint resource of JGI and the Center for Integrative Genomics at the University of California, Berkeley.
During the main project phase, which started this spring and was jointly funded by JGI and Roche's 454 for an undisclosed amount, the scientists generated nearly 61 million single-end and paired-end reads on the 454 GS FLX Titanium platform. Most of the data were generated within eight weeks.
The 454 reads were combined with the Sanger data from the pilot project to assemble and annotate the cassava genome, which is available through Phytozome and has been deposited in GenBank. The assembly was produced by Steve Rounsley, an associate professor in the School of Plant Sciences at UA, and 454 using the company's Newbler assembly software. They started by assembling the single-end reads alone and then added the Sanger and 454 paired-end reads.
According to Phytozome, the genome is one of the first publicly available plant genomes sequenced primarily with 454 technology. Earlier this year, 454 said that, in collaboration with two Malaysian companies, it had sequenced the oil palm genome using only 454 technology, through a mixture of shotgun and BAC-pool sequencing (see In Sequence 5/19/2009).
The draft assembly consists of more than 11,200 scaffolds spanning 416 megabases of the estimated 760-megabase genome. Half of the assembled sequence is contained in 514 scaffolds that are each at least 180 kilobases long.
Approximately 95 percent of all known cassava genes are contained in the assembly, and more than 100 megabases of repetitive sequence was assembled as well. The missing parts consist of repetitive sequence that could not be assembled, according to the researchers.
"We will be pursuing further improvements in the assembly with improved assembly algorithms as they become available," they wrote on Phytozome.
Since "much of the utility of the genome sequence will come from the development of breeding tools," they wrote, "a perfect reference genome sequence is not needed" and "our sequencing strategies have been selected accordingly."
Under the Gates Foundation grant, researchers led by Rounsley plan to "cheaply and quickly" sequence other cassava varieties, an effort they hope will allow them to identify genes that increase the plant's resistance to the virus that causes Cassava Brown Streak Disease.