SAN DIEGO (GenomeWeb News) – Members of the International Coffee Genome Sequencing Consortium expect to finish sequencing and assembling the coffee genome sometime this year, attendees heard at the Plant and Animal Genome conference here this week.
The team is using a strategy comparable to that used by one of the teams that sequenced the cacao genome, ICGSC member and Genoscope researcher Patrick Wincker said during a presentation in the coffee genomics workshop. Wincker and his colleagues from CIRAD, Pennsylvania State University, and elsewhere reported on their cacao sequencing effort in a paper in Nature Genetics in September.
Now, the ICGSC is using a combination of Roche 454, Illumina, and Sanger sequencing to tackle the roughly 710 million base diploid genome of Coffea canephora, a species that accounts for nearly a third of all coffee production.
Specifically, the team plans to get at least 20-fold coverage of the coffee genome using Roche 454 Titanium single and paired-end reads. They then plan to complement, correct, and fill in the genome sequence data using Sanger BAC end sequences and Illumina Genome Analyzer IIx reads covering the genome an additional 50-fold.
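For a sense of scale, the coverage targets described above can be turned into rough raw sequence totals. The short Python sketch below is illustrative only: it assumes the approximate 710 million base genome size cited in this article and treats coverage as total sequenced bases divided by genome size. The function and variable names are hypothetical and are not part of the consortium's pipeline.

```python
# Rough estimate of raw sequence needed to reach a given fold coverage.
# Assumption: coverage = total sequenced bases / genome size.
GENOME_SIZE_BP = 710_000_000  # approximate C. canephora genome size cited in the article


def bases_needed(genome_size_bp: int, fold_coverage: int) -> int:
    """Return the total number of bases required for the requested fold coverage."""
    return genome_size_bp * fold_coverage


roche_454_bp = bases_needed(GENOME_SIZE_BP, 20)  # planned minimum 454 coverage
illumina_bp = bases_needed(GENOME_SIZE_BP, 50)   # planned additional Illumina coverage

print(f"Roche 454 at 20-fold: {roche_454_bp / 1e9:.1f} billion bases")
print(f"Illumina at 50-fold:  {illumina_bp / 1e9:.1f} billion bases")
```

Under these assumptions, the 454 target works out to about 14.2 billion bases and the Illumina target to about 35.5 billion bases. Actual totals would depend on the final genome size estimate and on how much raw data passes quality filtering.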
This sequence data will then be fed into an in-house annotation pipeline that creates gene models from cDNA, EST, known peptide, repeat, and prediction data, Wincker explained.
The researchers also intend to use massive RNA sequencing to help resolve genes, including those that are not highly expressed or highly conserved, he noted, calling the RNA sequencing approach "the most useful tool we have integrated in our pipeline today."
The team eventually plans to anchor the assembly onto C. canephora's 11 chromosomes using a high-density genetic map.
Ultimately, the researchers expect to generate enough sequence to cover more than 90 percent of the coffee genome, with at least 70 percent of the sequence anchored to coffee chromosomes.
As of December, the researchers had generated sequence representing more than three-quarters of the estimated genome size at around 15 times coverage.
"We are already not far from what we expect to obtain," Wincker said, noting that the results to date are encouraging. "We are confident that running now more sequence … we can obtain the assemblies that we want to have."