NEW YORK (GenomeWeb) – In a study appearing online today in Science, an international team led by investigators in the US and France described efforts to sequence and begin analyzing the coffee genome.
The researchers produced a high-quality draft genome sequence spanning almost 570 million bases using a combination of Sanger, Roche 454, and Illumina sequencing and DNA from Coffea canephora — a diploid plant believed to have hybridized with C. eugenioides to form the widely cultivated tetraploid coffee plant C. arabica.
The team's analysis of the genome, combined with comparisons to sequences from related plants, uncovered several gene family expansions in coffee, including a boost in genes for N-methyltransferase enzymes involved in everything from plant defense to caffeine production.
A closer look at the latter process indicated that coffee plants produce caffeine using a different subset of N-methyltransferase enzymes than those used for caffeine production in cacao or tea plants, the study's authors explained, pointing to multiple, independent adaptations to caffeine production in flowering plants.
"It turns out that the enzymes that do the job of caffeine-making in coffee branch off from one sub-family of N-methyltransferase, whereas those that do it in tea and chocolate branch off from a totally different sub-family," co-senior author Victor Albert, a biological sciences researcher at the University at Buffalo, told GenomeWeb Daily News.
"So although they're both using the same big group of enzymes, they evolved caffeine biosynthesis completely independently from different sub-groups of N-methyltransferase genes," he added.
Starting with DNA from a doubled haploid C. canephora plant, the researchers built long sequence scaffolds from Roche 454 reads and bacterial artificial chromosome end sequences generated by Sanger sequencing. To these they added short Illumina reads covering the genome to an average depth of 60-fold. Sequencing for the study was done at the French National Sequencing Center (Genoscope).
Together, these reads were assembled into a 568.6 million base sequence spanning around 80 percent of the plant's 710 million base genome. While the scaffolds have not yet been assembled at the chromosome level, Albert explained, the team used an available genetic map to order them quite well.
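The "around 80 percent" figure follows directly from the two sizes quoted above. As a minimal sketch, using only those two numbers (the function and variable names are illustrative and do not come from the study), the arithmetic looks like this:

```python
# Figures quoted in the article, in millions of bases.
ASSEMBLY_MB = 568.6   # size of the draft assembly
GENOME_MB = 710.0     # estimated size of the C. canephora genome


def assembly_fraction(assembled: float, total: float) -> float:
    """Return the fraction of the estimated genome covered by the assembly."""
    if total <= 0:
        raise ValueError("total genome size must be positive")
    return assembled / total


fraction = assembly_fraction(ASSEMBLY_MB, GENOME_MB)
print(f"Assembly covers {fraction:.1%} of the estimated genome")
```

The result depends on the genome size estimate, so a different estimate would shift the percentage.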
Within this assembly, the researchers tracked down 25,574 predicted protein-coding genes, along with almost 100 microRNA coding sequences and transposable elements that made up roughly half of the genome sequence.
For their first pass at analyzing the genome, the team looked at both the coffee genome's structure and its content, Albert explained.
For example, when the group retraced genome duplication and triplication events in the flowering plant tree using sequences from C. canephora and related plants sequenced previously, it verified a previously described genome triplication event in the ancestor of all of these eudicot plants.
In the lineages leading to plants such as soybean, Arabidopsis, or tomato, the researchers saw signs of whole-genome duplications, multiple whole-genome duplications, or even genome triplications — events that have been implicated in species diversification in flowering plants, Albert noted.
In contrast, their analysis indicated that the lineages leading to coffee, grape, cacao, strawberry, peach, and other eudicot plants have not experienced wholesale genome duplications or genome triplications since branching off from the ancestor of eudicots.
Because the coffee plant family is known for both species richness and fruit diversity, this finding points to the possibility of plant diversification in the absence of additional whole-genome polyploidization events, Albert said. "It makes it clear that not all massive diversifications are directly linked to whole-genome duplications or triplications."
"At least in part," he added, "[the diversification] is driven by new genes and gene functions that evolved from small-scale duplications — tandem duplications that are happening all the time … in an ongoing process."
In particular, the team detected signs of sequential tandem duplications that have expanded specific gene families in coffee plants. Among them: a family of N-methyltransferase enzymes responsible for caffeine production, plant defense, and the production of alkaloids and flavonoids.
When the researchers did similarity-based clustering of sequences from coffee, tomato, Arabidopsis, and grape plants and looked for those that were over-represented in or exclusive to the coffee plant, they found a pronounced expansion in a sub-family of N-methyltransferase enzymes used for synthesizing caffeine in coffee but not in other caffeine-producing plants such as cacao and tea.
Moreover, Albert said, "we know that all of the enzymes of the caffeine pathway started out as tandem duplicates and that those tandem duplicates are completely independent of any tandem duplication events that may have occurred in chocolate that led to its caffeine synthesis."
Albert is part of an ongoing project aimed at sequencing the C. arabica genome, along with another effort focused on re-sequencing more than a dozen related diploid species within the coffee plant genus.
"We need to look at many, many more [genomes] of other varieties and other species," he said, to "take advantage of the known diversity in coffee."
The French government provided much of the funding for the current study through grants to two of the study's senior authors: Philippe Lashermes at the French Agricultural Research Centre for International Development (CIRAD), and Patrick Wincker, a researcher affiliated with Genoscope, the National Center for Scientific Research (CNRS), and the University of Évry. Other team members brought their own, additional funding to the effort.
In an accompanying perspectives article in Science, Hebrew University of Jerusalem's Dani Zamir, who was not involved in the coffee sequencing study, noted that the next step will be to "translate these decoded genomes into new and improved tools for plant breeding."