CIRAD, Penn State-Led Team Reports on Chocolate Genome Sequencing Effort

By a GenomeWeb staff reporter

NEW YORK (GenomeWeb News) – Members of the International Cocoa Genome Sequencing Consortium have sequenced and started analyzing the genome of the cacao plant, Theobroma cacao — work that they describe in an online publication in Nature Genetics this week.

The researchers, from France's agricultural research center CIRAD, Pennsylvania State University, and a dozen-and-a-half other institutions, sequenced the genome of a Criollo variety of T. cacao from Belize that was known to have a highly homozygous genome. In the process, they found genetic clues about everything from the plant's evolutionary history to the genes and gene families behind commercially important traits.

"The mapping of these gene families along the cocoa chromosomes and comparison with the genome regions involved in trait variation (QTLs) constitutes an invaluable source of candidate genes for further functional studies that aim to discover the specific genes directly involved in trait variation," CIRAD researcher Claire Lanaud, who is senior author on the paper, and co-authors wrote.

"This draft genome sequence will facilitate a better understanding of trait variation and will accelerate the genetic improvement of T. cacao through efficient marker-assisted selection and exploitation of genetic resources," they added.

The Penn State arm of the research was funded, in part, through a gift from the Hershey Corporation.

The Criollo variety and other cacao plants producing high-quality chocolate are highly valued but not widely cultivated, the researchers explained, since they tend to be particularly susceptible to disease and may have lower yields.

"Fine cocoa production is estimated to be less than five percent of the world cocoa production because of low productivity and disease susceptibility," co-lead author Mark Guiltinan, a plant science researcher at Penn State, said in a statement.

Earlier this year, a team led by researchers at Mars, Inc., the US Department of Agriculture, and IBM announced that they had finished sequencing the genome of a widely grown cacao cultivar called Matina 1-6.

For the current study, researchers used a combination of Roche 454 GS FLX and Illumina GAIIx platforms to shotgun sequence the 430 million base pair genome of the Criollo plant B97-61/B2 — sequence that they filled in with BAC-end sequences generated by the Sanger approach.

Their analysis of the genome uncovered 28,798 predicted protein-coding genes. Of these, the researchers were able to assign 23,529 (almost 82 percent) to one of Criollo's 10 chromosomes.

Through comparisons with several other sequenced plants, the team identified 6,362 gene clusters that are shared with Arabidopsis thaliana, grape, soybean, and poplar, along with 682 T. cacao-specific gene families.

Their analyses of these coding genes turned up 84 lipid biosynthesis genes that may influence cocoa butter production, as well as genes related to chocolate flavor, aroma, color, disease resistance, and more. For instance, they noted, the genome seems to contain expansions affecting genes related to the production of flavonoid compounds involved in cocoa flavor.

"Our analysis of the Criollo genome has uncovered the genetic basis of pathways leading to the most important quality traits of chocolate — oil, flavonoid, and terpene biosynthesis," Penn State horticulturalist Siela Maximova, who was a co-author on the new paper, said in a statement. "It has also led to the discovery of hundreds of genes potentially involved in pathogen resistance, all of which can be used to accelerate the development of elite varieties of cacao in the future."

Meanwhile, their efforts to track down and characterize RNA coding sequences in the genome yielded 83 potential microRNA coding sequences — many of which seem to target transcription factors, the researchers explained, hinting that "miRNAs are major regulators of gene expression in T. cacao."

But, they noted, transposable element sequences make up only about 24 percent of the newly sequenced cacao genome, far less than in other comparably sized plant genomes.

"Smaller amounts of transposons than found in other plant species could lead to slower evolution of the chocolate plant," Guiltinan said in a statement, "which was shown to have a relatively simple evolutionary history in terms of genome structure."

Data from the new T. cacao sequencing project is available through EMBL, GenBank, and DNA Data Bank of Japan (DDBJ) databases.
