NEW YORK (GenomeWeb) — Researchers from the Center for Genomic Regulation in Spain and their collaborators elsewhere have sequenced the genome of the Mesoamerican common bean.
Legumes, which include beans, are the second most-grown crop in the world. Their seeds contain high amounts of protein, and the common bean, Phaseolus vulgaris, is a key source of nutrition for more than 500 million people.
As they reported in Genome Biology yesterday, CRG's Roderic Guigó and his colleagues sequenced the P. vulgaris Mesoamerican breeding line, BAT93, and analyzed its transcriptome. By comparing it to an Andean common bean line, the researchers examined the bean's evolution as well as how gene duplications and long non-coding RNAs shaped its development.
"The sequence of the bean genome, both from the Andean variety, previously sequenced, and the Mesoamerican one, will definitively contribute to identify genes involved in disease resistance, drought and salt tolerance, nitrogen fixation, formation of reproductive cells and seed quality, among others," Guigó added in a statement.
Using a hybrid approach that combined 454, SOLiD, Sanger BAC, and Illumina sequencing, Guigó and his colleagues generated a 549.6-megabase P. vulgaris BAT93 genome. Based on this and their RNA sequencing of various plant organs and developmental stages, they estimated that the plant's genome contains 30,491 protein-coding genes.
A gene family expansion specific to BAT93 corresponded to putative cellular receptors with extracellular domains, they noted. Two other expansions were functionally enriched in seed development and ubiquitin-related pathways.
By adding in RNA-seq data from 27 organs or developmental stages, the researchers found that 937 protein-coding genes have organ-specific expression. In addition, hierarchical clustering of protein-coding genes reflected tissue type and a separation between root and aerial samples.
Clustering based on lncRNA expression similarly recapitulated tissue type, but also separated pods and seeds from the other tissues. This, the authors added, suggests that lncRNAs may be important for fruit development.
By comparing gene expression across plant development stages, Guigó and his colleagues found that more transcriptional changes, affecting both protein-coding genes and lncRNAs, occurred during the vegetative stage than during the reproductive stage. For instance, they noted that more than 1,000 genes and 20 lncRNAs are differentially expressed as primary leaves are established, though that number drops to fewer than 120 differentially expressed genes at later leaf stages.
In addition, they found that nitrogen fixation- and metabolism-related functions are enriched during early reproductive stages, while functions related to cell fate determination, regulation of defense response, and telomere maintenance are enriched in later reproductive stages.
The researchers constructed a co-expression network based on a set of 21,560 protein-coding genes and lncRNAs and found that this network was enriched for ancient genes. After dividing the network into interconnected modules, the researchers found that the largest module (containing 1,271 genes, more than 39,000 edges, and an average connectivity of 50) was enriched for more than 170 gene ontology terms, many of which were related to photosynthesis.
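For readers unfamiliar with co-expression networks, the general idea can be illustrated with a minimal sketch. It is not the authors' pipeline, which the report does not describe in detail. It assumes that two genes are linked when the absolute Pearson correlation of their expression profiles meets a cutoff, and that "connectivity" means the number of edges per gene; the study's own definitions may differ. The gene names, expression values, and the 0.9 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Link every gene pair whose |correlation| meets the (assumed) cutoff."""
    edges = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def average_connectivity(genes, edges):
    """Mean number of edges per gene (each edge touches two genes)."""
    return 2 * len(edges) / len(genes) if genes else 0.0


# Hypothetical toy data: four made-up transcripts measured in five samples.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 6.0, 8.2, 10.0],
    "geneC": [5.0, 4.0, 3.1, 2.0, 1.0],
    "lncRNA1": [3.0, 1.0, 4.0, 1.0, 5.0],
}
toy_edges = coexpression_edges(toy_profiles)
toy_connectivity = average_connectivity(toy_profiles, toy_edges)
```

In this toy case the three genes with tightly correlated (or anti-correlated) profiles end up fully linked to one another, while the made-up lncRNA stays unconnected. Real analyses typically use dedicated tools and more careful statistics than a single fixed cutoff.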
Guigó and his colleagues further constructed phylomes for BAT93, for the Andean line G19833, or for both, using sets of protein-coding genes. Those phylomes, along with an analysis of one-to-one orthologs, helped the researchers develop a species phylogeny from which they uncovered four evolutionary periods: basal to Phaseolus, basal to legumes, basal to rosids, and basal to the split of rosids and asterids.
By overlaying duplication densities, the researchers found a pattern consistent with waves of whole-genome duplication both at the split of rosids and asterids and basal to the legumes. However, they noted that there was no evidence of any recent genome duplication event in either P. vulgaris lineage.
Both lineages, the researchers wrote, harbor gene clusters whose expansion in soybean is linked to nematode resistance. Such genomic adaptations could have helped P. vulgaris spread through the Americas and tend toward domestication.
In the next phase of the project, the researchers plan to sequence a dozen other bean varieties and their relatives to uncover genes related to domestication.
"This is an example of how bioinformatics and genome sequencing will thus contribute to produce higher quality and more productive varieties of a crop that has become essential for human consumption," added Alfredo Herrera-Estrella from the National Laboratory of Genomics for Biodiversity in Mexico.