NEW YORK (GenomeWeb News) – A team including researchers from Purdue University, the US Department of Energy's Joint Genome Institute, and the US Department of Agriculture's Agricultural Research Service reported online in Nature today that they have sequenced and started characterizing a draft version of the soybean genome.
Using whole-genome shotgun sequencing, the researchers sequenced roughly 85 percent of the soybean (Glycine max) genome. The genome is already providing insights into soybean biology and is expected to have applications for improving soybeans and related plants.
"This new information about soybean's genetic makeup could lead to plants that produce more beans that contain more protein and oil, better adapt to adverse environmental conditions, or are more resistant to diseases," USDA Deputy Under Secretary for Research, Education and Economics Molly Jahn, who was not directly involved in the project, said in a statement.
The soybean is the first legume to be sequenced, and its genome is the largest plant genome sequenced by whole-genome shotgun sequencing so far.
Soybean is grown for its seeds and oil and is used to produce everything from tofu and soy flour to oil-based ink and biodiesel. Like other legumes, soybean works in concert with microorganisms living in its root nodules to fix atmospheric nitrogen.
In an effort to better understand these and other soybean processes, the team used the Sanger method to do whole-genome shotgun sequencing of the 1.1-billion-base genome of a soybean variety called Williams 82.
Although they knew going in that the soybean genome contained lots of duplications, the researchers were able to use a shotgun sequencing approach by doing careful assembly and analysis along the way, senior author Scott Jackson, a plant genomics and cytogenomics researcher at Purdue University, told GenomeWeb Daily News.
In the process, the team generated 950 million bases of assembled, anchored sequence — believed to represent some 85 percent of the plant's total genome sequence.
Their analyses uncovered 46,430 high-confidence protein-coding genes housed on the plant's 20 chromosomes as well as another 20,000 or so lower-confidence loci.
Some 78 percent of the genes appear to be located near the chromosome ends, which contain few repeats and are known for high rates of recombination, while nearly 22 percent of the genes are in the repeat- and transposon-rich heterochromatic regions near centromeres. Jackson said more research is needed to understand whether there are functional differences between genes found in each of these regions.
The team's analysis also suggests the existing soybean genome is a product of a genome duplication roughly 59 million years ago and a more recent duplication about 13 million years ago. Consequently, almost three-quarters of genes in the soybean genome are present in multiple copies.
While around 73 percent of the high-confidence protein-coding genes in soybean have orthologues in other angiosperms, the researchers also identified 448 high-confidence genes in 283 families that seem to be legume specific.
And by combining information from the genome with previous genetic data and markers, the team has already started identifying genes involved in agriculturally, economically, and ecologically important soybean traits.
For example, the team has identified mutations linked to soybean digestibility and to the levels of phytate, a phosphorus storage compound that the plants produce. They have also gained a clearer picture of the pathways involved in oil biosynthesis and cloned a gene involved in resistance to a soybean disease called Asian soybean rust.
"With this high-quality sequence, we now have access to candidate genes that we've never had before, which will enable us to look at their patterns of expression, develop molecular markers to track them in breeding programs, and work with them to determine their function or modify them to improve their function," co-author Randy Shoemaker, a geneticist with USDA-ARS, said in a statement.
Those involved in the sequencing effort are also touting the soybean genome as a promising reference genome for related plants. For example, Jackson noted, researchers are interested in leveraging the soybean genome to better understand closely related — but poorly studied — subsistence crop plants such as the pigeon pea and common bean, which are grown in India and parts of Africa.
For their part, Jackson and his team are embarking on an effort to re-sequence plants that are thought to be ancestral to domestic soybeans.
Finally, the genome is expected to inform soybean-focused biofuel studies. In a statement issued today, US Department of Energy Associate Director of Science for Biological and Environmental Research Anna Palmisano said the soybean sequencing effort "opens the door to crop improvements that are sorely needed for energy production, sustainable human and animal food production, and a healthy environmental balance in agriculture worldwide."