NEW YORK (GenomeWeb) – A team of Chinese researchers has analyzed the genomes of nearly 700 plant species from in and around a botanical garden in southwestern China.
Researchers led by BGI-Shenzhen's Xin Liu collected 761 plant samples and generated 54 terabytes of sequencing data, for an average sequencing depth of 60X per species. This enabled them not only to construct a reference phylogeny, but also to develop a freely available reference set of plant sequencing data and corresponding samples, as they reported in the journal GigaScience.
This, they added, could help fuel initiatives such as the Earth BioGenome Project's 10,000 Plant Genomes plan, which aims to sequence the major plant lineages.
"[This study] was a baseline project to fine tune and standardize the sampling, methodologies, and the data accumulation and analyses techniques for large-scale genome projects," Liu added in a statement.
He and his colleagues collected samples from nearly all the plant species growing at the Ruili Botanical Garden in Yunnan Province, China, near the Myanmar border. In addition to collecting young leaves from which to extract DNA for analysis, the researchers also took images and collected voucher specimens to be stored at the China National GeneBank herbarium.
Using the BGISEQ-500 sequencer, they generated 70 Gb of raw sequencing data per sample, or, after filtering, about 60 Gb of data per sample.
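As a rough illustration of how data volume relates to depth, sequencing depth is conventionally estimated as the total number of sequenced bases divided by the genome size. The minimal sketch below uses the roughly 60 Gb of filtered data per sample reported here; the 1 Gb genome size is a hypothetical value chosen for illustration, not a figure from the study, and the function name is likewise an assumption.

```python
def sequencing_depth(total_bases, genome_size):
    """Estimate average depth as sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# About 60 Gb of filtered data per sample, as reported in the article.
filtered_bases = 60e9
# Hypothetical genome size of 1 Gb (an assumption, not from the study).
assumed_genome_size = 1e9

print(sequencing_depth(filtered_bases, assumed_genome_size))  # 60.0
```

Under this assumption, a genome twice as large would receive about half the depth from the same amount of data.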
Based on morphology, the researchers identified the species of 257 samples, and the other 504 samples were resolved to the family level using specimen and chloroplast sequences. In all, these samples represented 137 families, with the most common ones being Fabaceae, Poaceae, and Asteraceae.
For each species, the researchers assembled the chloroplast genome; these genomes ranged in size from 113,621 base pairs to 183,602 base pairs. Seventy-two protein-coding genes were found in nearly all these plant families, with the exception of the Gnetaceae, Malvaceae, Elaeocarpaceae, and Tectariaceae.
They used these sequences to build a phylogenetic tree, which showed the major lineages to be the Fabales, Rosales, Poales, and Malpighiales. It also revealed that, within the Fabids, the Celastrales were a sister group to the Malpighiales and suggested that the Gentianales are a sister group to the Lamiales, which in turn are a sister group to the Solanales and Boraginales.
For many of these plant species, the researchers constructed preliminary genome assemblies. They chose 17 species from 17 families with low heterozygosity and repeat content rates for assembly. On average, the assemblies they generated were about 89 percent complete, with an average contig N50 of 4.62 kb and an average scaffold N50 of 32.3 kb.
Since their preliminary assemblies are of good quality, the researchers said they could be used as guides for follow-up efforts to generate complete reference genomes for those species.
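For readers unfamiliar with the contig and scaffold N50 figures quoted above, N50 is the sequence length at which sequences of that length or longer account for at least half of the total assembly. The following is a minimal sketch of that standard definition, not the researchers' own pipeline; the function name and the contig lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb (total 30, so half is 15).
contigs_kb = [2, 3, 4, 5, 6, 10]
print(n50(contigs_kb))  # 6
```

In this toy example the two longest contigs (10 and 6) together cover 16 of 30 kb, so the N50 is 6 kb.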
In addition, they noted that all the data they generated (the images, the raw sequencing data, the assembled chloroplast genomes, and the preliminary nuclear genome assemblies) are accessible and traceable, and that the voucher specimens are stored at the CNGB herbarium. This, they said, will aid in the data's re-use, such as to improve future assemblies of these plants' genomes or to develop new species identification methods.
This effort, the researchers said, also tested the feasibility of large-scale whole-genome sequencing of plants. They added that they have optimized, and published, their DNA extraction protocol, will be launching a DNA extraction kit for high molecular weight genomic DNA, and are developing guidelines for submitting samples to the 10KP.