NEW YORK (GenomeWeb) – Research teams from China have resequenced hundreds of cotton accessions, uncovering variants with ties to economically and environmentally important traits such as fiber strength or drought tolerance.
For the first of the studies, published online today in Nature Genetics, a team led by scientists from the Institute of Cotton Research (ICR) of the Chinese Academy of Agricultural Sciences and Novogene Bioinformatics Institute uncovered some 3.66 million SNPs in genome resequencing data from 419 cotton accessions from Gossypium hirsutum — a widely cultivated "upland cotton" species.
After sequencing DNA isolated from cotton seedlings on Illumina HiSeq instruments, to an average depth of 6.55-fold coverage per accession, the team identified almost 3.7 million high-quality SNPs across the upland cotton accessions.
With these variants, the team placed the accessions into three genetic clusters, getting a glimpse at the genetic diversity differences in early cotton varieties developed before 1976 and modern varieties established between 1996 and 2008.
Together with phenotypes for plants from six sites over two years, the variants also made it possible to carry out a genome-wide association study focused on more than a dozen cotton fiber-related traits, ranging from fiber yield, length, and strength to flowering or fiber initiation.
In particular, the researchers homed in on 7,383 fiber trait-related variants falling in or around more than 4,800 cotton genes. That set included variants at loci already implicated in cotton traits such as fiber yield, along with previously undetected associations affecting fiber quality, yield, and other cotton traits.
"The results should credibly provide targets for molecular marker selection and genetic manipulation of cotton improvement to meet the growing demand for renewable fiber," co-corresponding author Xiongming Du, an ICU cotton biology researcher, and his colleagues wrote, emphasizing that "[f]uture work will be necessary to validate more genes underlying the traits."
In another Nature Genetics paper, Du and colleagues from ICR, Anyang Institute of Technology, Peking University, and other centers in China presented genome resequencing data for 243 G. arboreum and G. herbaceum accessions, representing the diploid descendants of ancestral plants leading to the cultivated allotetraploid cotton plant's A sub-genome.
That team began by generating a new reference genome for G. arboreum from Pacific Biosciences RSII long reads and Hi-C mapping data, combining new and available reads into a 1.71-billion-base assembly for the diploid cotton plant, with nearly 1.6 billion bases anchored to 13 G. arboreum pseudochromosomes.
From there, the researchers turned to Illumina HiSeq 2500 instruments to generate sixfold average coverage for another 230 G. arboreum and 13 G. herbaceum lines from the South China, Yangtze River, and Yellow River regions, plants related to the A2 and A1 sub-genomes, respectively.
"These regions represent most of the phenotypic and geographical diversity known for diploid cottons in China," co-corresponding authors Fuguang Li, Yuxian Zhu, and Tao Lin, and their colleagues wrote.
When they compared the resequenced accessions to the new G. arboreum genome, the researchers uncovered nearly 17.9 million high-quality SNPs and another 2.5 million insertions and deletions shorter than 190 base pairs apiece. Focusing on more than 72,400 of the SNPs, they considered the phylogenetic relationships between G. arboreum, G. herbaceum, and G. raimondii, another sequenced diploid cotton species in the same lineage as the cultivated cotton plant's D sub-genome.
Results from those and other analyses suggest that G. arboreum and G. herbaceum are equally diverged from the G. raimondii lineage.
The team also uncovered evidence that G. arboreum originated in South China — from a wild progenitor distinct from that leading to the related G. herbaceum species — before being introduced to the Yangtze and Yellow River regions, the authors noted, leading to three genetically diverged populations reflecting prior artificial selection events in different regions.
"Several phenotypes such as yield and disease-resistance traits changed substantially during the migration of cotton from [South China] to the [Yangtze River] and further to the [Yellow River], thus suggesting positive inputs from local environments as well as human selection," the authors wrote, adding that geographic isolation has affected the genetics of these populations, and "influenced the development and distribution of disease resistance and yield traits of G. arboreum in China."
For their own association analysis, the researchers identified more than two dozen genic and 73 non-coding sites in the genome with apparent ties to up to 11 plant traits that tend to vary from one environment to the next. For example, they tracked down a chromosome 11 SNP in a synthase enzyme gene with apparent ties to oil content in the accessions considered. Another chromosome 11 variant coincided with resistance to fusarium wilt disease, a condition caused by a type of Fusarium oxysporum fungus.
The team also used a quantitative trait locus- and SNP-based approach to search for genetic differences between G. arboreum accessions with and without cotton fuzz covering the seeds, leading to a region upstream of a proposed B-type cyclin gene already implicated in fiber development-related processes.