Cotton Genomes Provide Insight on Genome Evolution, Clues for Crop Improvement

NEW YORK (GenomeWeb) – Two groups have de novo sequenced the whole genome of the cotton species Gossypium hirsutum using a combination of next-generation sequencing, BAC clones, and genetic maps. The sequenced genomes yield knowledge about how the crop has evolved and the function of genes, particularly those related to fiber biology.

The two studies, both published today in Nature Biotechnology, demonstrate the challenges of sequencing polyploid genomes and the complexity of the cotton genome, and pave the way toward better crops.

Allotetraploid upland cotton represents more than 90 percent of cultivated cotton. While it is a main source of textile fibers and is grown for oilseed, little is known about its genome.

The G. hirsutum genome arose between 1 million and 2 million years ago from the diploid genomes of two other species, one that originated in Africa and one that originated in Mexico, that no longer exist. So even though two other progenitor species have been sequenced, the researchers decided to de novo sequence G. hirsutum to fully understand its genome and gain insights into cotton fiber biology.

Both studies used similar sequencing and assembly approaches, yielding comparable results.

In one study, researchers from the Nanjing Agricultural University in China, the University of Texas, and the Novogene Bioinformatics Institute in Beijing sequenced the Texas Marker-1 (TM-1) genome, a widely used genetic standard and allotetraploid genome, by integrating whole-genome shotgun reads, BAC-end sequences, and genotyping-by-sequencing genetic maps.

In the second study, researchers from the Chinese Academy of Agricultural Sciences, BGI, Peking University, and the US Department of Agriculture also sequenced the G. hirsutum TM-1 genome using 181x paired-end sequences along with 5x BAC-to-BAC sequences and a high-resolution genetic map. This group also performed transcriptome sequencing to help with gene annotation.

Both groups also compared their assemblies to ancestral diploid genomes.

"Polyploidy generates genetic and gene expression novelty but with this comes redundancy that impedes sequence annotation and assembly," the authors from Nanjing Agricultural University wrote. They added that their approach "could be applied to the sequencing of complex genomes of other polyploidy crops."

Those researchers generated 612 Gb of Illumina reads, or about 245x coverage of the genome, and assembled them using the SOAPdenovo algorithm. Next, they integrated those reads with 116.6 Mb of BAC-end sequences and assembled the genome. To correct misassemblies and further improve the genome, they developed a genetic map using genotyping by sequencing of 59 progeny strains from TM-1 and another Gossypium species. The final assembly consisted of 265,279 contigs with an N50 of 34 kb and 40,407 scaffolds with an N50 of 1.6 Mb. The scaffolds spanned 96 percent of the genome. About 6,000 of the scaffolds were aligned and organized into 26 pseudochromosomes; 4,635 of those scaffolds could be traced to African ancestry, called the A subgenome, and 1,511 had lineage from Mexico, or the D subgenome.
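As a rough check on the read and coverage figures above, fold coverage is conventionally the total number of sequenced bases divided by the genome size, so the two numbers together imply a genome size. The short Python sketch below does that arithmetic. The function name is hypothetical, and the resulting size is only an inference from the quoted figures, not a number reported by the researchers.

```python
def implied_genome_size_gb(total_bases_gb: float, coverage: float) -> float:
    """Genome size (in Gb) implied by total sequenced bases and fold coverage.

    Assumes coverage = total bases / genome size, the usual definition.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_bases_gb / coverage


# Figures quoted in the article: 612 Gb of Illumina reads, about 245x coverage.
size_gb = implied_genome_size_gb(612, 245)
print(f"Implied genome size: about {size_gb:.1f} Gb")
```

Under that assumption, the quoted figures correspond to a genome of roughly 2.5 Gb.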

The group predicted over 70,000 protein-coding genes, including 32,032 in the A subgenome and 34,402 in the D subgenome, that were functionally annotated.

In analyzing the genome, the researchers found that structural rearrangements, gene loss, disrupted genes, and sequence divergence were more prevalent in the A subgenome than the D subgenome, suggesting asymmetric evolution.

Looking at genes and transcription factors involved in cotton fiber development, the researchers found that all 10 transcription factors related to the MYB domain, which is important for fiber development, were highly expressed during fiber initiation.

Interestingly, the researchers found that the two subgenomes each had independent contributions to the important traits of fiber quality and stress tolerance. For instance, the A subgenome contributed 41 positively selected genes related to fiber quality, while the D subgenome contributed 68 positively selected genes related to stress tolerance.

The group was also able to identify 32 genes related to cellulose synthase, which is required for making cellulose, including a group of genes that were expressed at levels 1.5 to 40 times higher than in wild cottons, "suggesting their potential role in increased lint yield and improved qualities in domesticated cotton," the authors wrote.

In the study led by the Chinese Academy of Agricultural Sciences, the researchers performed Illumina shotgun sequencing with insert sizes ranging from 250 bp to 40 kb. They then generated around 5x coverage of the genome with BACs to improve the assembly, which resulted in a contig N50 of 80 kb and a scaffold N50 of 764 kb. They also performed transcriptome sequencing, identifying 108,790 transcripts, 98.9 percent of which were detected in the de novo assembly. Using a genetic map, they anchored and oriented 1,923 Mb to 26 pseudochromosomes.
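Both groups summarize assembly contiguity with N50 values. As commonly defined, N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The Python sketch below illustrates that definition. The function name and the toy contig lengths are made up for illustration, and neither group's actual statistics pipeline is described in the article.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("need at least one sequence length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running total
        # to at least half of the assembly length.
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached for non-empty input


# Toy contig lengths in kb (illustrative only, not data from either study).
toy_contigs_kb = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs_kb))
```

In the toy example the total length is 290 kb, and the two longest contigs (80 kb and 70 kb) already cover more than half of it, so the N50 is 70 kb. A higher N50 generally indicates a less fragmented assembly.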

Similar to the Nanjing Agricultural University team, the group identified nearly 77,000 genes, 35,056 from the A subgenome and 37,086 from the D subgenome. In addition, they reported that approximately 66 percent of the genome was composed of transposable elements.

They also found that the current A genome contains many genes that were not present in the ancestral homeolog and were transferred from the D genome.

Looking at their transcriptome data, the Chinese Academy of Agricultural Sciences team identified a significant number of genes from the A subgenome associated with fiber development.

Specifically, they reported on a group of genes known as ACOs, which modulate cotton fiber cell growth, as well as genes involved in primary and secondary wall synthesis that "might provide targets for engineering of improved fiber yield," the authors wrote.

The data from both groups will be deposited online in GenBank.