NEW YORK (GenomeWeb) – Using single-molecule sequencing, researchers led by the Donald Danforth Plant Science Center's Todd Mockler have generated a near-complete genome of Oropetium thomaeum, a desiccation-tolerant grass.
While a number of plants have been sequenced, including grasses like maize, wheat, and barley, many of the assemblies are fragmented due to their reliance on short reads produced by next-generation sequencers. To generate this O. thomaeum genome, Mockler and his colleagues instead turned to long reads from Pacific Biosciences' platform, as they reported today in Nature. With these reads, they assembled nearly all of the O. thomaeum genome and were able to delve into its intergenic and duplicated sequences, some of which may be vital to the plant's ability to withstand drought.
"The draft genome is near complete because we were able to sequence through complex repeat regions that are unassembled in most draft genomes," Mockler and his colleagues wrote in their paper.
Using 32 SMRT cells on the PacBio RS II, the researchers generated about 72X coverage of the O. thomaeum genome. These reads, they reported, had an N50 length of more than 16 kilobases, and reads longer than 20 kilobases alone provided 10X coverage.
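The coverage and N50 figures above follow from simple arithmetic on read lengths. The Python sketch that follows is illustrative only, not the authors' pipeline. Fold coverage is total sequenced bases divided by genome size, and N50 is the read length at which reads of that length or longer hold at least half of all sequenced bases. The function names and toy read lengths are hypothetical.

```python
from typing import Sequence


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Fold coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths: Sequence[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


def coverage_from_reads_over(lengths: Sequence[int], min_length: int, genome_size: int) -> float:
    """Fold coverage contributed only by reads longer than min_length."""
    return sum(x for x in lengths if x > min_length) / genome_size


genome_size = 245_000_000  # estimated O. thomaeum genome size, from the article
# About 72X of a 245 Mb genome implies roughly 17.6 Gb of raw sequence
print(round(72 * genome_size / 1e9, 2))  # 17.64

toy_reads = [25_000, 22_000, 18_000, 12_000, 8_000, 5_000, 3_000]  # hypothetical lengths
print(n50(toy_reads))  # 22000
```

The same `coverage_from_reads_over` helper expresses the article's 10X figure: it counts only reads above a length cutoff (here, 20 kilobases) against the genome size.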
After error correction, the researchers used the Celera Assembler to assemble the longest reads, those of more than 16 kilobases, and then polished the resulting assembly using Quiver.
This assembly, they reported, contains 650 contigs that span some 99 percent of the estimated 245-megabase O. thomaeum genome. They also noted that the 35 largest contigs span half the genome, while the largest 107 contigs span about 90 percent of it.
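The statements that 35 contigs span half the genome and 107 span about 90 percent describe how many of the largest contigs are needed to reach a given share of the estimated genome size, a metric some assembly-evaluation tools report as LG50 and LG90. A minimal Python sketch, assuming a hypothetical `lg` helper and toy contig lengths rather than the real assembly:

```python
from typing import Sequence


def lg(contig_lengths: Sequence[int], genome_size: int, percent: int) -> int:
    """Number of largest contigs needed to reach `percent` of the estimated genome size.

    Integer arithmetic avoids floating-point edge cases exactly at the threshold.
    """
    running = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running * 100 >= percent * genome_size:
            return count
    raise ValueError("contigs never reach the requested share of the genome")


toy_contigs = [40, 30, 20, 5, 3, 2]  # hypothetical lengths for a genome of size 100
print(lg(toy_contigs, 100, 50), lg(toy_contigs, 100, 90))  # 2 3
```

Measuring against the estimated genome size rather than the assembly's own total keeps the metric comparable across assemblies of differing completeness; by the article's figures, this assembly would score roughly 35 and 107.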
Mockler and his colleagues also assembled the 135,324-basepair chloroplast genome into a single contig, and the mitochondrial genome into 20 partially overlapping circular chromosomes.
The assembly's completeness enabled the researchers to examine the genome's repetitive regions.
The assembly includes all 18 telomeric arrays, with lengths ranging from 40 to 900 repeats, suggesting that some may be full length. Three of the nine centromeric satellites, meanwhile, were assembled into large inverted repeats that span some 400 kilobases.
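One simple way to gauge whether a telomeric array is full length is to count tandem copies of the telomeric motif at each contig end. The Python sketch that follows assumes the common plant-type motif TTTAGGG, which the article does not state, and counts only exact, in-phase copies; real arrays contain variant repeats, so this is a simplification. All names are hypothetical.

```python
from typing import Tuple

TELOMERE_MOTIF = "TTTAGGG"  # assumed plant-type telomeric repeat; not stated in the article


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def leading_copies(seq: str, unit: str) -> int:
    """Count exact, back-to-back copies of `unit` starting at position 0 of `seq`."""
    count = 0
    while seq.startswith(unit, count * len(unit)):
        count += 1
    return count


def telomere_copies(contig: str, motif: str = TELOMERE_MOTIF) -> Tuple[int, int]:
    """Exact motif copies at the (left, right) ends of a contig.

    On the top strand the left telomere reads as the reverse-complement motif
    (CCCTAAA...), so the right end is scanned by reverse-complementing the contig.
    """
    contig = contig.upper()
    unit = reverse_complement(motif)
    return leading_copies(contig, unit), leading_copies(reverse_complement(contig), unit)


toy_contig = "CCCTAAA" * 3 + "ACGTACGT" + "TTTAGGG" * 5  # hypothetical sequence
print(telomere_copies(toy_contig))  # (3, 5)
```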
Repetitive elements, the researchers noted, make up 43 percent of the O. thomaeum genome, a sizeable portion compared with 21 percent of the Brachypodium genome. The wheat genome, by contrast, is 90 percent repeats.
Many of these repetitive elements are long terminal repeat (LTR) retrotransposons: the researchers identified 3,247 LTR retrotransposons in 358 families, similar to what's found in rice.
Though O. thomaeum has the smallest genome among the grasses, it still harbors 28,446 predicted protein-coding genes, a typical number for a grass. It does not appear, the researchers noted, that the O. thomaeum genome underwent any whole-genome duplication events after the pan-cereal duplication event known as rho.
Because coding sequences are similar in size across grass genomes, the researchers suggested that differences in genome size among grasses are due to variation in their intergenic content. The intergenic content of the O. thomaeum genome, for instance, is reduced compared with that of other grasses, they noted.
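Intergenic content can be estimated by merging gene-model coordinates and subtracting the covered bases from the sequence length. A minimal Python sketch, assuming hypothetical half-open (start, end) gene coordinates on a single sequence; it is not the study's method, which the article does not describe:

```python
from typing import List, Optional, Tuple


def intergenic_fraction(genes: List[Tuple[int, int]], sequence_length: int) -> float:
    """Fraction of a sequence not covered by any gene model.

    `genes` are half-open (start, end) coordinates on one sequence; overlapping
    models are merged so shared bases are counted only once.
    """
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(genes):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged interval
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or touching: extend the interval
    if cur_end is not None:
        covered += cur_end - cur_start
    return 1 - covered / sequence_length


print(intergenic_fraction([(0, 10), (5, 20), (50, 60)], 100))
```

Merging overlapping models first matters because nested or alternatively annotated genes would otherwise be double-counted as genic sequence.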
By comparing orthologous regions of O. thomaeum with those of sorghum, which has a 730-megabase genome, the researchers found that the sorghum sequences were some 38 percent larger, likely due to 1-kilobase-long intragenic sequences evenly spaced throughout the sorghum genome that are possibly the remnants of partially lost transposons. O. thomaeum, on the other hand, has a solo-to-intact LTR ratio of greater than one, indicating active purging of transposons and loss of these regions.
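Solo LTRs typically form when recombination between the two long terminal repeats of an element deletes its internal region, leaving a single LTR behind, so a solo-to-intact ratio above one means such deletions outnumber surviving full-length copies. The Python sketch that follows computes that ratio per family, plus the percent size difference between two sets of orthologous regions. The `(family, kind)` record format and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def solo_to_intact(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Per-family solo:intact ratio from (family, kind) records.

    `kind` is "solo" or "intact" (a hypothetical annotation format). Families
    with no intact copy are skipped to avoid division by zero.
    """
    counts = Counter(records)
    ratios = {}
    for family in sorted({family for family, _ in counts}):
        intact = counts[(family, "intact")]
        if intact:
            ratios[family] = counts[(family, "solo")] / intact
    return ratios


def percent_larger(sizes_a: Sequence[int], sizes_b: Sequence[int]) -> float:
    """Percent by which summed orthologous region sizes in genome A exceed genome B."""
    return (sum(sizes_a) - sum(sizes_b)) / sum(sizes_b) * 100


toy = [("famA", "solo"), ("famA", "solo"), ("famA", "intact"), ("famB", "intact")]
print(solo_to_intact(toy))  # {'famA': 2.0, 'famB': 0.0}
```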
This supports the Genome Balance Hypothesis, which holds that selection on gene networks and pericentromeric growth is balanced by transposon proliferation and retention. In sorghum, the evenly spaced sequences balance out a six-to-one expansion of pericentromeric sequence relative to O. thomaeum.
Tandemly duplicated genes, Mockler and his colleagues noted, have often been linked to stress response and adaptive evolution, and O. thomaeum contains 6,668 tandemly duplicated genes in 2,326 clusters, slightly more than other grasses. These genes are enriched for gene ontology (GO) terms associated with abiotic stress response, gene regulation, and cellular metabolism.
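Tandem duplicates are usually identified as homologous genes lying close together on the same chromosome. The union-find sketch that follows links homologs within a hypothetical window of neighboring genes and merges linked genes transitively into clusters; the homology test and window size are assumptions standing in for whatever sequence-similarity and distance criteria the researchers used.

```python
from typing import Callable, Dict, List, Sequence


def tandem_clusters(genes: Sequence[str], is_homolog: Callable[[str, str], bool],
                    window: int = 1) -> List[List[str]]:
    """Group genes, listed in chromosome order, into tandem-duplicate clusters.

    Genes i and j (i < j <= i + window) are linked when is_homolog says so;
    linked genes are merged transitively with union-find. Clusters of two or
    more genes are returned in chromosome order.
    """
    parent = list(range(len(genes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(genes)):
        for j in range(i + 1, min(i + 1 + window, len(genes))):
            if is_homolog(genes[i], genes[j]):
                parent[find(j)] = find(i)

    groups: Dict[int, List[str]] = {}
    for i, gene in enumerate(genes):
        groups.setdefault(find(i), []).append(gene)
    return [members for members in groups.values() if len(members) > 1]


def same_family(a: str, b: str) -> bool:
    """Hypothetical homology test: genes share a family letter."""
    return a[0] == b[0]


order = ["A1", "A2", "B1", "A3", "C1", "C2"]
print(tandem_clusters(order, same_family, window=1))  # [['A1', 'A2'], ['C1', 'C2']]
print(tandem_clusters(order, same_family, window=2))  # [['A1', 'A2', 'A3'], ['C1', 'C2']]
```

Summing cluster sizes gives the number of tandemly duplicated genes and counting clusters gives the number of clusters, the two quantities the article reports for O. thomaeum.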
O. thomaeum also retained more than 4,200 homologous gene pairs from the rho genome duplication event, and these genes are likewise enriched for roles in gene regulation and stress response, including responses to abiotic stimuli, salt stress, and oxygen-containing compounds.
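Enrichment claims such as these are commonly tested with a one-sided hypergeometric (Fisher's exact) test that asks whether a gene set carries a gene ontology term more often than the genome-wide background would predict. The article does not say which test the researchers used; the Python sketch that follows, with hypothetical names and counts, shows the standard calculation.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: background genes with any annotation
    K: background genes annotated with the term
    n: genes in the test set (e.g. retained duplicate pairs)
    k: test-set genes annotated with the term
    """
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Hypothetical counts, not the study's: 30 of 200 test genes carry a term that
# 500 of 20,000 background genes carry (2.5 percent expected, 15 percent observed).
print(enrichment_p_value(30, 200, 500, 20_000))
```

Because many gene ontology terms are tested at once, p-values like this are normally corrected for multiple testing, for example with the Benjamini-Hochberg procedure, before a term is called enriched.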
"Understanding the genomic mechanisms of extreme desiccation tolerance in resurrection plants such as Oropetium may provide targets for engineering drought and stress tolerance in crop plants," the researchers concluded.