NEW YORK (GenomeWeb) – An international team led by investigators in China, Taiwan, and Belgium has sequenced the genome of the orchid Phalaenopsis equestris — a favored parental plant in orchid breeding and the first sequenced representative of plants that perform photosynthesis using crassulacean acid metabolism.
As they reported in Nature Genetics today, the researchers sequenced DNA from an inbred P. equestris line and used it to produce an almost 1.1-billion-base genome assembly.
The resulting draft sequence spans roughly 93 percent of the complete P. equestris genome and contains more than 29,000 predicted protein-coding genes, including candidates that may help explain the plant's place in the plant family tree and the basis of its distinguishing features.
"The complete genome sequence of P. equestris will provide an important resource to start exploring orchid diversity and evolution at the genome level, which will be important for ecological and conservation purposes," the study's authors wrote.
"The genome sequence will also be a key resource for the development of new concepts and techniques in genetic engineering, such as molecular marker-assisted breeding and the production of transgenic plants, which are necessary to increase the efficiency of orchid breeding and aid orchid horticulture research," they added.
In addition to their economic significance and desirability as ornamental plants, the researchers explained, members of the Orchidaceae plant family have drawn attention from investigators due to their intricate floral structure, varied reproductive capabilities, diverse ecological specializations, and co-evolution with insect pollinators.
Orchids are also among the plants that rely on crassulacean acid metabolism, or CAM, a form of photosynthesis in which plants take in carbon dioxide through their leaves at night, rather than during the day, to reduce water loss by evaporation. That carbon dioxide is then stored and used for photosynthesis during the day.
In an effort to start untangling the genetic underpinnings for these and other orchid features, the researchers focused on the diploid P. equestris species. After creating libraries from DNA in leaves and flowers from multiple representatives in the same inbred P. equestris line, they sequenced seven of the libraries using the Illumina HiSeq 2000 instrument.
In the process, the team generated more than 119 billion bases of sequence data, which it used to produce a 1.086-billion-base draft assembly for P. equestris.
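For a rough sense of scale, the figures quoted above can be turned into an approximate sequencing depth and an implied full genome size. The sketch below is a back-of-the-envelope calculation, not the authors' method: it treats "more than 119 billion bases" as exactly 119 billion, assumes all of that raw data counts toward coverage, and takes the roughly 93 percent figure at face value. The variable and function names are illustrative.

```python
# Figures quoted in the article; all are approximate.
RAW_BASES = 119e9          # "more than 119 billion bases" (treated as a lower bound)
ASSEMBLY_BASES = 1.086e9   # size of the draft assembly
ASSEMBLY_FRACTION = 0.93   # assembly spans "roughly 93 percent" of the genome


def fold_coverage(raw_bases: float, target_bases: float) -> float:
    """Average number of times each target base is covered by raw sequence."""
    return raw_bases / target_bases


# Implied size of the complete genome, assuming the 93 percent figure is exact.
estimated_genome_bases = ASSEMBLY_BASES / ASSEMBLY_FRACTION

depth_vs_assembly = fold_coverage(RAW_BASES, ASSEMBLY_BASES)
depth_vs_genome = fold_coverage(RAW_BASES, estimated_genome_bases)

print(f"Implied genome size: {estimated_genome_bases / 1e9:.2f} billion bases")
print(f"Depth relative to the assembly: about {depth_vs_assembly:.0f}x")
print(f"Depth relative to the implied genome: about {depth_vs_genome:.0f}x")
```

Under these assumptions the raw data amount to on the order of 100-fold coverage. The usable depth after quality filtering would likely be lower.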
To aid in their annotation of the genome and gene expression profiling of the plant, the researchers also generated RNA sequence data on P. equestris root, stem, leaf, and flower samples. They noted that an existing database known as OrchidBase also contains transcriptome sequence data for the plant.
The team's analysis identified 29,431 predicted protein-coding sequences in the P. equestris genome, though repetitive elements such as interspersed repeats, transposable elements, and tandem repeats made up more than 60 percent of the plant's genome.
Nearly 5,700 P. equestris gene families overlapped with those found in rice, grapevine, and Arabidopsis thaliana. On the other hand, some 4,171 orchid genes appeared distinct from genes documented in a wide range of sequenced plants that included Arabidopsis thaliana, black cottonwood, grapevine, rice, purple false brome grass, sorghum, maize, Physcomitrella patens moss, and sequenced green algae species.
Within tricky-to-assemble regions of the genome that contained highly heterozygous sequences, meanwhile, the team saw variant clusters containing some 1.7 million SNPs. The heterozygous regions were also home to a slew of genes related to plant defense and programmed cell death, which are suspected of contributing to plant processes that deter self-pollination.
The researchers' look at the orchid's evolutionary history and relationships to other plants, based on a phylogeny built from up to 342 single-copy genes, indicated that orchids belong to a lineage that split from other monocot plants roughly 135 million years ago.
That split was followed by a whole-genome duplication in the ancestor of most orchid clades, an event that occurred an estimated 75.57 million years ago, they reported.
For subsequent stages of their analysis, the researchers tracked down and characterized genes involved in orchid flower morphology and development as well as enzyme-coding genes contributing to CAM.
For instance, the available data indicated that CAM-based photosynthesis in orchids evolved through a series of gene duplication and gene loss events in that lineage. Nevertheless, the team noted that the duplicated genes involved in that process do not seem to stem from the whole-genome duplication documented in the orchid ancestor.