NEW YORK (GenomeWeb News) – A team of American and Australian researchers has developed a method for assessing an organism's transcriptome — even in the absence of available genome sequence.
The researchers used Roche 454 sequencing to sequence the transcriptome of the larval developmental stage of the staghorn coral Acropora millepora. Their approach relied on a novel cDNA preparation technique coupled with sequencing, followed by assembly, annotation, and analyses using publicly available tools and databases. The research, which appeared online today in BMC Genomics, identified about 11,000 protein-coding genes and more than 30,000 genetic variants.
"I think this will facilitate an explosion in the science of coral adaptation and evolution," senior author Mikhail Matz, an integrative biology researcher at the University of Texas at Austin, said in a statement. "We developed a big boot to kick down a door leading to coral genomics."
By understanding coral genomes and transcriptomes, researchers hope to find clues about how coral respond to stress and environmental changes. But transcriptomics is more difficult — and less common — when a reference genome sequence is not available for the organism of interest.
To overcome such drawbacks, Matz and his team developed a method for assembling and annotating transcriptome sequences to find expressed genes in the absence of complete genome sequence.
They focused on A. millepora, a hard coral found in the western Pacific Ocean and the Southern Ocean. Several thousand genome reads are available for A. millepora and the related corals A. palmata and Porites lobata from pilot shotgun sequencing projects. But genome-wide sequence is not available.
To prevent potential contamination with DNA from algae and other coral symbionts, the researchers looked at the A. millepora larval stage, a developmental stage without known symbionts. The larvae were subjected to temperature stress and other environmental conditions to capture a broad range of expressed genes.
The researchers then produced complementary DNA from coral larvae using library preparation procedures aimed at maximizing data quality from subsequent sequencing steps. They then used the Roche 454 GS FLX platform to generate 628,649 sequence reads from the sample.
After removing adapter sequences, selecting reads by size, and incorporating publicly available EST data, the researchers assembled 44,444 contigs. Along with the contigs, they also analyzed singleton reads that didn't assemble into contigs, finding evidence that at least some of these "represent unique genes expressed at levels low enough to hinder adequate sampling."
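The read-cleaning step described above can be illustrated with a minimal Python sketch. It is not the team's pipeline: the function names, the adapter string, and the length cutoff are all hypothetical, chosen only to show what trimming an adapter and then filtering by size might look like.

```python
def trim_adapter(read, adapter):
    """Strip a leading adapter from a read, if present (hypothetical helper)."""
    if read.startswith(adapter):
        return read[len(adapter):]
    return read


def clean_reads(reads, adapter, min_length):
    """Trim the adapter from every read, then keep reads of at least min_length bases.

    Both the adapter sequence and min_length are illustrative assumptions;
    the article does not state the values the researchers used.
    """
    trimmed = (trim_adapter(read, adapter) for read in reads)
    return [read for read in trimmed if len(read) >= min_length]


if __name__ == "__main__":
    example_reads = ["ACGTTTGGCCAA", "ACGTTT", "GGGGCCCCAAAA"]
    # The second read is only two bases long after trimming, so it is dropped.
    print(clean_reads(example_reads, adapter="ACGT", min_length=5))
```

Reads that become too short after trimming are discarded rather than passed on to assembly, which is the general idea behind size selection at this stage.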
By comparing assembled contig and singleton sequences to public protein sequence databases, the team created scaffolds representing sequence fragments for each transcript. A handful of these were subsequently validated by PCR-based analyses. To minimize redundancy in the sequence information, the team assembled sequences in parallel based on nucleotide and protein sequence similarity, generating merged clusters based on where these assemblies overlapped.
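One way to picture the merging of overlapping assemblies is as a union of clusters that share members. The Python sketch below is an assumption-laden illustration, not the authors' method: the function name, the sequence identifiers, and the use of a union-find structure are all introduced here for clarity.

```python
def merge_clusters(nucleotide_clusters, protein_clusters):
    """Merge two sets of clusters wherever they share a sequence ID.

    Each argument is an iterable of collections of sequence IDs. Clusters
    that overlap, directly or through a chain of shared members, end up in
    the same merged group. Uses a simple union-find (disjoint-set) structure.
    """
    parent = {}

    def find(item):
        # Register unseen items as their own root, then walk up to the root,
        # shortening the path along the way.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for cluster in list(nucleotide_clusters) + list(protein_clusters):
        members = list(cluster)
        if not members:
            continue
        find(members[0])  # make sure single-member clusters are registered
        for member in members[1:]:
            union(members[0], member)

    groups = {}
    for seq_id in list(parent):
        groups.setdefault(find(seq_id), set()).add(seq_id)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    by_nucleotide = [{"c1", "c2"}, {"c3"}]          # hypothetical contig IDs
    by_protein = [{"c2", "c4"}, {"c5", "c6"}]
    print(merge_clusters(by_nucleotide, by_protein))
```

In this toy input, c2 appears in both a nucleotide-based and a protein-based cluster, so c1, c2, and c4 collapse into a single merged cluster while the others stay separate.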
In total, the researchers found roughly 11,000 to 11,500 known genes that were expressed by the coral larvae. They noted that this is likely an underestimate, since many of the sequences had no matches in public databases and could not be named.
Even so, the team found overlap between sequences in the A. millepora transcriptome and draft genome sequences from the coral's closest sequenced relative, the anemone Nematostella vectensis, suggesting the work "represents a reasonably complete description of the coral larval transcriptome." For example, the team identified more than 8,500 orthologs between the coral and anemone, along with another 748 coral sequences with orthologs in other organisms but not in the anemone.
The transcriptome data also contained sequences corresponding to nearly 4,500 domains from the NCBI's conserved domains database, particularly associated with transcription factors, growth factors, and signaling pathways. In addition, 17,902 assembled coral larvae transcript sequences could be assigned gene ontology terms. Of these, 320 sequences apparently corresponded to stress response.
Based on their analyses with the QualitySNP program, the team found 33,433 high-quality SNPs and more than 6,800 apparent insertions or deletions in the coral transcriptome.
The authors noted that such information will likely prove useful for developing genetic markers and for designing microarray probes that will let researchers look more specifically at gene expression changes linked to heat tolerance, immunity, and other biological processes in coral. So far, the researchers have picked out 11,000 sequences for such probe design.
"Our findings provide a nearly complete description of the genes expressed in coral larvae, a resource that is expected to be immediately useful for measurements of gene expression in reef-building corals, in addition to a large number of genetic markers for studies of genetic connectivity and structure," the authors concluded. "Application of these resources will greatly enhance our understanding of the potential for corals to adapt to increasing environmental stress during climate change."