NEW YORK (GenomeWeb News) – Yale University researchers have used computational approaches to find new bacterial non-coding RNAs in DNA sequence data from environmental samples.
In a paper appearing online today in Nature, the team described their bioinformatics approach for finding the previously unidentified non-coding RNAs, focusing on six large, structured RNAs found while analyzing DNA sequences — including sequence sets from environmental sampling efforts. Their results so far suggest some of the RNAs are very structurally complex and may act enzymatically, while others are found at high levels in bacteria from certain locations.
"Our work reveals new classes of large RNAs exist, which would be akin to protein scientists finding new classes of enzymes," senior author Ronald Breaker, a molecular, cell, and developmental biology researcher affiliated with Yale University and the Howard Hughes Medical Institute, said in a statement. "Since we have only scratched the surface when it comes to examining microbial DNA that is covering the planet, there will certainly be many more large RNAs out there to discover and these newfound RNAs are also likely to have amazing functions as well."
Metagenomic studies sampling bacteria and archaea from various locales are not only highlighting how many of these microbial species remain uncultured, but are also uncovering new classes of proteins and RNAs, the researchers noted.
With that in mind, the team applied a computational pipeline based on phylogenetic comparative sequence analysis to look for non-coding RNAs from data housed in RefSeq as well as several environmental sequence sets.
The approach, which relied on clustering and aligning intergenic regions with similar sequences in order to predict RNA secondary structure, turned up 75 previously unidentified RNAs, including RNAs that appear to belong to a new group of riboswitches and other RNAs with possible regulatory roles.
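As a rough illustration of the comparative idea behind such a pipeline, and not the Yale team's actual software, the sketch below scans a small hand-made alignment for pairs of columns that always form a Watson-Crick base pair while the bases themselves change from sequence to sequence. That kind of covariation is commonly taken as evidence for a conserved stem. The function name `covarying_columns`, the toy alignment, and the choice to ignore G-U wobble pairs are all assumptions made for this example.

```python
from itertools import combinations

# Watson-Crick pairs only; G-U wobble pairs are left out to keep the toy example simple.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def covarying_columns(alignment, min_pair_types=2):
    """Return (i, j, pair_types) for column pairs that base-pair in every
    sequence and show at least `min_pair_types` distinct base pairs."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i, j in combinations(range(length), 2):
        # Collect the set of base combinations seen at columns i and j.
        observed = {(seq[i], seq[j]) for seq in alignment}
        if observed <= WATSON_CRICK and len(observed) >= min_pair_types:
            hits.append((i, j, sorted(observed)))
    return hits


# Hypothetical alignment of four hairpin-like sequences (0-based columns):
# a three-base stem, a GAAA loop, and the complementary stem strand.
toy_alignment = [
    "GCAGAAAUGC",
    "GGAGAAAUCC",
    "GCGGAAACGC",
    "GGGGAAACCC",
]

for i, j, pair_types in covarying_columns(toy_alignment):
    print(i, j, pair_types)
```

In this toy case, columns 1 and 8 and columns 2 and 7 are reported because their bases change together while staying complementary. Columns 0 and 9 are always G and C, which is consistent with pairing but offers no covariation, so they are not reported.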
For the current paper, the team focused on six of these RNAs: two very large, structured RNAs dubbed GOLLD and HEARO and four non-coding RNAs called IMES-1 to IMES-4 that were particularly abundant in microbes from marine environments.
In their subsequent analyses, the researchers began characterizing these RNAs — looking at features such as structure, function, distribution, and homologues in other species. They gained additional insight into how some of the newly identified non-coding RNAs may be processed.
For instance, the team reported that GOLLD RNA, identified in sequences from environmental samples taken from a lake in Panama, is the third largest bacterial RNA found so far, exceeded in size only by 23S and 16S rRNA.
Their analyses suggest GOLLD RNA is structurally complex and similar to some known ribozymes (RNAs that act as enzymes). In addition to the environmental source of GOLLD RNA, the team also found the RNA in eight cultivated bacteria.
Based on their findings so far, coupled with the host of uncharacterized and/or rare bacteria believed to exist in the environment, the team speculated that many other non-coding RNAs will be found in the future.
"Given that most bacterial species are extremely uncommon, more RNAs with extraordinary characteristics likely remain undiscovered in rarer bacteria," the team concluded. "Thus, improvements in sequencing technologies, cultivation methods, bioinformatics, and experimental approaches are poised to yield a far greater spectrum of biochemical functions for large [non-coding RNAs] from bacterial, archaeal, and phage genomes."