Current estimates of the number of microRNA genes and targets in the human genome may be off by an order of magnitude or more, according to researchers at IBM and the Genome Institute of Singapore.
In a paper in this week’s Cell, the researchers suggest that miRNAs could number in the tens of thousands and that some of these miRNAs may have as many as a few thousand targets.
These findings, the result of a computational miRNA target prediction method called rna22 developed at IBM Research, represent an enormous jump beyond previously published estimates of miRNA activity and are certain to stir debate in the quickly evolving field of RNA interference.
Most researchers in the field believe there are fewer than 1,000 miRNAs in the human genome, with only tens to hundreds of targets per miRNA. Chris Burge, a computational biologist at the Massachusetts Institute of Technology who co-authored a paper last year in Cell that used comparative genomics to show that up to a third of the human genome might be regulated by miRNAs, told BioInform that while he had not yet read the IBM paper, he is “skeptical” about the sheer numbers of predicted targets per miRNA.
“It’s hard to imagine and it would take very strong evidence to really convince people that this is true,” Burge said.
Isadore Rigoutsos, the developer of rna22 and an author on the paper, said that he expected this response and described the paper as a “conversation piece” for the field.
Rigoutsos noted that there was a flurry of research in 2003 and 2004 trying to assess the scope of miRNA activity, but that work has slowed down. “The generic questions of how many microRNAs are encoded by a given genome and how many targets does a given microRNA have were not being revisited because people were trying to answer the next set of questions,” he said.
Rigoutsos said he spent four years developing and validating rna22 and took pains in writing the paper “to make it as succinct as I could and to avoid making any statements that were not supported by experiments.”
The goal, he said, is to spur enough interest in the community to experimentally validate the predicted miRNAs and targets in the paper.
In the paper, the IBM and GIS researchers use rna22 to process four genomes: C. elegans, D. melanogaster, M. musculus, and H. sapiens. The average number of 3’ UTR sites targeted by known microRNAs ranged from 73 in C. elegans to 1,008 in human. The number of microRNA precursors, meanwhile, ranged from 359 in C. elegans to 25,000 in human at a folding-energy cutoff of -25 kcal/mol. At a less stringent threshold of -18 kcal/mol, the human estimate rises to 55,000 microRNA precursors.
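Why the two human precursor counts differ comes down to how stable a predicted hairpin must be to count. A minimal sketch, assuming each candidate precursor has already been assigned a minimum folding free energy by some folding tool and that "passing" means an energy at or below the cutoff (the hairpin names and energies below are hypothetical, not data from the paper): because a more stable fold has a more negative energy, a cutoff of -25 kcal/mol admits fewer candidates than -18 kcal/mol.

```python
# Hypothetical candidate precursor hairpins with made-up minimum folding
# free energies in kcal/mol (NOT data from the paper). More negative
# values mean a more stable fold.
CANDIDATE_ENERGIES = {
    "hairpin_a": -31.2,
    "hairpin_b": -24.0,
    "hairpin_c": -19.5,
    "hairpin_d": -12.7,
}

def passing_cutoff(energies: dict[str, float], cutoff: float) -> list[str]:
    """Return, sorted, the hairpins whose folding energy is at or below `cutoff`."""
    return sorted(name for name, dg in energies.items() if dg <= cutoff)

strict = passing_cutoff(CANDIDATE_ENERGIES, -25.0)   # stringent cutoff
relaxed = passing_cutoff(CANDIDATE_ENERGIES, -18.0)  # less stringent cutoff
print(strict)   # ['hairpin_a']
print(relaxed)  # ['hairpin_a', 'hairpin_b', 'hairpin_c']
```

Every hairpin that passes the stringent cutoff also passes the relaxed one, so loosening the threshold can only raise the count.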
In another surprising discovery, the team’s results suggest that microRNA binding sites may not be limited to 3’ UTRs, as previously thought, but may also occur in 5’ UTRs and in coding sequences.
“If what we’re saying here is validated by more groups in a different context, a lot of things have the potential to change; our understanding of cell process regulation and the magnitude of the underlying network is likely to change,” Rigoutsos said.
“Some people may embrace this and see it as a good thing, others may not embrace it. I’ll be very happy if we start the conversation — that’s the only thing I’m hoping for,” he added. “At the end of the day, these things take time to validate, especially if the kinds of numbers claimed in the paper end up being true.”
Taking a Backwards Approach
The method that Rigoutsos developed to predict miRNA binding sites, rna22, diverges from previous computational methods in several ways. Most of these methods start with a known miRNA and match a short region near its 5’ end, the so-called seed, against candidate 3’ UTRs across a given genome. Many of these methods also use conservation across species as a filter.
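For comparison with rna22, the conventional approach can be sketched in a few lines of Python. This is a generic illustration, not any particular published tool, and the function names are hypothetical: it assumes the "seed" is miRNA nucleotides 2 to 8, looks for perfect complements of that seed in a 3’ UTR, and applies a crude conservation filter that simply requires a match in every supplied species’ UTR (real tools typically compare aligned orthologous positions instead). The let-7a sequence is the real human mature miRNA; the UTR fragments are made up.

```python
# Hypothetical helpers illustrating generic seed matching; not any specific tool.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A<->U, C<->G)."""
    return rna.translate(COMPLEMENT)[::-1]

def seed_sites(mirna: str, utr: str) -> list[int]:
    """0-based positions in `utr` that pair perfectly with miRNA nucleotides 2-8."""
    site = reverse_complement(mirna[1:8])  # seed = nucleotides 2-8 (assumption)
    hits, pos = [], utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits

def seed_match_conserved(mirna: str, utrs_by_species: dict[str, str]) -> bool:
    """Crude conservation filter: True if every species' UTR has a seed match."""
    return all(seed_sites(mirna, utr) for utr in utrs_by_species.values())

LET7A = "UGAGGUAGUAGGUUGUAUAGUU"         # human let-7a mature miRNA
HUMAN_UTR = "AAACUACCUCAAAGGGCUACCUCUU"  # made-up UTR fragment
print(seed_sites(LET7A, HUMAN_UTR))       # [3, 16]
print(seed_match_conserved(LET7A, {"human": HUMAN_UTR, "mouse": "GGCUACCUCAA"}))  # True
```

A filter like `seed_match_conserved` discards any site missing from a related species, which is exactly the kind of species-specific site the article says rna22 is designed to keep.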
The rna22 method, however, does not rely on cross-species conservation at all, which enables it to detect potential binding sites that may not be present in closely related species. In addition, instead of a single miRNA, it begins with an entire set of known miRNAs — in the example in the Cell paper, the researchers used 644 miRNA sequences from the January 2004 release of RFAM — and applies Rigoutsos’ Teiresias pattern-matching algorithm to extract motifs with a minimum of 4 nucleotides that appear at least twice.
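Teiresias itself discovers patterns that can include wildcard positions, and it is not reproduced here. As a much simpler stand-in, the hypothetical function below collects contiguous k-mers of at least four nucleotides that occur in at least two input sequences. Reading "appear at least twice" as "in at least two distinct miRNAs" is an assumption, as is the upper length limit; the toy sequences are made up.

```python
from collections import defaultdict

def shared_kmers(seqs: list[str], min_len: int = 4, max_len: int = 6,
                 min_support: int = 2) -> dict[str, int]:
    """Contiguous k-mers (min_len..max_len nt) found in >= min_support sequences.

    A simplified stand-in for pattern discovery; unlike Teiresias it
    handles no wildcard positions. Returns motif -> number of sequences.
    """
    support = defaultdict(set)  # motif -> indices of sequences containing it
    for idx, seq in enumerate(seqs):
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                support[seq[i:i + k]].add(idx)
    return {m: len(ids) for m, ids in support.items() if len(ids) >= min_support}

TOY_MIRNAS = ["UGAGGUAGUA", "AGAGGUAGCC", "CCCCAAAA"]  # made-up sequences
motifs = shared_kmers(TOY_MIRNAS)
print(len(motifs))  # 9
```

Only the `GAGGUAG` stretch shared by the first two toy sequences yields motifs; the third sequence shares nothing and contributes none.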
Armed with this set of patterns, rna22 then scans the genome for reverse complements of the motifs to find putative miRNA binding sites.
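The scanning step can be illustrated in the same simplified spirit: each motif is reverse-complemented, and every exact occurrence in a target sequence is recorded. This is a toy under stated assumptions (exact matches only, no scoring of the full miRNA:target duplex, hypothetical function name `scan_for_sites`). The article does not describe what rna22 does with candidate sites after locating them.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A<->U, C<->G)."""
    return rna.translate(COMPLEMENT)[::-1]

def scan_for_sites(motifs: list[str], target: str) -> dict[str, list[int]]:
    """Map each motif to the 0-based positions of its reverse complement in `target`.

    Hypothetical helper: exact matching only; motifs with no hit are omitted.
    """
    hits = {}
    for motif in motifs:
        rc = reverse_complement(motif)
        positions = [i for i in range(len(target) - len(rc) + 1)
                     if target.startswith(rc, i)]
        if positions:
            hits[motif] = positions
    return hits

print(scan_for_sites(["GGUAG", "CCCC"], "AAACUACCAAGGGGA"))  # {'GGUAG': [3], 'CCCC': [10]}
```

Because the motifs are drawn from the whole pool of known miRNAs rather than from one miRNA, a hit does not say which miRNA binds there. That is what lets this style of search flag candidate sites for miRNAs that have not yet been discovered.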
“Basically, we’re doing things backward,” Rigoutsos said. “What this method is doing is saying ‘I’ll ignore for a moment the microRNA that you’re interested in and instead I’ll take all known microRNAs, find the patterns that capture salient sequence features that these guys have, take the reverse complement of the pattern, and process your genome of interest.’”
The primary benefit of this approach, he said, is that by using patterns instead of the sequences of particular miRNAs, rna22 can identify binding sites for miRNAs that have yet to be discovered.
Rigoutsos and his team used an older version of RFAM as their training set to demonstrate this concept. The method was able to identify the cel-lsy-6 binding site in the 3’ UTR of the cog-1 gene from C. elegans even though cel-lsy-6 is not in the January 2004 version of RFAM “and shares no sequence similarities with any of the microRNAs contained in that release,” according to the authors.
“That was an example of finding a validated binding site without seeing a microRNA sequence that looks like the microRNA that binds to it,” Rigoutsos said. “It’s one data point, admittedly, but certainly an encouraging one.”
Oliver Hobert, a researcher at Columbia University who discovered cel-lsy-6, described the IBM study as “a very intriguing piece of work,” and noted that “it’s truly astonishing to see so many new microRNAs coming up.”
He pointed out, however, that while the experimental validation that IBM and the Genome Institute of Singapore performed with luciferase assays was “the best that one can currently do with most microRNAs,” it was “still preliminary.”
“The problem with validation is that all the target predictions that are out there, including [IBM’s] new prediction, they all rely on validation strategies that are largely in vitro, and it is unclear whether they really reflect a true endogenous microRNA-target interaction,” Hobert said.
In a paper recently published in Nature Structural and Molecular Biology, Hobert and co-author Dominic Didiano use C. elegans to test several predicted microRNA binding sites. “What we have been able to do in our paper is to take these old predictions and use a much superior in vivo system to validate those previous predictions and found them to be unreliable,” Hobert said.
He stressed, however, that he has not yet systematically examined any of the miRNA binding sites predicted by rna22. IBM’s predictions “may suffer the same fate in the future, but they also may not. They may be validatable in a good in vivo system, but we just don’t know yet.”
Nevertheless, he added, “It’s certainly extremely refreshing to see that there are other ways to predict targets, and to have now more things to try to validate.”
Bing Lim, senior group leader of the Stem Cell and Developmental Biology group at the Genome Institute of Singapore and a co-author on the Cell paper, told BioInform via e-mail that “most non-computational biologists” would view the findings of the paper “very skeptically, including myself.”
He noted, however, that “It has always been known that the post-transcriptional gene regulation would be complex” and the fact that “most genes/mRNAs are predicted to be under this control is therefore not surprising, and it also implies that gene expression in different tissues can be under very different control, allowing specific tissues to respond in a particular manner to stimuli.”
Stressing that “we are still quite early in the small non-coding RNA field,” Lim said, “We are still finding our way through the complexities, and as we do so, will have to develop new technologies to study the complexity and subtlety and critical nature of post-transcriptional processes.”