NAME: Isidore Rigoutsos
POSITION: Manager, bioinformatics & pattern discovery group, IBM Thomas J Watson Research Center’s Deep Computing Institute
— University research collaborator, molecular biology, Princeton University, 2005-2006
— Visiting lecturer, chemical engineering, Massachusetts Institute of Technology, 2000-present
— PhD, computer science, New York University, 1992
— MSc, computer science, University of Rochester, 1987
— BSc, physics, National University of Athens, Greece, 1984
This week, researchers from IBM’s Thomas J Watson Research Center and the Genome Institute of Singapore reported in Nature the existence of naturally occurring microRNA targets, some of which are species-specific, in amino acid coding sequences.
Focusing on three genes closely associated with stem cell differentiation, the authors wrote that their findings “support an augmented model whereby animal miRNAs exercise their control on mRNAs through targets that can reside beyond the 3’ untranslated region.”
This week, RNAi News spoke with IBM’s Isidore Rigoutsos, the senior author on the paper, about the research.
Let’s start with some background on where you were coming from when you began the work detailed in Nature.
At a meeting at [the University of Pennsylvania] around 2002, [colleagues] said, “We have this interesting phenomenon we’re trying to study,” and … described RNA interference. [One of the questions that they also raised was that of determining which candidate sequences are targeted by a given microRNA]. By that point I had accumulated many years of experience in data mining … so I thought that perhaps I could approach the problem from the standpoint of indirect data mining.
One thing led to another, and I developed a prototype solution … that starts not with known microRNA/mRNA target pairs, which are basically what every other method uses for training, but with known microRNA sequences and tries to find salient sequence features that are conserved across all pairs of those sequences, all triplets of those sequences, all quadruplets of those sequences, and so on.
The idea was that if there are sequence features that are conserved across different instances of microRNA sequences, perhaps they are there for a reason. [Using a data-mining method I developed in 1998], we were getting a lot of patterns that we could see were conserved across the known microRNAs that were available at the time.
The conjecture that I posed at that point was that if these patterns are conserved in microRNAs, then the reverse complement of those patterns should be conserved at those sites that are targeted by those microRNAs. That was the linking observation, and we said, “All right, let’s put it in practice.”
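As an editorial illustration of the reverse-complement conjecture described above, the following is a minimal sketch and not the rna22 implementation. It assumes a pattern is a plain RNA string matched exactly. The function names and the example sequences are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: exact-match scanning, not the rna22 method.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_candidate_sites(pattern: str, transcript: str) -> list[int]:
    """Return 0-based start positions in `transcript` where the reverse
    complement of `pattern` occurs exactly (overlaps allowed)."""
    site = reverse_complement(pattern)
    transcript = transcript.upper()
    positions = []
    start = transcript.find(site)
    while start != -1:
        positions.append(start)
        start = transcript.find(site, start + 1)
    return positions


if __name__ == "__main__":
    # Hypothetical pattern and transcript, for illustration only.
    pattern = "UGAGGUAG"
    transcript = "AAACUACCUCAGGGCUACCUCA"
    print(reverse_complement(pattern))
    print(find_candidate_sites(pattern, transcript))
```

A pattern-based predictor would presumably tolerate wildcards and imperfect pairing; exact string matching is used here only to keep the idea visible.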
That led to the prototype that I mentioned that was able to discover pretty much all of the microRNA target pairs that were known at the time. This was in 2003, and in 2004 I was in Singapore for a human genome meeting and I was introduced to [investigators who are co-authors on the Nature paper]. I discussed my ideas with them and we decided to try them in the lab.
I should say that … back in those days, everybody was of the opinion that one microRNA has tens of targets based on the experimental evidence that was available back then. However, our computation was saying that this is probably not accurate and was suggesting targets in the low thousands per microRNA. This is something that we decided to explore in the context of a few microRNAs that are up-regulated in embryonic stem cell differentiation.
Eventually, this earlier work led to a paper in Cell in 2006 where we described the rna22 algorithm and we described evidence for one microRNA targeting lots and lots of 3’ UTRs. At the time, all of the evidence was generated using luciferase assays; this was necessary in a way since we ended up testing more than 200 microRNA/mRNA target pairs.
The outcome of that first effort was that about 75 percent of the computational predictions were actually down-regulated in these luciferase assays. We [also] described a few things in [the paper] as conjectures that arose from the computational analyses. We also showed how the key method could easily be modified to find novel microRNAs, concluding that there must be tens of thousands of microRNAs in the human genome.
As far as target detection was concerned, the method was suggesting that there must be microRNA targets in the 3’ UTRs, which was what everybody had seen at this point, but also in coding regions, as well as the 5’ UTRs.
And that was controversial at the time.
Actually, many of the things we were saying in the Cell paper were controversial at that point in time. Remember, this is a period when almost everybody thought that [for each] microRNA there are only a few tens of targets; we’re coming in and saying, “We suspect that one microRNA could go after a couple of thousand targets.”
Also, at that point in time, everybody was saying [there were] a few hundred microRNAs in humans, and we were saying, “Well, it looks like there may be as many as 50,000 or more in humans.” Interestingly enough, for the mouse, [whose genome] is more or less the same size as a human’s, we were finding about 44,000 predicted microRNAs. But coming back to the question of microRNA targeting, the possibility that animal microRNAs target outside the 3’ UTR was not what people were discussing at the time.
After the Cell paper was published, my colleagues and I said, “Let’s go after these other conjectures.” The one that seemed most intriguing and which we decided to pursue first was the targeting outside of the 3’ UTRs. So we went ahead with that and the outcome is the [Nature] paper where we’re demonstrating coding region-targeting by multiple animal microRNAs.
Can you touch on that work?
In a nutshell, this is computation-driven experimentation. We started with the methodology that we published in the Cell paper. Because of the expertise of my colleagues in stem cell differentiation, we said, “Let’s use stem cells as our platform,” and we focused on a few microRNAs that are up-regulated during stem cell differentiation to predict targets.
As I said before, the rna22 methodology indicates that one microRNA can go after many targets, and clearly we could not validate these very large numbers [of targets]. [So we decided to] examine whether, among the predicted targets, we find the triumvirate of transcription factors that everybody focuses on in the context of studying stem cell differentiation, namely Nanog, Oct4, and Sox2.
It turned out that all three of them were targeted by the microRNAs of interest, according to our computational models, and in fact were targeted multiply. Once we had the putative target sites, we began with luciferase assays to get a first sense of what might be going on. Since [the targets] were in the coding region, we also had the luxury of replacing the codons in a way that would maintain the protein output … but would disrupt the nucleotide sequence in ways that we were predicting should abolish the microRNA targeting. We created these new constructs that have the modified codons and showed that they are not targeted by the corresponding predicted microRNA.
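The codon-replacement idea described above can be sketched as follows. This is an editorial illustration under stated assumptions, not the authors' construct design: it uses the standard genetic code on a DNA-alphabet coding sequence, and for each codon it picks the synonymous codon that differs at the most nucleotide positions. The function names and the example sequence are hypothetical.

```python
# Illustrative sketch only: synonymous recoding with the standard genetic code.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Map each codon (TTT, TTC, ..., GGG) to its amino acid; '*' marks a stop.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence whose length is a multiple of 3."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode_synonymously(cds: str) -> str:
    """Swap each codon for the synonymous codon with the most nucleotide
    differences; codons with no synonym (ATG, TGG) are left unchanged."""
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        synonyms = [c for c, a in CODON_TABLE.items()
                    if a == amino_acid and c != codon]
        if synonyms:
            codon = max(synonyms,
                        key=lambda c: sum(x != y for x, y in zip(c, codon)))
        recoded.append(codon)
    return "".join(recoded)


if __name__ == "__main__":
    original = "ATGCTGAGCTGG"  # hypothetical fragment, for illustration only
    mutant = recode_synonymously(original)
    print(translate(original) == translate(mutant), original != mutant)
```

The protein sequence is preserved by construction, while the nucleotide sequence that a microRNA would pair with changes. A real design would also have to weigh factors this sketch ignores, such as codon usage.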
Finally, we demonstrated the coding region-targeting by looking at physiological evidence; basically, we looked at markers of stem cell differentiation and demonstrated that by introducing these mutant coding regions, we can delay stem cell differentiation.
Is the overall message that people start thinking about animal microRNA targets beyond the 3’ UTR?
That certainly is one message, but there are several more. Several of the targets that we validated with our coding mutants and physiological evidence exist in the transcription factors in mice but not in the human or rhesus [monkey] orthologs of these transcription factors. This suggests that, as far as the control of these transcription factors by the three microRNAs we were studying is concerned, the regulation picture in mice may be slightly different than the picture in humans.
Typical approaches to finding microRNA targets have used cross-genome conservation; basically, you would look for islands of conserved nucleotide segments in an otherwise non-conserved surrounding sequence. But, based on our findings, such an approach will likely constrain the set of targets that can be discovered. If you were to [apply this same approach to] these transcription factors in the coding region, you have several problems, one being that at the nucleotide level … the degree of conservation is much higher than it would be in the 3’ UTR. So, effectively, much of the sequence would be conserved and cross-genome conservation would buy you much less than it would in the 3’ UTR. So, you wouldn’t necessarily be able to discover many of these targets because they are not conserved in humans or rhesus monkeys. In the absence of sequence conservation, conventional wisdom would suggest that this is likely not a target. But as it turned out, it is.
Another message is that, if you recall the original papers … in which [it was shown that] let-7 targets lin-41 and lin-14, those early microRNA/mRNA pairs contained bulges or G:U pairs in the seed region. What we’re seeing in our results is that these situations may not be uncommon. Four out of the five targets that we’re showing do not contain the seed, either because there is a G:U pair or because there is a bulge. If one applies seed conservation to filter out candidates, one may miss bona fide targets.
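To make the point about seed filtering concrete, here is a minimal editorial sketch, not taken from the paper. It assumes the seed is nucleotides 2 to 7 of the microRNA and compares it with a six-nucleotide target segment, distinguishing perfect Watson-Crick pairing from pairing that includes a G:U wobble. Bulges are not modeled. The function name, labels, and sequences are hypothetical.

```python
# Illustrative sketch only; the seed window (positions 2-7) is an assumption.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_seed_pairing(mirna: str, target_site: str) -> str:
    """Classify pairing between a miRNA seed and a target segment.

    `mirna` is written 5'->3' and its seed is taken as nucleotides 2-7.
    `target_site` is the 6-nt mRNA segment written 5'->3', so it pairs
    with the seed in antiparallel orientation.
    """
    seed = mirna.upper()[1:7]
    site = target_site.upper()
    if len(seed) != 6 or len(site) != 6:
        raise ValueError("expected a 6-nt seed and a 6-nt target segment")
    pairs = list(zip(seed, reversed(site)))
    if all(p in WATSON_CRICK for p in pairs):
        return "perfect"
    if all(p in WATSON_CRICK or p in WOBBLE for p in pairs):
        return "wobble"
    return "mismatch"


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGG"  # hypothetical sequence, for illustration only
    for site in ("UACCUC", "UACCUU", "UACCUA"):
        print(site, classify_seed_pairing(mirna, site))
```

A filter that keeps only the "perfect" class would discard any site in the "wobble" class, which is the kind of candidate the answer above warns could be a bona fide target.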
[Another] very exciting result, although it says that the problem is more challenging than we thought, is something that we’re pointing out in the [Nature] paper but published a year ago in Stem Cells. [In that paper, we showed] that microRNA-134 targets the 3’ UTR of Nanog whereas in [our newest paper], we’re showing that it targets the CDS of Sox2.
What we’re pointing out is that the 3’ UTR target of Nanog for microRNA-134 is actually right in the middle of a B2 SINE element that is embedded in that 3’ UTR. To the best of my knowledge, this is the first concrete example of an experimentally validated interaction between a microRNA and a repeat element. Because the target is in the middle of the repeat element, clearly the context in which that target appears would be conserved across many of the other instances of that SINE element. Since the local structure in three dimensions is likely also conserved, any gene transcript that contains that same repeat element in its messenger RNA must be targeted by microRNA-134, which of course raises the number of eventual targets while at the same time bringing repeat elements into the picture.
Repeat elements are generally not conserved across species, which again brings us back to the point about [using] cross-genome conservation to find targets. Many of these results would not have been possible if we were enforcing the cross-genome conservation constraint ourselves.
I guess keeping an open mind is going to be important as the microRNA field progresses.
I think it’s safe to say that all of the practitioners, and this certainly includes us too, are realizing increasingly that [RNAi] is a deceptively simple phenomenon. [The basic description of it] is reasonably straightforward, but there is so much behind it that we keep on uncovering that it will certainly be very helpful to be vigilant, so to speak.