At A Glance
Name: Neil Smalheiser
Position: Assistant professor, University of Illinois at Chicago
Background: Assistant professor, University of Chicago — 1988-1996; Instructor, University of Chicago — 1986-1988; Postdoc, University of Chicago — 1983-1985; MD/PhD, neuroscience, Albert Einstein College of Medicine — 1982; BA, mathematics, University of Iowa — 1974
Neil Smalheiser is an assistant professor in the department of psychiatry at the University of Illinois at Chicago. He co-authored a paper, recently published in BMC Bioinformatics, entitled “A Population-Based Statistical Approach Identifies Parameters Characteristic of Human microRNA-mRNA Interactions,” and spoke with RNAi News about his work.
How did you get involved with RNA interference?
The story goes back to 2000. I’m at the Psychiatric Institute at the University of Illinois at Chicago. The institute director, Erminio Costa, holds a weekly seminar series where everybody takes turns giving an update on their research. Hari Manev, who’s here, was talking about his current research, which was using RNA interference in adult Drosophila: they were microinjecting double-stranded RNA into Drosophila to knock out genes in the brain. He was giving the general background on RNA interference and mentioned that with C. elegans you can simply feed the worms bacteria that express double-stranded RNA, and that is enough to cause gene silencing.
Costa said that reminded him of the old experiment of [James] McConnell, who fed trained flatworms to naïve flatworms in learning studies. That jogged my memory, and I said, “Yeah, the memory transfer was supposed to be with RNA, wasn’t it?” That was sort of an aha moment, where it seemed there really was a connection. We wrote up a paper outlining the rationale for thinking there could be something to this and how you would test it, and that was published in 2001 [in Trends in Neurosciences].
We basically proposed that RNA interference could have a physiological role in learning and memory, and proposed some strategies for testing that. That’s how we got interested in it. It took one or two more years after that to actually get funding to test it. There’s a [cutting-edge basic research award] grant mechanism [from the National Institute on Drug Abuse]. NIDA is very interested in studies of the molecular mechanisms of synaptic plasticity, and it has a high-risk/high-gain grant mechanism.
We got funded through that to begin testing [our proposal], and I basically retooled the lab to start looking at it experimentally. At the time we proposed this, microRNAs had not been discovered, so this was not a microRNA hypothesis at all. The original idea was that antisense RNA transcripts would be induced during learning and would form hybrids with the sense strand, and that this double-stranded RNA would elicit RNA interference. MicroRNAs are certainly another possibility, and they’ve been studied a lot more intensively. About 80 of them are known to exist in the adult brain, so microRNAs are one way this could happen, but we’re interested in any small RNA pathway that might be involved.
Can you talk about the experiments, the strategy you’re testing?
That’s changed over the years as we’ve learned more. Certainly … it’s clear that both kinds of small RNAs seem to be associated with protein synthesis. In terms of synaptic plasticity … we sort of refocused the hypothesis to look at what [small RNAs] might be doing to regulate protein synthesis near dendritic spines.
The goal is to first show that these pathways are involved in the course of learning and memory in animals — adult mice would be our preferred animal. We want to show that they’re activated during learning, and then we want to knock out components of the pathways and show that it affects learning.
Can you talk about what’s going on in your lab right now?
There are really two labs: a wet lab, which is working on the synaptic issues, and an informatics lab. The informatics lab is, for the most part, not doing bioinformatics; we do some, but the primary informatics work we do is text mining.
It’s not that different from data mining in other databases, but we’re trying to take separate pieces of information that are scattered across the scientific literature and find ways to put them together to help assess hypotheses. For example, suppose you make a finding in the laboratory that associates two things that were previously not known to be associated. You might want to know the mechanism that could link the two. The literature would not have discussed those two things together before, but if one thing is item A, there might be papers that talk about item A and item B, and other papers that talk about item B and item C. Given an A and a C, we find the Bs that connect the two. It’s a particular kind of data mining strategy for text, especially the biomedical literature. The things we’re doing with text mining occasionally lend themselves to bioinformatics questions as well.
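The A-to-C linking idea can be illustrated with a minimal Python sketch. It assumes each paper has already been reduced to a set of index terms; the function name find_b_terms and the toy documents are hypothetical, and a real literature-based discovery system would also need to rank and filter the candidate Bs rather than just list them.

```python
def find_b_terms(documents, a_term, c_term):
    """Return terms B that co-occur with A in some documents and with C in others.

    documents: list of sets of terms (a toy stand-in for indexed abstracts).
    """
    a_partners = set()  # terms seen alongside A
    c_partners = set()  # terms seen alongside C
    for terms in documents:
        if a_term in terms:
            a_partners.update(terms - {a_term, c_term})
        if c_term in terms:
            c_partners.update(terms - {a_term, c_term})
    # Candidate linking terms are those shared by both sides.
    return sorted(a_partners & c_partners)


# Hypothetical toy "literature": A and C never appear in the same document.
docs = [
    {"A", "B1", "X"},
    {"A", "B2"},
    {"B1", "C"},
    {"B2", "C", "Y"},
    {"X", "Z"},
]
print(find_b_terms(docs, "A", "C"))  # ['B1', 'B2']
```

Terms such as X or Y that touch only one side drop out, which is the point of requiring a B to connect both literatures.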
[A paper we wrote that was just published in BMC Bioinformatics] was motivated by a text mining problem we worked on. We are doing a project where we are disambiguating author names in Medline. …
We recognized, for the microRNA problem, that a similar kind of model could be used.
Can you talk about the microRNA problem and the findings of the paper?
We wanted a very general way of distinguishing microRNA interactions with messages, one that made the fewest possible assumptions. We just looked at the distribution of the properties of how they interact: you take all the microRNAs and all the messages in the RefSeq database, and then you look at how they match in terms of the different parameters you might be interested in. [Then,] you make a distribution and compare the distribution for the microRNAs with the distribution for scrambled microRNAs. We scrambled them several different ways, but it didn’t really matter how we did it.
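The real-versus-scrambled comparison can be sketched as follows. This is an illustration under stated assumptions, not the paper’s code: the single parameter used here (the longest run of contiguous Watson-Crick pairs between a microRNA and any window of a message), the composition-preserving shuffle, and all function names and sequences are hypothetical choices made for the example.

```python
import random

# Watson-Crick pairs only; this hypothetical parameter ignores G:U wobbles.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_complementary_run(mirna, mrna):
    """Longest stretch of contiguous Watson-Crick pairs between the miRNA,
    read antiparallel, and any same-length window of the mRNA."""
    probe = mirna[::-1]  # reverse the miRNA so it pairs antiparallel with the mRNA
    best = 0
    for start in range(len(mrna) - len(probe) + 1):
        run = 0
        for i, base in enumerate(probe):
            if (base, mrna[start + i]) in PAIRS:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def scrambled(seq, rng):
    """Shuffle a sequence while preserving its base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def parameter_distributions(mirnas, mrnas, n_scrambles=10, seed=0):
    """Score every miRNA-mRNA pair, plus n_scrambles scrambled copies of each miRNA."""
    rng = random.Random(seed)  # fixed seed so the null distribution is reproducible
    real, null = [], []
    for mi in mirnas:
        for m in mrnas:
            real.append(longest_complementary_run(mi, m))
            for _ in range(n_scrambles):
                null.append(longest_complementary_run(scrambled(mi, rng), m))
    return real, null


def reverse_complement(seq):
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))


mirna = "ACGUUAGCCAUGGACUUAGCGA"                    # made-up 22-nt sequence
mrna = "GGGG" + reverse_complement(mirna) + "CCCC"  # toy message with one perfect site
real, null = parameter_distributions([mirna], [mrna], n_scrambles=10)
print(real, sum(null) / len(null))
```

With one perfect site in the toy message, the real microRNA scores the full 22-base run, while the scrambled copies are expected to score lower; across many real messages, it is the gap between the two distributions that flags a parameter as informative.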
The scrambled microRNAs would give a noise level, and the real microRNAs would give a distribution of complementarity. For many parameters, but not all, there was a significant difference, and that gives outliers. So you can say that certain parameter values are not seen with scrambled sequences but are seen with real sequences, so those are more likely to have a biological reality, or at least they’re statistical outliers. And when you took some parameters and put them together, you got better discrimination than with any single parameter. That suggests they are more likely to be real and working together. We used a combination of three or four parameters to define a set of outliers; I think there are 73 outlier messenger RNAs that were unlikely to be produced by chance.
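Why combining parameters can sharpen discrimination is easy to show with a little arithmetic. In this sketch each microRNA-message pair is summarized by two hypothetical parameters, run (longest contiguous pairing) and paired (total paired positions); the numbers are invented toy values, not data from the paper, with two scrambled pairs per real pair. The count of scrambled pairs passing the thresholds, divided by the number of scrambles, estimates how many real hits would be expected by chance.

```python
def passes(params, thresholds):
    """True if every thresholded parameter meets or exceeds its cutoff."""
    return all(params[name] >= cut for name, cut in thresholds.items())


def expected_by_chance(real_pairs, scrambled_pairs, thresholds, n_scrambles):
    """Return (real hits, hits expected by chance) under the given thresholds.

    scrambled_pairs holds n_scrambles scrambled pairs per real pair, so its
    hit count is divided by n_scrambles to put it on the same scale.
    """
    real_hits = sum(passes(p, thresholds) for p in real_pairs)
    null_hits = sum(passes(p, thresholds) for p in scrambled_pairs) / n_scrambles
    return real_hits, null_hits


# Invented toy values for illustration only.
real_pairs = [
    {"run": 20, "paired": 21},
    {"run": 19, "paired": 18},
    {"run": 12, "paired": 20},
    {"run": 8, "paired": 12},
]
scrambled_pairs = [
    {"run": 18, "paired": 14}, {"run": 11, "paired": 19},
    {"run": 17, "paired": 13}, {"run": 7, "paired": 20},
    {"run": 9, "paired": 11}, {"run": 16, "paired": 12},
    {"run": 10, "paired": 18}, {"run": 6, "paired": 10},
]

print(expected_by_chance(real_pairs, scrambled_pairs, {"run": 15}, 2))                # (2, 1.5)
print(expected_by_chance(real_pairs, scrambled_pairs, {"paired": 18}, 2))             # (3, 1.5)
print(expected_by_chance(real_pairs, scrambled_pairs, {"run": 15, "paired": 18}, 2))  # (2, 0.0)
```

With these toy values, either threshold alone leaves a substantial expected-by-chance count (1.5 pairs against 2 or 3 real hits), but requiring both keeps 2 real pairs and no scrambled ones, because the scrambled pairs that score high on one parameter tend not to score high on the other.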
We used that outlier set to look for other properties to characterize the nature of the complementarity within that set, as well.
What’s the next step then? Where does this lead you?
Well, the first thing is that if you want to look for targets, you ought to have some idea of what makes a good target. In mammals, people have so far been restricting the search to the candidates they are most sure of: they’re looking at regions of genes that are conserved between species. That will improve the signal-to-noise ratio, right? Then, if they make a prediction, they’re less likely to be wrong. That doesn’t mean they’re going to find all the targets, or even the best targets. It just means these are the safest targets to look at.
[In the BMC Bioinformatics paper] we point out that, at least in humans and other mammals, the rules might not be as restrictive [as those in C. elegans and Drosophila], and there might be many more targets. In fact, some of the best targets were not picked up by [other computational approaches to predict microRNA targets in flies and worms]. For example, we were very struck by HOXB8, because it has a G-U match in a stretch that typically doesn’t contain any G-U matches, so it was excluded by these other approaches. But it’s actually perfect, meaning there are no mismatches across the entire length. It’s been experimentally confirmed as a real target that is cleaved, and so on.
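The G-U point can be made concrete with a short sketch that classifies each position of a microRNA:site duplex as a Watson-Crick pair, a G:U wobble, or a true mismatch. The sequence is made up (it is not the HOXB8 site), and strict_filter is a deliberately simplified, hypothetical rule: the interview describes the restriction as applying to one stretch of the duplex, whereas this sketch applies it to the whole site.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def classify_site(mirna, site):
    """Count Watson-Crick pairs, G:U wobbles and true mismatches in a duplex.

    `site` is the mRNA target written 5'->3'; the miRNA is reversed so the
    two strands are compared antiparallel, position by position.
    """
    if len(mirna) != len(site):
        raise ValueError("miRNA and site must be the same length")
    counts = {"watson_crick": 0, "wobble": 0, "mismatch": 0}
    for m_base, t_base in zip(mirna[::-1], site):
        pair = (m_base, t_base)
        if pair in WATSON_CRICK:
            counts["watson_crick"] += 1
        elif pair in WOBBLE:
            counts["wobble"] += 1
        else:
            counts["mismatch"] += 1
    return counts


def strict_filter(counts):
    """Hypothetical rule: reject any site with a G:U pair or a mismatch."""
    return counts["wobble"] == 0 and counts["mismatch"] == 0


def wobble_tolerant_filter(counts):
    """Alternative rule: accept any site with no true mismatches."""
    return counts["mismatch"] == 0


mirna = "ACGUUAGCCAUGGACUUAGCGA"  # made-up 22-nt sequence
perfect_site = "".join(COMPLEMENT[b] for b in reversed(mirna))
# Position 1 of the site pairs with a G in the miRNA; U there makes a G:U wobble.
wobble_site = perfect_site[:1] + "U" + perfect_site[2:]

counts = classify_site(mirna, wobble_site)
print(counts)  # {'watson_crick': 21, 'wobble': 1, 'mismatch': 0}
print(strict_filter(counts), wobble_tolerant_filter(counts))  # False True
```

The wobble-containing site has no true mismatches, so it passes the tolerant rule but is rejected by the strict one, which is the kind of case described above.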
That’s the take-home message from this paper: the rules may be different [for human microRNA targets], and there may be more good targets that don’t follow the expected rules, especially in the coding region. In our outlier set, the majority of the best hits were in the coding region and not in the 3’ UTR. That’s a major difference from the predictions other groups have been making.
Have you gotten any feedback on the paper?
It’s been out for a week and a half or two weeks, so I think it’s kind of early at this moment. Nobody has called to denounce us, but I don’t know what the feedback will turn out to be.
In plants, it’s pretty clear that a lot of the [microRNA] targets are in the coding region, and some of the features we’re seeing in humans look like the features in plants. So I don’t know whether C. elegans and Drosophila are somehow different from many other species or what.
Very recently there was apparently a paper that looked for interactions in coding regions in Drosophila and said there were relatively few, that potential targets were greatly underrepresented. But it used criteria … that were more restrictive than the ones we were using. It suggested that maybe Drosophila really doesn’t use a lot of targets in the coding region, but plants do, and our analysis suggests that humans might. It’s something we’ll only know over time.