At A Glance
Name: Terran Lane
Position: Assistant professor, department of computer science, University of New Mexico
Education: PhD, electrical and computer engineering, Purdue University — 2000
BS, electrical and computer engineering, Purdue University — 1994
After studying machine learning while at Purdue, Terran Lane went on to conduct his postdoctoral work in the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. In 2002, he moved to his current position at the University of New Mexico, where he continues to research machine learning as well as reinforcement learning, behavior and control, and artificial intelligence.
Recently, he spoke with RNAi News about how his work has expanded into the field of RNAi.
Can you give an overview of the kind of research you do in your lab?
My group works on machine learning, in general, and machine learning applications to bioinformatics, specifically. That's what got me interested in RNAi a couple of years ago when it became prominent enough to start being noticed outside of [the field of] biology.
I ended up getting in touch with a group in biology here studying immunology. They were interested in RNAi peripherally — its role in immune response. We all went in together to propose a large center grant and that was accepted. So we now have the Center for Evolutionary and Theoretical Immunology here at UNM in cooperation with Los Alamos Labs and the Santa Fe Institute. My colleague here, Stephanie Forrest, was instrumental in putting me in touch with our colleagues in biology. She also did a lot of work on the center grant and establishing the center, and has provided me a lot of advice and support through the whole project.
As part of [the center], I've been working on RNAi and my group has been studying bioinformatics questions related to RNAi. We've looked at off-target effect prediction, which [is covered] in a paper that just appeared in Nucleic Acids Research. We've also looked at siRNA efficacy prediction, we've looked at gene family knockdown, and on the side we've done a few high-performance computing things and some high-efficiency string search routines to support all this other stuff.
What we're ultimately interested in is building strong statistical models of various parts of the RNAi mechanisms, and ultimately an end-to-end predictive model of the RNAi mechanism. This is very ambitious and will take quite a while, but that's the goal to shoot for.
So was your getting into RNAi just an outgrowth of your own personal curiosity, or did someone come to you and ask [for help]?
It really started when I got in touch with these immunologists. We were kind of talking about collaboration possibilities, and some of them were interested in RNAi in the abstract because early indications were that it was an immunological mechanism — sort of a genetic immune system. They were also interested in using it as a tool to do controlled gene knockdowns for the organisms they were studying. They brought it up to me and showed me some of the early articles on it, and I thought it was really fascinating because it's fairly rare that you get to see a chunk of science of this magnitude in the making.
It seemed to me at the time that there was very little known about it and there were very few strong theoretical models of it. At that stage, we were just beginning to see significant amounts of empirical data, and people had a lot of first-order estimates like, 'If you want to make a good siRNA, the GC content should be between 40 and 60 percent.' That's a perfectly serviceable rule, and it was based on a good first-order analysis of the data, but it seemed to me there was a lot more sophisticated analysis waiting to be done, so it seemed like a good field to go into.
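The first-order GC-content rule quoted above is simple enough to write down directly. The following is a minimal sketch, not anything from Lane's group: the function names and the default 40 to 60 percent bounds (taken from the quote) are illustrative assumptions, and the bounds are treated as inclusive.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def passes_gc_rule(seq: str, low: float = 0.40, high: float = 0.60) -> bool:
    """Apply the rule of thumb: GC content between low and high, inclusive.

    The 0.40 and 0.60 defaults mirror the quoted rule; treating the
    bounds as inclusive is an assumption made for this sketch.
    """
    return low <= gc_fraction(seq) <= high


if __name__ == "__main__":
    # Hypothetical candidate sequences, chosen only for illustration.
    for candidate in ("AUGCAUGCAU", "GCGCGCGCGC", "AUAUAUAUAU"):
        print(candidate, round(gc_fraction(candidate), 2), passes_gc_rule(candidate))
```

A filter like this captures only the coarse rule; the "more sophisticated analysis" Lane describes would replace the fixed thresholds with a model fitted to efficacy data.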
This off-target effect prediction work, could you give an overview of the Nucleic Acids Research paper?
The very early work in RNAi was filled with a lot of excitement about the specificity of the RNAi mechanism — that you could hit a single gene and have very little impact on the rest of the genome. But that seemed rather suspicious to my group because we noted that there is a certain amount of redundancy in the genome and therefore some of these siRNAs should, in principle, have multiple interaction points. So we decided to go in and look computationally, which is our specialty.
We started with genomes from human, C. elegans, and S. pombe. [We took] a very, very simple model of the RNAi binding process, the interaction between siRNA and mRNA, and looked at how many off-target interactions you'd expect across the entire genome. What we found is that the average number of off-target interactions was surprisingly high. If you did the following experiment: pick a gene at random, pick an siRNA at random, you introduce that siRNA into the organisms, then your chances of that siRNA interacting with an off-target gene — a gene other than the one you drew it from — can be as high as 15 percent, which is disconcertingly high.
The binding model that we chose is actually a very conservative one. We said, 'Yes, it has to match exactly, nucleotide to nucleotide.' Then the number is as high as 15 percent. If you relax that to something more biologically plausible by allowing mismatches and gaps and things like that, then the number only gets bigger. So the Nucleic Acids Research paper is essentially an in-depth analysis of this question. We looked at a handful of different models of the binding processes. We said, 'If it works this way, what does our off-target effect look like? If it works that way, what's the average off-target effect?' There are a couple of plots in the paper that show what the trade-off is under different conditions.
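The conservative exact-match model described above can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the method from the Nucleic Acids Research paper: every fixed-length window of a transcript is treated as a candidate siRNA target site, and a candidate counts as off-target if the identical window also occurs in some other transcript. The function names, the three toy sequences, and the window length are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(transcripts: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each length-k window to the set of genes whose transcript contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for gene, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(gene)
    return index


def off_target_fraction(transcripts: dict[str, str], k: int) -> float:
    """Fraction of candidate windows that also match, exactly, a different gene.

    This mirrors the thought experiment in the interview: pick a gene,
    pick a window from it, and ask whether any other gene contains it.
    """
    index = build_kmer_index(transcripts, k)
    total = 0
    hits = 0
    for gene, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            total += 1
            # Genes other than the source that contain this exact window.
            if index[seq[i:i + k]] - {gene}:
                hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy transcripts: geneA and geneB share a 10-base stretch; geneC is unrelated.
    toy = {
        "geneA": "AUGGCUAGCUAGGAUC",
        "geneB": "CCGGCUAGCUAGUUAA",
        "geneC": "UUUUACGACGACGUAC",
    }
    for k in (8, 12):
        print(k, off_target_fraction(toy, k))
```

Running the same function over a range of window lengths gives a toy version of the length sweep discussed in the interview; allowing mismatches or gaps, as Lane notes, could only raise the count.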
That's the core result, [but there was another] interesting result that came out of it. Biologically, siRNAs appear to be between 19 and 23 nucleotides long, depending on the organism and what exact version of Dicer they have. But computationally we can say, 'What happens if we make siRNAs 12 nucleotides long or 35 nucleotides long? If Dicer happened to cleave them like that, would it make a substantial difference to the off-target rate?' What we found is that the 20/21 nucleotide region seems to be a pretty nice trade-off point in that you've gotten most of the improvement in off-target rate. When you're down at 15 or 13 nucleotides for your siRNA, the off-target rate goes through the ceiling. When you get up to 20 nucleotides, the off-target rate has fallen most of the distance it's going to fall, and at that point adding extra nucleotides to your siRNA doesn't actually change the off-target rate significantly. I wouldn't call this a conclusion about the evolutionary pressure on Dicer, but it does seem to be evidence that Dicer is tuned to pick an siRNA length that is reasonable.
This study was essentially purely computational. What we're really trying to do at the moment is get in contact with people who have real empirical data on this stuff to refine and improve our models of off-target binding. One result I just read about the other day [was obtained by] Anastasia Khvorova's group at Dharmacon. They've done a very careful broad-spectrum [study] of the effect of different siRNAs on the complete genome. It seems like, at a high-level reading, their numbers are reasonably consistent with the kind of average rate that we get. So I'd say there's at least one empirical data point that backs up the kind of average rates that we see, although our computational models aren't nearly as good as they should be. We really need to back it up with more data, but at least at a high level, the average rates seem to be there.
You mentioned other projects. Can you touch on where your RNAi work is headed now?
The one that's furthest along is on gene family knockdown. This came out of one of our colleagues here, Si-Ming Zhang, asking, 'What if I want to knock down a number of genes at the same time?' because he's interested in the effect of a family of genes in the model organism he works with, the B. glabrata snail. He said, 'What I really want to be able to do is suppress the entire family of genes simultaneously.' We said, 'Well, that's an interesting question.' If you want to do this, if you want to knock down a group of genes rather than an individual gene, one thing you can do is build a single siRNA for each gene, but that's not always a good idea: it's expensive, and there're indications that the more siRNAs you introduce, the more tendency you have to a) saturate the systems and b) induce a toxic response.
You really want to keep your set of siRNAs relatively small. So we were asking, 'What is the smallest set of siRNAs, the smallest pool, if you will, that you can use while still having a chance of knocking down everything in this family?' That work was about computationally designing minimal sets of siRNAs for this problem. What we found was that with reasonably sophisticated algorithms, we could tackle this problem and come out with fairly compact sets. If you wanted to knock down a set of 20 genes, we can often get down to 8 to 10 siRNAs. Now, of course it depends on how closely related the set of genes you chose are. If you chose a family of genes that are all fairly closely related, then you're going to have a pretty good chance of getting a compact pool of siRNAs. If you chose genes at random out of the genome, or genes that are deliberately diverse, it becomes much harder.
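Finding the smallest pool of siRNAs that covers a gene family can be framed as a set-cover problem. The interview does not say which algorithms the group used, so the following is only a sketch of the standard greedy approximation, assuming a precomputed table of which genes each candidate siRNA is expected to knock down. The function name, the candidate labels, and the toy coverage table are hypothetical.

```python
def greedy_sirna_pool(
    coverage: dict[str, set[str]], genes: set[str]
) -> tuple[list[str], set[str]]:
    """Greedily choose siRNAs until every coverable gene is covered.

    coverage maps a candidate siRNA label to the genes it is assumed to hit.
    Returns the chosen pool (in selection order) and any genes that no
    candidate covers. Greedy selection is an approximation and is not
    guaranteed to find the true minimum pool.
    """
    uncovered = set(genes)
    pool: list[str] = []
    while uncovered:
        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(coverage), key=lambda s: len(coverage[s] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break  # Remaining genes cannot be covered by any candidate.
        pool.append(best)
        uncovered -= gained
    return pool, uncovered


if __name__ == "__main__":
    # Toy coverage table: closely related genes share candidate siRNAs.
    toy_coverage = {
        "si1": {"g1", "g2", "g3"},
        "si2": {"g3", "g4"},
        "si3": {"g4", "g5"},
        "si4": {"g1"},
    }
    print(greedy_sirna_pool(toy_coverage, {"g1", "g2", "g3", "g4", "g5"}))
```

The toy table also shows why relatedness matters: when candidates each hit several family members, a couple of siRNAs suffice, whereas a table in which every candidate hits a single gene forces one siRNA per gene.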
We have a paper on that that appeared in the journal Artificial Intelligence in Medicine. We want to do further work in that, but it's kind of waiting on better models of siRNA efficacy and off-target interaction.
You mentioned putting together statistical models for all the different mechanisms of RNAi. As you said, that's a huge goal. Are you working in collaboration with anybody else on this?
I'm hoping to. At the moment, it's pretty much us in our laboratory and a few people here on campus, but they're mostly using RNAi as a tool. We can suggest they use one siRNA over another, and they'll try it, but they're not really set up to do large-scale efficacy studies or large-scale studies of the mechanism or regulatory networks.
So, I'm absolutely trying to build collaborations, but our lab is a little bit late in getting into this whole area. We only effectively started two years ago, and it took us a year to come up to speed on what's known about RNAi in the first place. But I certainly welcome collaborations.
We've had some discussions with small groups and lent our tools to a couple of people, but so far we haven't had any deep, long-term collaborations.
Is there anything else you're working on that I may have missed?
We've got some interesting results on siRNA efficacy prediction, but they're not published yet. I consider them to be fairly interesting, and we're preparing them for publication. The easy thing to say [about the data], which pretty much everybody is going to believe, is that the full story isn't in yet.
What I think we really need is large amounts of publicly available data. That's really going to drive better modeling. We can make coarse models a priori based on raw biological knowledge, but better models emerge out of looking at the real empirical data. So far, in our best efforts, we find there're something like 500 or 600 data points that are publicly available. You go to the conferences and a lot of the pharmaceutical companies show up and say, 'Oh, well, we did this to 2,000 siRNAs.' Then you go ask them for the data and they say they're proprietary. I think this is a real crunch for the community. It's not my place to stand on a podium here, but I think if the community wants better models of efficacy and if they want better models of toxicity and things like that, there're going to have to be more public data — preferably nicely archived in a central location, sort of like the GenBank.
I think at the moment, everyone's playing it close to the chest in the hopes that they can be the ones to provide all the tools that everyone else is going to use. I think that really, it's just going to work better if everybody has access to all the data. Not everybody can be good at everything.