Name: Andy Peek
Position: Director of Bioinformatics, Integrated DNA Technologies
Background: Manager of informatics, IDT — 2002-2003
Research scientist, IDT — 2002
Senior scientist, eXegenics (formerly Cytoclonal Pharmaceuticals) — 2001-2002
Postdoc, University of California, Irvine — 1998-2000
PhD, ecology and evolution/population genetics, Rutgers University — 1998
MSc, genetics and cell biology, Washington State University — 1994
BSc, biology/genetics, Washington State University — 1991
In September, IDT’s Director of Bioinformatics Andy Peek received a six-month grant worth about $100,000 from the National Institute of General Medical Sciences to develop modeling software that could improve researchers’ ability to predict RNAi molecule efficacy.
This week, RNAi News spoke with Peek about the effort and IDT’s plans for the software, which is freely available from IDT at ftp://scitoolsftp/idtdna.com/SEQ2SVM/.
Let’s start with you, your role at IDT, and how RNAi comes into play.
I’ve been at IDT for about five years now, and when I was first looking at coming to IDT, people were talking about RNA interference. I’d been interested in the world of regulatory gene expression at a job previous to IDT, where we were using antisense technologies, and even during my postdoc at UC Irvine.
At IDT, my title is director of bioinformatics. What that means is I spend a good chunk of my time doing grant-related computational biology, for lack of a better word, and I run a group that does everything from putting together tools on IDT’s website to performing computational work for customers. We also have research projects that are in-house with other research groups in biochemistry, molecular biology … [and] chemistry research, [as well as] external collaborations.
This grant regarding the support vector machine modeling software, what’s the story behind that?
The story is that I think we’re really quite early in the world of small non-coding RNAs. Some of the computational approaches that have been taken before have been trying to find out in a very specific way what makes RNAi [molecules] good. The approach that I’ve taken is that I don’t think there is going to be any one single “good.” If you look at the protein complement within the human, we’ve got four Argonaute proteins, only one of which has the catalytic function of a Slicer in the active RISC. So what are those other three doing, if anything? Then, what are all the other combinatorial components that go into active RISCs?
At this point, we know what the minimally competent RISC is, but we’ve got a lot of other players that are underappreciated. So what makes an effective RNAi [molecule]? Right now, we’ve got a relatively decent understanding of what makes an effective duplexed 21-mer — a 19-base duplex with 2-base 3’ overhangs on either side. But there are a lot of other ways of, first, getting a RISC loaded; and then, looking at individual events within the series of catalytic events that have to occur to get an active … suppression of message. [These events, however,] are relatively uncharacterized, and finding out what those properties are is one of the very important things that the software is designed to do.
[The software] is designed to not just look at what makes an active RISC or an active RNAi [molecule] that goes into RISC, but to look at the subtle differences of the different properties of the different components that are probably going to become important as we discover more about the diversity of these small non-coding RNAs.
So what are the differences between the mechanisms of how a microRNA works versus an RNAi [molecule] as it’s loaded from a 21-mer versus an RNAi [molecule] that uses the dicing mechanism to load the RISC? Those three different kinds of RNAi [molecules] have very different properties from each other.
Keeping in mind that bioinformatics isn’t a specialty of mine, can you give an example of how this is done?
If you take one of the very common rules of your standard 21-mer RNAi [molecule] — the thermodynamic asymmetry where the 5’ end of the strand that has the lowest amount of thermodynamic stability is incorporated preferentially to become the guide strand in RISC — that property is very obvious in a 21-mer. If you use a Dicer-loaded RNA interference mechanism, that thermodynamic asymmetry goes away. In other words, if you use proteins to help load your RISC, your thermodynamic asymmetry parameter is no longer important. So there are at least two different pathways to get something into RISC, in one of which that [thermodynamic] feature is important [while] in the second one … it is not.
So that’s one example of the differences that can be detected based upon what biochemical mechanisms are being involved to get something to be incorporated and then active.
At this point, how developed is this approach? What sort of testing have you done to validate it?
That’s a very good question. We have a couple of in-house data sets and some publicly available data sets, and we’ve been looking at about a dozen different parameters that can theoretically have some influence on activity at some level, whatever that endpoint readout is.
Generally speaking, the endpoint is total amount of target messenger RNA suppression. But you can think of different endpoints that you might want to look at, for example off-target effects or immunostimulation or other endpoints that would be different from target mRNA knockdown.
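As a rough illustration of how sequence parameters and an endpoint might be handed to a support vector machine, the sketch below turns a duplex into a numeric feature vector and writes it in the sparse text format read by LIBSVM (label first, then 1-based `index:value` pairs). The two features shown, the label (fraction of target mRNA knocked down), and the function names are all hypothetical; the interview does not list the dozen or so parameters the software actually uses.

```python
def features(sense_core, window=4):
    """Compute two hypothetical features for a 19-base sense core.

    1. GC fraction of the whole core.
    2. End asymmetry: A/U count at the antisense 5' end minus the
       A/U count at the sense 5' end, over `window` bases.
    """
    core = sense_core.upper()
    gc_fraction = sum(base in "GC" for base in core) / len(core)
    au_sense_5p = sum(base in "AU" for base in core[:window])
    au_antisense_5p = sum(base in "AU" for base in core[-window:])
    return [gc_fraction, au_antisense_5p - au_sense_5p]


def to_libsvm_line(label, values):
    """Format one example as 'label index:value ...', skipping zeros."""
    pairs = " ".join(
        f"{index}:{value:.4f}"
        for index, value in enumerate(values, start=1)
        if value != 0
    )
    return f"{label:.4f} {pairs}".rstrip()


if __name__ == "__main__":
    # Made-up sequence and made-up knockdown value, for illustration only.
    core = "GCGCAUUAGCAUCGAAUUA"
    print(to_libsvm_line(0.85, features(core)))
```

Swapping the endpoint, for instance to an off-target or immunostimulation readout, would change only the label column; the feature columns could stay the same.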
The validation is ongoing. Some of the modules that we are looking at are fairly well validated in the sense that they repeat other people’s work. For example, the thermodynamic observations have been reported in the literature for several years now. Looking at internal data sets and publicly available data sets, we see the same things and feel fairly confident that the software is valid in that sense.
Looking at other things that are either underreported or not reported previously in the literature, we take a little bit of extra time to make sure that the software is doing what we think it’s doing [using] in-house methods that are standard in the software world.
Are you doing this work in-house?
The grant supports in-house software development. Some of the data sets involve both in-house and external collaborations. Obviously, some of those can’t be published, depending on who the collaborators are.
So this software is going out into other researchers’ hands to see how it works for them?
Right now, IDT has an open-source model on this particular software package, and we’re providing an FTP site to allow people to download it.
Is the ultimate goal to market this or will this always be a freely available thing?
IDT at this moment is not a software company, and we have the luxury of writing bioinformatics software for people to use to perform some functions that are useful to them. At the end of the day, people buy IDT widgets, the synthetic nucleic acids, and that supports bioinformatics at IDT.
So the software is a tool customers can use in their research?