Bioinformatics has been a relative latecomer to the RNA interference party. The same qualities that have made RNAi so popular for functional genomics research — its ease of use, its reproducibility, and its effectiveness — make clever algorithms and computational tricks pretty much irrelevant.
Armed with GenBank and a few simple guidelines provided by RNAi pioneer Tom Tuschl of Rockefeller University (see box, p. 5), anyone can design a 21-nucleotide short interfering RNA (siRNA) sequence with a 70 percent chance of silencing a gene of interest by triggering destruction of its mRNA before it can be translated.
But a nascent RNAi informatics community is beginning to take shape as demand for the technology grows. Although researchers are seeing success with siRNA selection methods that lean more toward trial-and-error than rule-based design, there is always room for improvement — about 30 percent more room, if Tuschl’s estimate is correct. A number of players in both the public and the private sector are striving to improve the computational design of siRNAs by tweaking Tuschl’s rules to reflect new data.
“Experimentation is producing new ideas of ways of selecting siRNAs, so we’re trying to incorporate these findings into our program,” said Fran Lewitter, associate director for biocomputing at the Whitehead Institute. Lewitter’s group began developing a web-based software program based on Tuschl’s rules over a year ago, and released it publicly in March (http://jura.wi.mit.edu/bioc/siRNA/home.php). Lewitter’s team is now working directly with Tuschl to gain immediate access to new criteria based on experimental data that it can incorporate into its algorithm, she said. In addition, her lab is about to embark on a new project to generate experimental data of its own to discover a few “more sophisticated rules.”
Lewitter’s group is not the only one expanding its dataset in order to guide siRNA design. The field of RNAi is very young, and Tuschl’s guidelines stand as only a first attempt to impose some order on a complex process. “These ‘pseudo-rules’ have been developing over the last year,” said Eric Lader, global business manager of gene silencing at oligo design firm Qiagen. More recently, he said, “People have looked at a much larger population of siRNAs that work and a much larger population of siRNAs that haven’t worked, and tried to find patterns in them.”
Ye Ding, a bioinformaticist at the Wadsworth Center in New York and developer of the Sfold suite of RNA design algorithms (http://sfold.wadsworth.org/index.pl), agreed that fresh data is ushering in a new phase of siRNA informatics. While acknowledging Tuschl’s criteria as “the rules of the game,” Ding noted, “there have been exceptions to most of those rules that have been reported in the literature. There are really no golden rules here; they will keep evolving with the accumulation of more experimental data.”
Why Settle for 70 Percent?
Tuschl himself has been known to scoff at the notion of sharpening the design algorithms for siRNAs. After all, with a 70 percent success rate for random selection, why bother spending time collecting additional criteria? But those aiming to improve upon the hit-or-miss approach argue that they will lower the expense of RNAi experiments.
Currently, RNA oligo vendors such as Ambion, Dharmacon, and Qiagen provide free siRNA design programs on their websites, but only Dharmacon guarantees that siRNAs designed with its SmartSelection tool will suppress activity in a chosen gene. The company provides what it calls a SmartPool of four siRNAs. One out of the four is guaranteed to have a 99.99 percent chance of knocking down the gene of interest by at least 75 percent. But, as Qiagen’s Lader explained, Dharmacon, Qiagen, and others are still “playing the numbers game” with pooled siRNA sets. “If someone developed a really good design algorithm, you wouldn’t need four in a pool, you’d need one. So no one is at that point yet.”
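The “numbers game” behind pooling can be illustrated with a short calculation. The sketch below assumes that each siRNA in a pool succeeds independently with the same probability, and it plugs in Tuschl’s 70 percent estimate purely for illustration. It is not how Dharmacon or Qiagen compute their guarantees, and the function name is hypothetical.

```python
def pool_success_probability(per_sirna_success: float, pool_size: int) -> float:
    """Chance that at least one siRNA in a pool works.

    Assumption (not from the vendors): every siRNA succeeds independently
    with the same probability, so the pool fails only if all members fail.
    """
    if not 0.0 <= per_sirna_success <= 1.0:
        raise ValueError("per_sirna_success must be between 0 and 1")
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return 1.0 - (1.0 - per_sirna_success) ** pool_size


if __name__ == "__main__":
    # 0.70 is the success rate quoted for the Tuschl guidelines.
    for size in (1, 2, 4):
        p = pool_success_probability(0.70, size)
        print(f"pool of {size}: {p:.2%}")
```

Under these assumptions a pool of four reaches roughly 99 percent, which is why a pool can be guaranteed even when no single design can. A better algorithm would raise the per-siRNA figure and shrink the pool needed.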
Improving the accuracy of siRNA selection algorithms will ultimately benefit the end users of the reagents. “For academia, they’d like to have one or two that work instead of four,” said Ding. And for drug discovery companies planning on screening the whole human genome, 60,000 or so siRNAs at $200 to $300 a pop can add up fairly quickly. “It pays for them to design the things as best as they can,” said Lader.
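To see how quickly those figures add up, the arithmetic can be worked through directly. The snippet below is only a back-of-the-envelope sketch using the numbers quoted above (60,000 siRNAs at $200 to $300 each); the function name is hypothetical.

```python
def screening_cost_range(n_sirnas: int, low_price: int, high_price: int) -> tuple[int, int]:
    """Return the (lowest, highest) total reagent cost in dollars."""
    return n_sirnas * low_price, n_sirnas * high_price


if __name__ == "__main__":
    # Figures quoted in the article: 60,000 siRNAs at $200 to $300 apiece.
    low, high = screening_cost_range(60_000, 200, 300)
    print(f"${low:,} to ${high:,}")  # $12,000,000 to $18,000,000
```

At that scale, any improvement that reduces the number of siRNAs needed per gene translates directly into millions of dollars in reagent costs.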
Variations on a Theme by Tuschl
Any attempt to improve the current state of computational siRNA design builds upon Tuschl’s guidelines. Ding, for example, has developed an siRNA design algorithm that combines the Tuschl rules with a method to predict the secondary structure and accessibility of the mRNA target. Ding said that although secondary structure is a well-known factor in designing antisense oligos, it is not widely accepted in siRNA design. The role of secondary structure in RNAi remains unsettled: experimental evidence exists both for and against the claim that the structure of the mRNA affects siRNA binding. However, Ding said that so far, his team has tested six siRNAs against two targets, “and all of them are potent inhibitors, with over 70 percent silencing on the protein level.” Ding recently received a $2 million NIH grant to improve the software using experimental information. He said his group has also recently entered a collaboration with an “industrial partner” to experimentally validate its approach.
Another method that builds upon the Tuschl rules is under development at Galapagos Genomics, a Belgian functional genomics firm. Bioinformaticist Mark Lambrecht explained that the company “took the [Tuschl] rules and adapted them for our system, so we had to throw away some things…and we added some rules.” Galapagos is building an adenoviral knockdown library of more than 5,000 druggable human genes, Lambrecht said, and uses a hairpin RNA structure constructed with a U6 promoter — an approach that called for a modification of the basic siRNA design rules. The U6 promoter, for example, requires that “the pattern always ends in C,” Lambrecht said. In addition, the company selected molecules that knock down different transcript variants at the same time “to limit the size of the library.” The gene library, called SilenceSelect, will be commercially available in July, but, like others in the field, “we’re still in the process of generating data and adding that back into the algorithm to make it better,” Lambrecht said, adding that the new algorithm would be used for custom knockdown contracts.
Oligo design firms are also sprucing up their siRNA design algorithms to keep pace with the new findings in RNAi research. Qiagen, for example, is collaborating with the genomics research division of a large pharmaceutical firm to improve its approach. “The features that our collaborators have found seem to be about the relative stability of different parts of the 21-base duplex,” Lader said. “Some of them are more easily recognized and loaded into the RISC [RNA-induced silencing complex] complex.” The goal, Lader said, is to gather enough experimental information to create an effective “ranking program” for selected sequences. Dharmacon continues to improve its software, as well. According to Bill Marshall, executive vice president of research and production, the company built its current algorithm around an experimental data set of 360 different siRNAs that it screened against four different genes. Based on a statistical analysis of the data, combined with a weighting scheme to account for different factors, the company eventually arrived at a set of 34 criteria involved in the selection of an effective siRNA that it built into its algorithm. The company continues to gather data — both from its in-house research and its customers — to enhance the algorithm, Marshall said.
Pharmaceutical companies are also working to advance computational siRNA design. Last week, the bioinformatics group at Bristol-Myers Squibb contributed three software modules for siRNA design to BioPerl (www.bioperl.org). Don Jackson, a research investigator at BMS who developed the software, said the approach is “based on the rules that the Tuschl group published, which is one of the reasons we wanted to put the code back out in the public domain.” Jackson tweaked the original rules by adding a “complexity filter” that avoids strings of repeating nucleotides as well as a filter to eliminate oligos where there are likely to be SNPs. Jackson said that he’s keeping an eye on the scientific literature for new criteria that he can incorporate in the algorithm. “As additional information becomes available on how to improve siRNA selection, I will continue to add it to the software,” he said.
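The two filters Jackson describes can be sketched in a few lines. What follows is an illustration only, not the BioPerl modules themselves: the function names, the run-length threshold of four, and the 0-based SNP coordinates are all assumptions made for the example, and the Tuschl rules proper are not applied here.

```python
import re
from typing import Iterable, Iterator, Tuple

SIRNA_LENGTH = 21  # the 21-nucleotide length cited in the article


def has_homopolymer_run(seq: str, min_run: int = 4) -> bool:
    """True if seq contains min_run or more identical nucleotides in a row.

    The threshold of 4 is an assumed value chosen for illustration.
    """
    pattern = r"(.)\1{%d,}" % (min_run - 1)
    return re.search(pattern, seq.upper()) is not None


def overlaps_snp(start: int, length: int, snp_positions: Iterable[int]) -> bool:
    """True if any SNP position (0-based, assumed) lies in [start, start + length)."""
    return any(start <= pos < start + length for pos in snp_positions)


def candidate_windows(
    target: str,
    snp_positions: Iterable[int] = (),
    length: int = SIRNA_LENGTH,
    min_run: int = 4,
) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for each window that passes both filters."""
    target = target.upper()
    snps = set(snp_positions)
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        if has_homopolymer_run(window, min_run):
            continue  # "complexity filter": skip low-complexity repeats
        if overlaps_snp(start, length, snps):
            continue  # skip windows that cover a known variable site
        yield start, window


if __name__ == "__main__":
    # Made-up target sequence and SNP position, purely for demonstration.
    demo_target = "ACGT" * 6 + "AAAA"
    for start, window in candidate_windows(demo_target, snp_positions=[2]):
        print(start, window)
```

Windows that survive these filters would still need to be scored against the Tuschl guidelines and any newer criteria, and that scoring is where the groups described here are focusing their efforts.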
Jackson said that current research efforts to characterize the biochemical and biophysical mechanisms of the RISC complex — as well as the future availability of large siRNA experiments — should improve the success rate of current methods. However, he admitted, trial-and-error still reigns even when additional criteria are added. His use of SNP elimination criteria may be a “moot point,” for example, because so little is understood about the biology behind effective siRNA binding. His choice to add the SNP criteria to his software amounted to a “just in case” scenario: “It’s sort of like an airbag,” he said. “If you’re not in a crash, it doesn’t matter whether you’ve got one or not.”
Despite concerted efforts to feed more data back into siRNA design algorithms, it’s uncertain whether a discrete set of rules to guide the selection of effective sequences is even obtainable. “We’re involved in quite a large project to advance the design algorithms,” said Lader, “and whether or not that is going to be any better than the way we’re picking now — where we pretty much can guarantee functionality if we use four — we just don’t know.”
“Not many companies have enough data to improve on the [Tuschl] rules,” Lambrecht pointed out. In addition, he said, the lack of standardization in the young field has stymied efforts to compare experimental results in a statistical manner. However, he noted, “I’m very confident that as more data comes out, an improvement on the Tuschl rules will be possible.”
As Lewitter explained, “This is an art, it’s not a science yet. All the rules are not perfect, but we can eliminate a lot of the noise and give somebody a more educated guess about what’s going to be good.”
This article originally appeared in the June 2, 2003 issue of BioInform, the Integrated Informatics News Source.
Copyright © 2003 GenomeWeb. All Rights Reserved.