Researchers from Carnegie Mellon University, Stanford University, and Seoul National University have published a paper in the Proceedings of the National Academy of Sciences showing that combining online games with experimental testing can generate synthetic RNA structures that are more accurate than those produced by current in silico approaches.
Like other online community games such as Foldit, which focuses on protein folding, this approach to predicting RNA structures takes advantage of the wide pool of largely inexpert players who make up the community around EteRNA, a web-based game released in 2010 that challenges players to design RNA molecules that can fold into specific shapes.
The PNAS paper, which describes the results of analyzing some of the early activities of the EteRNA community, demonstrates "a successful attempt to generate and experimentally test hypotheses through crowdsourcing," one that could be adopted to solve many biomolecule design problems, the researchers wrote. In terms of RNA structure design problems in particular, it offers an alternative to expert analysis and interpretation of large high-throughput experimental datasets — "a challenging task even with modern machine learning and visualization tools."
"The quality of the designs produced by the online EteRNA community is just amazing and far beyond what any of us anticipated when we began this project three years ago," Adrien Treiulle, an assistant professor of computer science and robotics at Carnegie Mellon, one of the game's developers, and a co-author on the paper, said in a statement. The community's progress, he added, would not have been possible "if EteRNA members were just spitting out designs using online simulation tools" alone. "By actually synthesizing the most promising designs … we're giving our community feedback about what works and doesn't work in the physical world. And as a result, these non-experts are providing us insight into RNA design that is significantly advancing the science."
The year-long study that the PNAS paper describes provides an account of some of the early activities of the EteRNA community. According to the paper, each week the researchers asked players to design sequences that folded into specific RNA structures and to vote on which structures they thought were the best ones. At the end of the week, the eight sequences with the most votes were collected, synthesized, and verified by measuring the chemical reactivity of each base in the structure — scored on a scale of 0 to 100. Next, the researchers published their findings online and gamers used the feedback to revise their design strategies to solve future puzzles that became increasingly complex.
Since most of the participants had no experience, the earliest structures they developed were not very good, Jeehyung Lee, a doctoral student in computer science at Carnegie Mellon, one of EteRNA's developers and a co-author on the PNAS paper, told BioInform. But after a few rounds and "a lot of trial and error," players' performance improved and their predictions got better and better. Eventually, according to the paper, they designed successful RNA molecules in "two to three rounds for all targets." Lab tests showed that these predicted structures were better than ones generated by design algorithms such as RNAInverse and NUPACK.
Furthermore, players documented the strategies they devised to determine the optimal structures of these molecules, coming up with about 40 rules, "most of which encoded unique insights into successful RNA design," according to the paper. The researchers selected about five or six that were generally applicable and incorporated them into a new Monte Carlo algorithm called EteRNABot. Comparison tests between EteRNABot and existing algorithms show that EteRNABot is also better at designing RNA molecules such as dendrimer-like structures and scaffolds for small molecule sensors, according to the paper.
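For readers unfamiliar with the term, the general shape of a Monte Carlo sequence search can be sketched in a few lines of Python. This is only an illustration of the generic technique, not EteRNABot itself: the function names, the temperature and step parameters, and especially the toy scoring function (which simply rewards a GC fraction near 50 percent) are assumptions made for this example and do not come from the paper.

```python
import math
import random


def toy_score(seq):
    """Hypothetical stand-in for rule-based scoring.

    Rewards sequences whose GC fraction is close to 50%. The real
    player-derived rules are not reproduced here.
    """
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    return 1.0 - abs(gc_fraction - 0.5)


def monte_carlo_design(length, score, steps=2000, temperature=0.05, seed=0):
    """Generic Metropolis-style search over RNA sequences of a fixed length."""
    rng = random.Random(seed)
    current = [rng.choice("ACGU") for _ in range(length)]
    current_score = score(current)
    best, best_score = current[:], current_score

    for _ in range(steps):
        # Propose a single-base mutation at a random position.
        candidate = current[:]
        candidate[rng.randrange(length)] = rng.choice("ACGU")
        candidate_score = score(candidate)
        delta = candidate_score - current_score

        # Always accept improvements; accept regressions with a
        # probability that shrinks as the regression grows.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current[:], current_score

    return "".join(best), best_score


if __name__ == "__main__":
    sequence, value = monte_carlo_design(20, toy_score)
    print(sequence, round(value, 3))
```

In a design tool of this kind, the scoring function is where domain knowledge enters; swapping `toy_score` for a function that encodes structural rules would leave the search loop unchanged.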
One of the players' rules is a process called "capping" that builds on a known feature of RNA structures. As Lee explained it, stable RNA structures have a lot of guanine-cytosine (GC) pairs, but too many such pairs result in a misfolded structure. The players' solution was to put GC pairs only at the ends of the helices and to use different base pairs in the middle. Another rule that Lee described specifies the ideal number of adenine-uracil (AU) pairs that RNA structures should have based on the length of the helix in question.
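The capping idea can be illustrated with a short Python sketch that takes a target secondary structure in dot-bracket notation and places GC pairs only at the two ends of each helix. Several details are assumptions made for illustration rather than anything stated by Lee or the paper: the function names are hypothetical, the interior pairs are filled with AU (the article says only that "different base pairs" are used), unpaired positions are set to A, and pseudoknots are not handled.

```python
def parse_pairs(structure):
    """Return sorted (i, j) base-pair indices from a dot-bracket string."""
    stack, pairs = [], []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            pairs.append((stack.pop(), index))
    return sorted(pairs)


def split_helices(pairs):
    """Group pairs into helices: runs where (i, j) is followed by (i+1, j-1)."""
    helices = []
    for i, j in pairs:
        if helices and helices[-1][-1] == (i - 1, j + 1):
            helices[-1].append((i, j))
        else:
            helices.append([(i, j)])
    return helices


def cap_design(structure):
    """Assign GC to the first and last pair of each helix, AU elsewhere."""
    sequence = ["A"] * len(structure)  # assumption: unpaired bases are A
    for helix in split_helices(parse_pairs(structure)):
        last = len(helix) - 1
        for position, (i, j) in enumerate(helix):
            if position == 0 or position == last:
                sequence[i], sequence[j] = "G", "C"  # cap the helix ends
            else:
                sequence[i], sequence[j] = "A", "U"  # assumed interior pair
    return "".join(sequence)


if __name__ == "__main__":
    print(cap_design("((((....))))"))  # GAAGAAAACUUC
```

A sequence produced this way is only a starting point; whether it actually folds into the target shape is the question that the lab synthesis step in EteRNA is meant to answer.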
Since the events described in the PNAS paper, the EteRNA community has grown significantly, from 37,000 members at the close of the study to about 130,000 at present. RNA synthesis technology has also matured since those early days, Lee said. When the EteRNA community first got started, the project organizers were only able to synthesize eight structures per week. Now, however, it's possible to synthesize up to 1,000 sequences per month. This means that the community can take on a lot more projects, and its developers are looking for opportunities to put the platform and the players to work answering actual research questions. Some recent real-world projects include one from Stanford University that asked players to design RNAs with 5-base hairpin loops, building on a previous challenge focused on 4-base hairpin loops. Another one, from Bowling Green State University, involved making changes to an RNA hairpin loop in order to improve the molecule's reactivity.
Turning research questions into game puzzles isn't easy, Lee said, so to make the process less onerous, the authors of the PNAS paper are working on a template — which they hope to publish in March this year — that prospective researchers can use to transform their projects into puzzles that the EteRNA community can try to solve. It will, among other things, include information on how to restate research questions in simple terms that players can understand, he said. It will also help researchers formulate a specific research question they want players to answer instead of trying to get the community to work on an entire project.
The researchers are also considering adding the ability to use EteRNA to design three-dimensional RNA structures — right now, it can only handle two-dimensional secondary structures. That, however, is a much longer-term goal, Lee said.