NEW YORK (GenomeWeb) – Rice University said today it has reeled in a $1.1 million grant from the National Science Foundation to develop cloud computing tools for use in evolutionary genomics research.
Under the three-year grant, two Rice research teams will develop parallel-processing tools for studying how genes and genomes evolve across different species. The teams expect that these open-source algorithms will let researchers who have limited access to supercomputing resources, but who can rent cloud computing time from firms such as Amazon or Microsoft, run sophisticated analyses.
The statistical modeling programs they develop will be able to run analyses in parallel on thousands of computers, and may make it possible to trace genes at scales that were not previously practical, Rice said.
"We're doing basic analysis of evolutionary questions," said Luay Nakhleh, an associate professor of computer science at Rice and one of the project's leaders. "Evolutionary biologists sample taxa from across the tree of life. They want to know, for example, how a big group of plants may have evolved."
The researchers also plan to build on Bayesian inference techniques, which let biologists incorporate prior knowledge into their analyses.
"Analyzing data sets with 10 or 20 gene sequences can easily take hundreds of hours," Nakhleh said. "But the tree of life has millions of sequences and is built from millions of species. There’s no way traditional Bayesian techniques are even going to get close to handling that."
Rice Associate Professor Christopher Jermaine, the project's other leader, said a problem that involves analyzing 50 organisms could require tens of thousands of hours of computing time. While that is "doable," performing the same analyses with thousands of organisms would not be.
"We’re not talking about taking a one-day calculation and taking it down to minutes," Jermaine said. "We’re talking about potentially taking a years- or decades-long computation and making it feasible by changing the underlying algorithm and making it amenable to distributed computing."
Jermaine and Nakhleh want to create turnkey software that biologists will find easy to use, Rice said.
"My impression is they want a very low bar to entry," Jermaine said. "If they have to write a lot of code or have to figure out how to use all these servers, they’re just not going to do it. Hopefully our solution will be as easy for biologists as pressing a return key."